Multi-LLM consensus can improve annotation accuracy by combining the strengths of diverse AI models while reducing the impact of individual model limitations (see Yang et al., 2025).
Traditional single-model annotation systems face inherent limitations: a single model's biases and errors go unchecked, and its predictions come with no measure of uncertainty.
mLLMCelltype’s consensus framework is analogous to the peer review process in scientific publishing.
Just as scientific papers benefit from multiple expert reviewers, cell annotations can benefit from multiple AI models:
| Scientific Peer Review | mLLMCelltype Consensus |
|---|---|
| Multiple expert reviewers | Multiple LLMs |
| Diverse perspectives | Different training approaches |
| Debate and discussion | Structured deliberation |
| Consensus building | Agreement quantification |
| Quality assurance | Uncertainty metrics |
1. Error Detection Through Cross-Validation
   - Models check each other's work
   - Individual model biases can be averaged out
   - Outlier predictions are identified
2. Transparent Uncertainty Quantification (see the sketch after this list)
   - Consensus Proportion (CP): measures inter-model agreement
   - Shannon Entropy: quantifies prediction uncertainty
   - Controversy Detection: automatically identifies clusters requiring expert review
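The sketch below shows how these two metrics can be computed for a single cluster from a vector of model predictions. It is a minimal illustration of the standard definitions, not mLLMCelltype's internal code, and the prediction labels are made up.

```r
# Minimal illustration (not package internals): uncertainty metrics for one cluster.
predictions <- c("CD4+ T cell", "CD4+ T cell", "CD8+ T cell")  # hypothetical model votes

p <- table(predictions) / length(predictions)   # vote share per candidate label

consensus_proportion <- max(p)                  # share of models backing the top label
shannon_entropy      <- -sum(p * log2(p))       # 0 = unanimous, higher = more disagreement

consensus_proportion  # 0.67 -> two of three models agree
shannon_entropy       # ~0.92 bits -> cluster flagged as worth a closer look
```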
Cell type annotation involves mapping a cluster's marker genes to known cell types, a judgment call on which different models can reasonably disagree.
For benchmark results, see Yang et al. (2025):
Yang, C., Zhang, X., & Chen, J. (2025). Large Language Model Consensus Substantially Improves the Cell Type Annotation Accuracy for scRNA-seq Data. bioRxiv. https://doi.org/10.1101/2025.04.10.647852
The two-stage approach (independent prediction first, deliberation only when needed) can reduce API calls when models agree early: the cost overhead of using multiple models is partially offset by skipping deliberation for clear-cut clusters.
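As a rough illustration of the arithmetic (the numbers below are assumed, not measured): with three models, every cluster costs three first-round calls, and only the clusters that fail to reach early agreement pay for an extra deliberation round.

```r
# Back-of-the-envelope API-call estimate; all quantities are illustrative assumptions.
n_clusters    <- 20
n_models      <- 3
controversial <- 0.25   # assumed fraction of clusters needing deliberation

stage1_calls       <- n_clusters * n_models                            # 60: every model annotates every cluster
deliberation_calls <- ceiling(n_clusters * controversial) * n_models   # 15: one extra round for 5 disputed clusters

stage1_calls + deliberation_calls   # 75 calls, versus 120 if every cluster were deliberated
```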
Stage 1: Independent Analysis. Each LLM analyzes the marker genes and provides:
- Cell type predictions
- Confidence scores
- Reasoning chains

Stage 2: Consensus Building. The system:
- Compares predictions across models
- Identifies areas of agreement and disagreement
- Calculates uncertainty metrics

Stage 3: Deliberation (when needed). For controversial clusters:
- Models share their reasoning
- Structured debate occurs
- Final consensus emerges
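A conceptual sketch of this three-stage flow for a single cluster is shown below. `ask_model()` and `deliberate()` are hypothetical placeholders, not functions exported by mLLMCelltype, and the 2/3 agreement cutoff is an assumed threshold.

```r
# Conceptual sketch of the three-stage flow for one cluster (placeholder helpers).
annotate_cluster <- function(marker_genes, models) {
  # Stage 1: each model votes independently
  votes <- vapply(models, function(m) ask_model(m, marker_genes), character(1))

  # Stage 2: quantify agreement across the votes
  p  <- table(votes) / length(votes)
  cp <- max(p)

  # Stage 3: deliberate only when the cluster is controversial
  if (cp < 2/3) {
    votes <- deliberate(models, marker_genes, votes)   # models see each other's reasoning
  }

  names(sort(table(votes), decreasing = TRUE))[1]      # final (majority) label
}
```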
Consensus may be preferable when:
- Uncertainty quantification is needed
- Datasets involve novel or complex tissues
- Results will be published or used in downstream analyses
- Identifying low-confidence annotations is important
Consider alternatives when:
- Quick exploratory analysis is the goal
- Datasets are well characterized with clear markers
- The API budget is very limited
- The work is early-stage proof of concept
```r
library(mLLMCelltype)

# Run consensus annotation on a prepared Seurat object
# ("your_data" is a placeholder for your own dataset)
results <- interactive_consensus_annotation(
  seurat_obj = your_data,
  tissue_name = "PBMC",
  models = c("gpt-4o", "claude-sonnet-4-5-20250929", "gemini-2.5-pro"),
  consensus_method = "iterative"
)
```

The consensus approach provides a framework for combining multiple LLM predictions with built-in uncertainty quantification. As new models become available, the framework can incorporate them without changes to the overall methodology.
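Because the workflow is model-agnostic, adding a model amounts to extending the `models` argument; the extra identifier below is purely illustrative.

```r
# Same call as above, with one more (illustrative) model added to the panel.
results <- interactive_consensus_annotation(
  seurat_obj = your_data,
  tissue_name = "PBMC",
  models = c("gpt-4o", "claude-sonnet-4-5-20250929", "gemini-2.5-pro",
             "deepseek-chat"),   # illustrative additional model identifier
  consensus_method = "iterative"
)
```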