How to Annotate Cell Types in scRNA-seq: Complete Guide
Learn how to perform cell type annotation on your single-cell RNA sequencing data using the latest automated methods in 2024.
Step 1: Prepare Your scRNA-seq Data
Before annotating cell types, you need to extract marker genes from your clustered scRNA-seq data.
Using Seurat (R):
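# Load Seurat for marker detection and dplyr for the pipe and grouping verbs
library(Seurat)
library(dplyr)
# Find positive markers for all clusters
markers <- FindAllMarkers(seurat_object,
                          only.pos = TRUE,
                          min.pct = 0.25,
                          logfc.threshold = 0.25)
# Keep the top 50 markers per cluster, ranked by average log2 fold change
top_markers <- markers %>%
  group_by(cluster) %>%
  slice_max(order_by = avg_log2FC, n = 50)
# Export without row names so the file contains only the marker columns
write.csv(top_markers, "markers_for_annotation.csv", row.names = FALSE)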
Using Scanpy (Python):
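import pandas as pd
import scanpy as sc
# Rank marker genes per cluster ('clusters' must be a column in adata.obs)
sc.tl.rank_genes_groups(adata, 'clusters', method='wilcoxon')
# Collect the top 50 ranked genes per cluster (one column per cluster)
markers_df = pd.DataFrame(adata.uns['rank_genes_groups']['names']).head(50)
# Export without the row index so the file has clusters as columns only
markers_df.to_csv('markers_for_annotation.csv', index=False)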
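Note that the two exports above have different layouts: the Seurat file is long-format (one row per gene, with a cluster column), while the Scanpy file has one column per cluster. If a clusters-as-columns layout is needed, a minimal sketch of the conversion could look like the following. The function name `long_to_wide` is hypothetical, and the code assumes the `cluster` and `gene` columns that `FindAllMarkers` writes.

```python
import csv
import io
from collections import defaultdict
from itertools import zip_longest


def long_to_wide(csv_text):
    """Pivot rows with 'cluster' and 'gene' columns into one column per cluster."""
    columns = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        columns[row["cluster"]].append(row["gene"])

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns.keys())  # header: one cluster id per column
    # Pad shorter clusters with empty cells so every row has the same width.
    writer.writerows(zip_longest(*columns.values(), fillvalue=""))
    return out.getvalue()
```

Row order within each cluster is preserved, so genes that were already ranked by fold change stay ranked after the pivot.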
Step 2: Choose Your Annotation Method
Traditional Methods:
- Manual Annotation: Time-consuming but accurate for experts
- Reference-Based: SingleR, Seurat mapping (requires good references)
- Marker-Based: CellMarker, PanglaoDB (limited coverage)
AI-Powered Method (Recommended):
Multi-model consensus annotation using mLLMCelltype provides:
- Higher accuracy through consensus across multiple models
- No reference dataset required
- Can handle novel cell types
- Automated workflow
Step 3: Upload Your Marker Genes
Navigate to mLLMCelltype and upload your marker genes file.
💡 Pro Tip
Include 20-50 top marker genes per cluster for best results. Too few genes may reduce accuracy, while too many may introduce noise.
Supported Formats:
- CSV with clusters as columns
- TSV with marker gene lists
- Excel files from common tools
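The 20-50 genes-per-cluster guideline can be checked before uploading. The sketch below assumes the clusters-as-columns CSV layout listed above; the function names and the way empty cells are handled are illustrative choices, not part of mLLMCelltype.

```python
import csv
import io

MIN_GENES, MAX_GENES = 20, 50  # range suggested in the tip above


def genes_per_cluster(csv_text):
    """Parse a wide-format CSV (one column per cluster) into {cluster: [genes]}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    clusters = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name, gene in row.items():
            if name is None:  # extra cells beyond the header; ignore them
                continue
            if gene and gene.strip():  # skip empty cells in shorter columns
                clusters[name].append(gene.strip())
    return clusters


def check_marker_counts(clusters):
    """Return {cluster: message} for clusters outside the suggested range."""
    problems = {}
    for name, genes in clusters.items():
        n = len(set(genes))  # count unique genes only
        if n < MIN_GENES:
            problems[name] = f"only {n} genes (fewer than {MIN_GENES})"
        elif n > MAX_GENES:
            problems[name] = f"{n} genes (more than {MAX_GENES})"
    return problems
```

An empty result from `check_marker_counts` means every cluster falls inside the suggested range.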
Step 4: Configure Annotation Parameters
Essential Settings:
- Species: Human, Mouse, or custom
- Tissue Type: PBMC, Brain, Liver, etc.
- AI Models: Select multiple for consensus (GPT-4, Claude, Gemini)
Advanced Options:
- Consensus Threshold: 0.6 (default). Increase it to require stricter agreement between models.
- Entropy Threshold: 1.2 (default). Lower it to require higher confidence.
- Discussion Rounds: 3 (default). Use more rounds for difficult cases.
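To build intuition for the two thresholds, here is a minimal sketch of how such scores could be computed from a set of model predictions. It assumes the consensus score is the fraction of models backing the most common label and that entropy is Shannon entropy in bits over the label distribution; the tool's actual formulas may differ, and labels are compared as exact strings.

```python
import math
from collections import Counter


def consensus_score(labels):
    """Fraction of models that agree with the most common label (0 to 1)."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def shannon_entropy(labels):
    """Shannon entropy (bits) of the label distribution; 0 means full agreement."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())


def is_confident(labels, consensus_threshold=0.6, entropy_threshold=1.2):
    """True when agreement is high enough and uncertainty is low enough."""
    return (consensus_score(labels) >= consensus_threshold
            and shannon_entropy(labels) <= entropy_threshold)
```

Under these assumptions, two of three models agreeing gives a consensus score of about 0.67 and an entropy of about 0.92 bits, which passes both defaults. Three models giving three different labels yields about 0.33 and 1.58 bits, which fails both.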
Step 5: Run Automated Annotation
Click "Start Analysis" to begin the multi-model annotation process.
What Happens Next:
- Each AI model analyzes your marker genes independently
- Models compare their annotations
- If disagreement occurs, models discuss to reach consensus
- Final annotations are determined with confidence scores
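The four stages above can be sketched as a simple loop. This is an illustration of the general idea, not mLLMCelltype's implementation: `annotate_cluster`, `fixed`, and `follows` are hypothetical names, and the "models" are placeholder callables standing in for real LLM API calls.

```python
from collections import Counter


def annotate_cluster(markers, models, consensus_threshold=0.6, max_rounds=3):
    """Annotate one cluster by independent prediction plus discussion rounds.

    `models` maps a model name to a callable(markers, peer_labels) -> label;
    peer_labels is None in the first (independent) round.
    """
    # Round 0: every model labels the cluster without seeing the others.
    labels = {name: predict(markers, None) for name, predict in models.items()}
    rounds_used = 0
    while True:
        # Compare annotations: share of models backing the most common label.
        top_label, votes = Counter(labels.values()).most_common(1)[0]
        score = votes / len(labels)
        if score >= consensus_threshold or rounds_used >= max_rounds:
            break
        # Disagreement: each model sees the previous labels and may revise.
        previous = dict(labels)
        labels = {name: predict(markers, previous) for name, predict in models.items()}
        rounds_used += 1
    return {
        "cell_type": top_label,
        "consensus_score": score,
        "rounds": rounds_used,
        "predictions": labels,
    }


# Placeholder "models": real ones would call an LLM API instead.
def fixed(label):
    return lambda markers, peers: label


def follows(initial, leader):
    return lambda markers, peers: initial if peers is None else peers[leader]


result = annotate_cluster(
    ["CD3D", "CD3E", "IL7R"],
    {"a": fixed("T cell"), "b": follows("NK cell", "a"), "c": follows("B cell", "a")},
)
```

Here the three placeholder models disagree at first, then converge on a single label after one discussion round. The loop stops either when the consensus threshold is met or when the round limit is reached, so a cluster can still end with a low score.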
⏱️ Time Estimate
Most analyses complete in 5-15 minutes, depending on the number of clusters and selected models.
Step 6: Interpret and Download Results
Understanding Your Results:
- Cell Type: The consensus annotation for each cluster
- Consensus Score: Agreement level between models (0-1)
- Entropy: Uncertainty measure (lower is better)
- Individual Predictions: Each model's annotation
Next Steps:
- Download results as CSV
- Import the annotations back into Seurat/Scanpy
- Visualize on UMAP/t-SNE
- Validate key populations
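Importing the results comes down to mapping each cluster ID to its annotated cell type. The sketch below shows that mapping with the standard library only. The column names `cluster` and `cell_type` are assumptions about the downloaded CSV, so adjust them to match the actual file.

```python
import csv
import io


def load_annotations(csv_text, cluster_col="cluster", label_col="cell_type"):
    """Read a results CSV into {cluster_id: cell_type}. Column names are assumed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[cluster_col]: row[label_col] for row in reader}


def label_cells(cell_clusters, mapping, unknown="Unassigned"):
    """Map each cell's cluster id to its annotated cell type."""
    # Cluster ids are compared as strings because CSV values are read as text.
    return [mapping.get(str(c), unknown) for c in cell_clusters]
```

In Scanpy, the same mapping can be applied with `adata.obs['cell_type'] = adata.obs['clusters'].astype(str).map(mapping)` and then plotted with `sc.pl.umap(adata, color='cell_type')`, assuming the cluster labels are stored in `adata.obs['clusters']` as in Step 1.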
Best Practices for Cell Type Annotation
Data Quality:
- Ensure proper QC filtering before clustering
- Remove doublets and low-quality cells
- Use appropriate clustering resolution
Marker Gene Selection:
- Use statistically significant markers (adjusted p-value < 0.05)
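- Include both positive and negative markers where possible (the Seurat export in Step 1 uses only.pos = TRUE, so it keeps positive markers only)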
- Consider fold-change thresholds
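As an illustration of the significance and fold-change filters, the sketch below selects markers from a long-format table with the `gene`, `cluster`, `p_val_adj`, and `avg_log2FC` columns that Seurat's `FindAllMarkers` produces. The function name and default cutoffs are illustrative, and it keeps positive markers only, matching the export in Step 1.

```python
import csv
import io
from collections import defaultdict


def select_markers(csv_text, max_padj=0.05, min_log2fc=0.25, top_n=50):
    """Keep significant markers and return the top_n genes per cluster by log2FC."""
    by_cluster = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        padj = float(row["p_val_adj"])
        lfc = float(row["avg_log2FC"])
        # Drop non-significant genes and genes with a small fold change.
        if padj < max_padj and lfc >= min_log2fc:
            by_cluster[row["cluster"]].append((lfc, row["gene"]))
    # Sort each cluster's hits by fold change, largest first, and truncate.
    return {
        cluster: [gene for _, gene in sorted(hits, reverse=True)[:top_n]]
        for cluster, hits in by_cluster.items()
    }
```

Filtering on the adjusted p-value before ranking prevents a gene with a large but unreliable fold change from displacing a well-supported marker.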
Validation:
- Check known marker expression
- Compare with published datasets
- Perform downstream functional analysis
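Checking known marker expression can start with a simple overlap test between a cluster's marker list and a panel of canonical markers. The panel below is a small illustrative set of commonly used human PBMC markers, not an exhaustive or authoritative reference, and the function name is hypothetical.

```python
# Small illustrative panel of commonly used human PBMC markers.
KNOWN_MARKERS = {
    "T cell": {"CD3D", "CD3E", "CD2"},
    "B cell": {"CD79A", "MS4A1", "CD19"},
    "Monocyte": {"CD14", "LYZ", "S100A8"},
    "NK cell": {"NKG7", "GNLY", "KLRD1"},
}


def supporting_markers(cell_type, cluster_genes, known=KNOWN_MARKERS):
    """Return the known markers for cell_type found among the cluster's genes."""
    return sorted(known.get(cell_type, set()) & set(cluster_genes))
```

An annotation with no supporting markers is worth a closer look, for example by plotting those genes on the embedding. Because marker panels differ by tissue and species, the dictionary should be replaced with one suited to the dataset.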
Troubleshooting Common Issues
Low Confidence Annotations:
If you get low confidence scores:
- Check if clusters are well-separated
- Ensure sufficient marker genes
- Consider re-clustering at different resolutions
Unexpected Cell Types:
If annotations seem incorrect:
- Verify tissue type is correctly specified
- Check for batch effects
- Review marker gene quality