How to Annotate Cell Types in scRNA-seq: Complete Guide

Learn how to annotate cell types in your single-cell RNA sequencing (scRNA-seq) data with automated methods available as of 2024: export marker genes, upload them, configure the run, and interpret the results.

Prepare Your scRNA-seq Data

Before annotating cell types, you need to extract marker genes from your clustered scRNA-seq data.

Using Seurat (R):

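library(Seurat)
library(dplyr)

# Find positive markers for every cluster
markers <- FindAllMarkers(seurat_object,
                          only.pos = TRUE,
                          min.pct = 0.25,
                          logfc.threshold = 0.25)

# Keep the 50 markers with the largest log2 fold change in each cluster
top_markers <- markers %>%
  group_by(cluster) %>%
  slice_max(order_by = avg_log2FC, n = 50) %>%
  ungroup()

# Export; the gene names are already in the "gene" column, so drop row names
write.csv(top_markers, "markers_for_annotation.csv", row.names = FALSE)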

Using Scanpy (Python):

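import pandas as pd
import scanpy as sc

# Rank genes per cluster; 'clusters' must be the adata.obs column holding your cluster labels
sc.tl.rank_genes_groups(adata, 'clusters', method='wilcoxon')

# Export the top 50 genes per cluster (one column per cluster)
markers_df = pd.DataFrame(adata.uns['rank_genes_groups']['names']).head(50)
markers_df.to_csv('markers_for_annotation.csv', index=False)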

Choose Your Annotation Method

Traditional Methods:

Manual Annotation: Time-consuming but accurate for experts
Reference-Based: SingleR, Seurat mapping (requires good references)
Marker-Based: CellMarker, PanglaoDB (limited coverage)

AI-Powered Method (Recommended):

Multi-model consensus annotation using mLLMCelltype provides:

Higher accuracy through consensus
No reference dataset required
Handles novel cell types
Automated workflow

Upload Your Marker Genes

Navigate to mLLMCelltype and upload your marker genes file.

💡 Pro Tip

Include 20-50 top marker genes per cluster for best results. Too few genes may reduce accuracy, while too many may introduce noise.

Supported Formats:

CSV with clusters as columns
TSV with marker gene lists
Excel files from common tools
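
The Seurat export above is in long format (one row per gene, with a cluster column), while the first supported format lists clusters as columns. A minimal sketch of the conversion, assuming the standard FindAllMarkers column names (cluster, gene, avg_log2FC) and assuming the upload expects a header row of cluster names followed by gene names; the exact layout the tool requires is not specified here, and long_to_wide is a hypothetical helper:

```python
import csv
import io
from collections import defaultdict


def long_to_wide(long_csv_text, top_n=50):
    """Pivot long-format marker rows into one column of gene names per cluster."""
    genes_by_cluster = defaultdict(list)
    for row in csv.DictReader(io.StringIO(long_csv_text)):
        genes_by_cluster[row["cluster"]].append((float(row["avg_log2FC"]), row["gene"]))

    # Sort each cluster's genes by descending fold change and keep the top_n names
    columns = {}
    for cluster, scored in genes_by_cluster.items():
        scored.sort(key=lambda pair: pair[0], reverse=True)
        columns[cluster] = [gene for _, gene in scored[:top_n]]

    # Header of cluster names (sorted as strings), then one row per rank;
    # clusters with fewer genes are padded with empty cells
    clusters = sorted(columns)
    depth = max((len(genes) for genes in columns.values()), default=0)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(clusters)
    for i in range(depth):
        writer.writerow([columns[c][i] if i < len(columns[c]) else "" for c in clusters])
    return out.getvalue()


example = """cluster,gene,avg_log2FC
0,CD3E,2.1
0,IL7R,2.5
1,MS4A1,3.0
"""
wide = long_to_wide(example)
```

For the three example rows, wide holds a header line "0,1" followed by "IL7R,MS4A1" and "CD3E,". The Scanpy export above already has this shape, so it needs no conversion.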

Configure Annotation Parameters

Essential Settings:

Species: Human, Mouse, or custom
Tissue Type: PBMC, Brain, Liver, etc.
AI Models: Select multiple for consensus (GPT-4, Claude, Gemini)

Advanced Options:

Consensus Threshold: 0.6 (default) - increase for stricter consensus
Entropy Threshold: 1.2 (default) - lower it to require higher confidence
Discussion Rounds: 3 (default) - more rounds for difficult cases

Run Automated Annotation

Click "Start Analysis" to begin the multi-model annotation process.

What Happens Next:

Each AI model analyzes your marker genes independently
Models compare their annotations
If disagreement occurs, models discuss to reach consensus
Final annotations are determined with confidence scores
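
The steps above can be sketched as a small loop. This is an illustration only, not the tool's actual implementation: the model callables, the annotate_cluster function, and the use of a simple majority fraction as the agreement measure are all assumptions made for the example.

```python
from collections import Counter


def annotate_cluster(markers, models, consensus_threshold=0.6, max_rounds=3):
    """Return (label, agreement) for one cluster.

    `models` is a list of hypothetical callables: model(markers, peer_labels) -> label.
    """
    # Independent pass: every model predicts without seeing the others
    labels = [model(markers, None) for model in models]
    for _ in range(max_rounds):
        top_label, votes = Counter(labels).most_common(1)[0]
        agreement = votes / len(labels)
        if agreement >= consensus_threshold:
            return top_label, agreement
        # Disagreement: each model sees all current labels and may revise its answer
        labels = [model(markers, list(labels)) for model in models]
    top_label, votes = Counter(labels).most_common(1)[0]
    return top_label, votes / len(labels)


def stubborn(label):
    # Stand-in model that never changes its answer
    return lambda markers, peers: label


def follower(first_guess):
    # Stand-in model that adopts the most common peer label once it sees the others
    def model(markers, peers):
        if peers is None:
            return first_guess
        return Counter(peers).most_common(1)[0][0]
    return model


models = [stubborn("T cell"), stubborn("T cell"), follower("NK cell"), follower("B cell")]
label, agreement = annotate_cluster(["CD3E", "CD3D", "IL7R"], models)
```

In this toy run the first pass yields only 50% agreement, so one discussion round runs, after which all four stand-in models agree on "T cell".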

⏱️ Time Estimate

Most analyses complete in 5-15 minutes, depending on the number of clusters and selected models.

Interpret and Download Results

Understanding Your Results:

Cell Type: The consensus annotation for each cluster
Consensus Score: Agreement level between models (0-1)
Entropy: Uncertainty measure (lower is better)
Individual Predictions: Each model's annotation
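
To make the two scores concrete, here is one common way such metrics are computed. The definitions are assumptions, since this guide does not state the exact formulas: the consensus score is taken as the fraction of models that chose the majority label, and the entropy as the Shannon entropy (in bits) of the label distribution.

```python
import math
from collections import Counter


def consensus_metrics(predictions):
    """Return (majority_label, consensus_score, entropy_bits) for one cluster."""
    counts = Counter(predictions)
    total = len(predictions)
    majority_label, votes = counts.most_common(1)[0]
    consensus_score = votes / total
    # Shannon entropy of the label distribution; zero means every model agrees
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return majority_label, consensus_score, entropy


label, score, entropy = consensus_metrics(["T cell", "T cell", "NK cell"])
```

With two of three models agreeing, the score is about 0.67 and the entropy about 0.92 bits. Under these assumed definitions, three identical predictions would give a score of 1.0 and an entropy of zero.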

Next Steps:

Download results as CSV
Import back to Seurat/Scanpy
Visualize on UMAP/tSNE
Validate key populations

Best Practices for Cell Type Annotation

Data Quality:

Ensure proper QC filtering before clustering
Remove doublets and low-quality cells
Use appropriate clustering resolution

Marker Gene Selection:

Use statistically significant markers (adjusted p-value < 0.05)
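Include both positive and negative markers (the Seurat example above keeps positive markers only; set only.pos = FALSE to include negative ones)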
Consider fold-change thresholds
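
These criteria can be applied as a filter before export. A minimal sketch, assuming marker rows with the FindAllMarkers column names p_val_adj and avg_log2FC; the filter_markers helper is hypothetical, and the 0.25 fold-change cutoff simply mirrors the logfc.threshold used earlier:

```python
def filter_markers(rows, max_padj=0.05, min_log2fc=0.25):
    """Keep marker rows that pass both the adjusted p-value and fold-change cutoffs."""
    kept = []
    for row in rows:
        significant = float(row["p_val_adj"]) < max_padj
        # abs() keeps strongly negative markers as well as positive ones
        strong = abs(float(row["avg_log2FC"])) >= min_log2fc
        if significant and strong:
            kept.append(row)
    return kept


rows = [
    {"gene": "CD3E", "avg_log2FC": "2.1", "p_val_adj": "1e-30"},
    {"gene": "ACTB", "avg_log2FC": "0.1", "p_val_adj": "1e-5"},
    {"gene": "GNLY", "avg_log2FC": "-1.4", "p_val_adj": "0.001"},
    {"gene": "MALAT1", "avg_log2FC": "1.0", "p_val_adj": "0.2"},
]
kept_genes = [row["gene"] for row in filter_markers(rows)]
```

Here CD3E and GNLY pass, ACTB fails the fold-change cutoff, and MALAT1 fails the significance cutoff.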

Validation:

Check known marker expression
Compare with published datasets
Perform downstream functional analysis

Troubleshooting Common Issues

Low Confidence Annotations:

If you get low confidence scores:

Check if clusters are well-separated
Ensure sufficient marker genes
Consider re-clustering at different resolutions

Unexpected Cell Types:

If annotations seem incorrect:

Verify tissue type is correctly specified
Check for batch effects
Review marker gene quality

Ready to Annotate Your Cell Types?

Start using automated multi-model annotation for your scRNA-seq data
