🔧

scRNA-seq Troubleshooting Guide 2025

Expert solutions for common single-cell RNA sequencing analysis and cell type annotation problems

Comprehensive troubleshooting guide to help you solve the most common issues in single-cell analysis workflows.

🚨 Quick Problem Identifier

Select your issue type to jump to the relevant solution:

📊 Data Quality Issues

Low cell counts, high mitochondrial genes, doublets

⚙️ Preprocessing Problems

Normalization, scaling, feature selection issues

🔗 Clustering Issues

Poor clusters, over/under-clustering, resolution problems

🏷️ Annotation Problems

Incorrect cell types, low confidence, novel populations

🔄 Batch Integration

Batch effects, integration failures, sample differences

⚑ Performance Issues

Slow processing, memory errors, crashes

📊 Data Quality Troubleshooting

Problem: Low Cell/Gene Counts

Symptoms:

  • Many cells have < 1000 detected genes
  • High percentage of cells removed during filtering
  • Poor clustering results

Solutions:

  1. Adjust Quality Thresholds:
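    # R/Seurat example
    # Lower thresholds for difficult samples; assign the result so the filter is kept
    seurat_obj <- subset(seurat_obj, subset = nFeature_RNA > 500 & nCount_RNA > 1000)
    
    # Python/Scanpy example (both calls filter adata in place by default)
    import scanpy as sc
    sc.pp.filter_cells(adata, min_genes=500)
    sc.pp.filter_genes(adata, min_cells=10)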
  2. Check Sample Preparation: Review tissue dissociation, cell capture efficiency
  3. Consider Cell Type: Some cell types naturally have lower RNA content
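
Before committing to a threshold, it can help to see how many cells each candidate cutoff would keep. The sketch below is plain Python with no dependencies; the `qc` values and the `fraction_retained` helper are hypothetical and only illustrate the comparison, they are not part of Seurat or Scanpy.

```python
from typing import Dict, Tuple

# Hypothetical per-cell QC metrics: barcode -> (genes detected, total UMI counts).
qc: Dict[str, Tuple[int, int]] = {
    "cell_1": (2400, 9000),
    "cell_2": (850, 2100),
    "cell_3": (620, 1300),
    "cell_4": (480, 1500),
    "cell_5": (300, 700),
}


def fraction_retained(qc: Dict[str, Tuple[int, int]],
                      min_genes: int, min_counts: int) -> float:
    """Fraction of cells with more than min_genes genes and more than min_counts counts."""
    if not qc:
        return 0.0
    kept = [barcode for barcode, (n_genes, n_counts) in qc.items()
            if n_genes > min_genes and n_counts > min_counts]
    return len(kept) / len(qc)


# Compare a strict, a moderate, and a relaxed pair of thresholds.
for min_genes, min_counts in [(1000, 2000), (500, 1000), (200, 500)]:
    frac = fraction_retained(qc, min_genes, min_counts)
    print(f"min_genes={min_genes}, min_counts={min_counts}: {frac:.0%} of cells kept")
```

If a small change in threshold removes a large share of cells, that usually points back to sample preparation or cell type rather than to the filter itself.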

Problem: High Mitochondrial Gene Expression

Symptoms:

  • Cells with >20% mitochondrial gene expression
  • Potential cell stress or death

Solutions:

  1. Progressive Filtering:
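    # Assumes percent.mt is computed first (human mitochondrial genes use the "MT-" prefix)
    seurat_obj[["percent.mt"]] <- PercentageFeatureSet(seurat_obj, pattern = "^MT-")
    # Start with a relaxed threshold
    seurat_obj <- subset(seurat_obj, subset = percent.mt < 25)
    # Then tighten based on results
    seurat_obj <- subset(seurat_obj, subset = percent.mt < 15)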
  2. Tissue-Specific Thresholds: Brain cells: <20%, Blood cells: <10%
  3. Regression Approach: Regress out mitochondrial effects instead of filtering, for example by passing vars.to.regress = "percent.mt" to Seurat's ScaleData or SCTransform
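
To make the tissue-specific thresholds concrete, here is a minimal sketch of how the mitochondrial percentage is computed for one cell and compared against the cutoffs listed above. It is plain Python; the `MT_THRESHOLDS` table, the helper names, and the toy cell are assumptions for illustration, and the "MT-" prefix assumes human gene symbols (mouse symbols use "mt-").

```python
from typing import Dict

# Upper limits (percent mitochondrial counts) taken from the guidance above.
MT_THRESHOLDS = {"brain": 20.0, "blood": 10.0}


def percent_mito(gene_counts: Dict[str, int], prefix: str = "MT-") -> float:
    """Percentage of a cell's counts that come from genes starting with `prefix`."""
    total = sum(gene_counts.values())
    if total == 0:
        return 0.0
    mito = sum(n for gene, n in gene_counts.items() if gene.startswith(prefix))
    return 100.0 * mito / total


def keep_cell(gene_counts: Dict[str, int], tissue: str) -> bool:
    """True if the cell is below the mitochondrial threshold for its tissue."""
    return percent_mito(gene_counts) < MT_THRESHOLDS[tissue]


# Toy cell: 50 of 400 counts are mitochondrial.
cell = {"MT-CO1": 30, "MT-ND1": 20, "ACTB": 250, "CD3E": 100}
print(percent_mito(cell))        # 12.5
print(keep_cell(cell, "brain"))  # True
print(keep_cell(cell, "blood"))  # False
```

The same cell passes under the brain threshold and fails under the blood threshold, which is why a single global cutoff can be misleading.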

⚙️ Preprocessing Troubleshooting

Problem: Poor Normalization Results

Symptoms:

  • Batch effects still visible after normalization
  • Highly variable genes dominated by ribosomal/mitochondrial genes
  • Poor clustering separation

Solutions:

  1. Try Different Normalization Methods:
    Method             Best For                When to Use
    Log-normalization  Standard analysis       Default choice
    SCTransform        Heterogeneous datasets  Strong batch effects
    scran              Sparse data             Many zero counts
  2. Parameter Optimization: Adjust scaling factors and regression variables
  3. Alternative Approaches: Consider SCTransform for complex datasets
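
For reference, log-normalization itself is a small calculation: divide each count by the cell's total, multiply by a scale factor, and take log(1 + x). The sketch below shows that arithmetic in plain Python on toy data; the `log_normalize` helper is hypothetical, and the scale factor of 10,000 is the commonly used default rather than a recommendation.

```python
import math
from typing import Dict


def log_normalize(gene_counts: Dict[str, int],
                  scale_factor: float = 10_000) -> Dict[str, float]:
    """Scale counts to a common total per cell, then apply log(1 + x)."""
    total = sum(gene_counts.values())
    if total == 0:
        return {gene: 0.0 for gene in gene_counts}
    return {gene: math.log1p(n / total * scale_factor)
            for gene, n in gene_counts.items()}


# Two toy cells with the same composition but a 10x difference in depth.
shallow = log_normalize({"ACTB": 10, "CD3E": 10})
deep = log_normalize({"ACTB": 100, "CD3E": 100})
print(shallow == deep)  # True: the depth difference is removed
```

This step removes sequencing-depth differences between cells, not batch effects, so batch structure that remains afterwards is expected and needs a separate integration step.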

Problem: Feature Selection Issues

Solutions:

  1. Increase HVG Count: Try 3000-5000 highly variable genes instead of 2000
  2. Use Multiple Methods: Combine vst, mean.var.plot, and dispersion approaches
  3. Manual Curation: Remove unwanted gene categories (ribosomal, mitochondrial)
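
Manual curation can be as simple as dropping gene symbols that match unwanted categories from the highly variable gene list. The following plain-Python sketch assumes human gene symbols; the pattern and the `curate_hvgs` helper are illustrative, and the pattern is deliberately rough (for example, it would also remove RPS6KA1, which is a kinase rather than a ribosomal protein).

```python
import re
from typing import List

# Rough patterns for human mitochondrial (MT-) and ribosomal protein (RPS/RPL) genes.
UNWANTED = re.compile(r"^(MT-|RP[SL]\d)")


def curate_hvgs(hvgs: List[str]) -> List[str]:
    """Return the HVG list without genes matching the unwanted patterns."""
    return [gene for gene in hvgs if not UNWANTED.match(gene)]


hvgs = ["CD3E", "RPS27", "MT-CO1", "MS4A1", "RPL13A", "LYZ"]
print(curate_hvgs(hvgs))  # ['CD3E', 'MS4A1', 'LYZ']
```

Reviewing the removed genes before discarding them is a cheap safeguard against pattern mismatches.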

🔗 Clustering Troubleshooting

Problem: Over-clustering (Too Many Small Clusters)

Solutions:

  1. Reduce Resolution:
    # Try lower resolution values; assign the result so the new clusters are stored
    seurat_obj <- FindClusters(seurat_obj, resolution = 0.3)  # Instead of 0.8
    seurat_obj <- FindClusters(seurat_obj, resolution = 0.5)  # Middle ground
  2. Increase k Parameter: Use more neighbors in SNN graph construction (k.param in Seurat's FindNeighbors, n_neighbors in Scanpy's sc.pp.neighbors)
  3. Merge Similar Clusters: Use hierarchical clustering to identify merge candidates
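
One simple way to shortlist merge candidates is to compare cluster centroids (mean expression per cluster) and flag pairs that are highly correlated. The sketch below uses pairwise Pearson correlation as a lightweight stand-in for full hierarchical clustering; the centroid values, the 0.95 cutoff, and the helper names are all assumptions for illustration.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def merge_candidates(centroids: Dict[str, List[float]],
                     min_corr: float = 0.95) -> List[Tuple[str, str]]:
    """Cluster pairs whose centroid correlation is at least min_corr."""
    return [(a, b)
            for (a, va), (b, vb) in combinations(sorted(centroids.items()), 2)
            if pearson(va, vb) >= min_corr]


# Toy centroids over four genes: clusters "0" and "1" look alike, "2" does not.
centroids = {
    "0": [5.0, 0.1, 0.2, 3.0],
    "1": [4.8, 0.2, 0.1, 3.1],
    "2": [0.1, 4.0, 3.5, 0.2],
}
print(merge_candidates(centroids))  # [('0', '1')]
```

A flagged pair is only a candidate: check marker genes for both clusters before merging.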

Problem: Under-clustering (Missing Cell Types)

Solutions:

  1. Increase Resolution: Try 0.8, 1.0, or higher
  2. Adjust PCA Dimensions: Use more PCs (30-50 instead of 20)
  3. Re-examine Preprocessing: Check if important genes were filtered out
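
A quick way to check whether important genes were filtered out is to compare a marker list against the genes that survived preprocessing. This is a minimal plain-Python sketch; the marker panel and the `missing_markers` helper are hypothetical examples, so substitute markers relevant to your tissue.

```python
from typing import Dict, Iterable, List


def missing_markers(markers: Dict[str, List[str]],
                    retained_genes: Iterable[str]) -> Dict[str, List[str]]:
    """Map each cell type to its marker genes that are absent from retained_genes."""
    retained = set(retained_genes)
    report = {}
    for cell_type, genes in markers.items():
        absent = [gene for gene in genes if gene not in retained]
        if absent:
            report[cell_type] = absent
    return report


# Example marker panel and the gene list left after filtering / HVG selection.
markers = {"T cell": ["CD3E", "CD3D"], "B cell": ["MS4A1", "CD79A"]}
retained = ["CD3E", "CD3D", "MS4A1", "LYZ"]
print(missing_markers(markers, retained))  # {'B cell': ['CD79A']}
```

If markers for an expected population are missing, relaxing gene filters or raising the HVG count is worth trying before increasing resolution further.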

🏷️ Cell Type Annotation Troubleshooting

Problem: Incorrect Cell Type Assignments

Symptoms:

  • Known markers not matching assigned cell types
  • Biologically implausible results
  • Low confidence scores

Solutions with mLLMCelltype:

  1. Enable Multi-Model Consensus:

    🎯 Pro Tip: Use 3-5 different AI models and set consensus threshold to 0.7 for higher confidence

  2. Use Discussion Mode: Let models debate uncertain annotations
  3. Provide Better Context: Include tissue type and experimental condition information
  4. Refine Marker Genes: Use more specific, high-quality marker genes
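
The idea behind a consensus threshold can be illustrated with a simple majority vote: accept a label only when a large enough share of models agree on it. The sketch below is not mLLMCelltype's implementation or API; the `consensus` helper and the example votes are hypothetical, and only the 0.7 threshold comes from the tip above.

```python
from collections import Counter
from typing import List, Tuple


def consensus(votes: List[str], threshold: float = 0.7) -> Tuple[str, float]:
    """Return (label, agreement); label is "Unknown" when agreement is below threshold."""
    if not votes:
        return ("Unknown", 0.0)
    label, n = Counter(votes).most_common(1)[0]
    agreement = n / len(votes)
    return (label if agreement >= threshold else "Unknown", agreement)


# Five hypothetical model outputs for one cluster.
print(consensus(["T cell"] * 4 + ["NK cell"]))  # ('T cell', 0.8)

# A 3-of-5 split fails at 0.7 but passes if the threshold is lowered to 0.5.
split = ["T cell", "T cell", "T cell", "NK cell", "NK cell"]
print(consensus(split))                 # ('Unknown', 0.6)
print(consensus(split, threshold=0.5))  # ('T cell', 0.6)
```

Clusters that fall below the threshold are the ones worth passing to discussion mode or manual review.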

Problem: Novel or Rare Cell Types Not Recognized

Solutions:

  1. Lower Consensus Threshold: Allow more exploratory annotations
  2. Manual Review: Examine clusters with "Unknown" annotations
  3. Literature Search: Research potential novel populations in your tissue
  4. Functional Analysis: Perform pathway analysis to understand cell function

🔄 Batch Integration Troubleshooting

Problem: Strong Batch Effects Persist

Integration Method Comparison:

Method      Strength               Best For
Harmony     Fast, robust           Large datasets
Scanorama   Panoramic integration  Diverse samples
CCA/RPCA    Seurat native          Similar protocols
scVI        Deep learning          Complex batch effects
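
Whichever method is used, it helps to quantify whether batch effects persist instead of judging by eye. One simple diagnostic, sketched here in plain Python, is the normalized entropy of batch labels within each cluster: values near 1 mean batches are well mixed, values near 0 mean the cluster is dominated by one batch. This metric and the `batch_entropy_per_cluster` helper are assumptions added for illustration, not part of any of the tools in the table.

```python
from collections import Counter, defaultdict
from math import log
from typing import Dict, List


def batch_entropy_per_cluster(clusters: List[str],
                              batches: List[str]) -> Dict[str, float]:
    """Shannon entropy of batch labels per cluster, scaled to the range 0..1."""
    n_batches = len(set(batches))
    per_cluster = defaultdict(Counter)
    for cluster, batch in zip(clusters, batches):
        per_cluster[cluster][batch] += 1

    result = {}
    for cluster, counts in per_cluster.items():
        total = sum(counts.values())
        entropy = sum(-(n / total) * log(n / total) for n in counts.values())
        result[cluster] = entropy / log(n_batches) if n_batches > 1 else 0.0
    return result


# Toy example: cluster "0" mixes batches A and B evenly, cluster "1" is all batch A.
clusters = ["0", "0", "0", "0", "1", "1", "1", "1"]
batches = ["A", "B", "A", "B", "A", "A", "A", "A"]
print(batch_entropy_per_cluster(clusters, batches))
```

A low score is not always a problem: a cell type that is truly present in only one sample will also form a single-batch cluster.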

⚑ Performance Troubleshooting

Problem: Memory Errors and Crashes

Solutions:

  1. Reduce Data Size: Subsample cells or genes for initial analysis
  2. Use Disk-Based Storage: Enable on-disk storage for large objects
  3. Optimize Parameters: Reduce PCA dimensions, use fewer HVGs
  4. Cloud Computing: Use the mLLMCelltype web platform for processing
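
Subsampling for an initial pass is easy to make reproducible by fixing the random seed. A minimal sketch in plain Python, where the `subsample` helper and the barcode list are hypothetical:

```python
import random
from typing import List, Sequence


def subsample(barcodes: Sequence[str], n: int, seed: int = 0) -> List[str]:
    """Randomly pick n cell barcodes without replacement, reproducibly."""
    if n >= len(barcodes):
        return list(barcodes)
    rng = random.Random(seed)  # local generator, leaves global random state untouched
    return rng.sample(list(barcodes), n)


barcodes = [f"cell_{i}" for i in range(100)]
subset = subsample(barcodes, 10, seed=1)
print(len(subset))  # 10
```

Recording the seed alongside the other parameters means the same subset can be regenerated later. Uniform subsampling can drop rare populations, so treat results on the subset as a first look only.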

🛡️ Prevention Best Practices

📋 Quality Control Checklist

  • Always plot QC metrics before filtering
  • Use tissue-appropriate thresholds
  • Document all parameter choices
  • Save intermediate analysis steps

🔍 Validation Steps

  • Cross-check with known markers
  • Validate with external datasets
  • Use multiple annotation methods
  • Manual review of uncertain clusters

📚 Documentation

  • Record all software versions
  • Document parameter settings
  • Keep analysis notebooks organized
  • Note troubleshooting steps taken
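
Recording software versions can be automated at the start of each analysis. The sketch below uses only the Python standard library; the `record_versions` helper and the package list are assumptions, so adjust the list to whatever your workflow imports.

```python
import json
import platform
from importlib import metadata
from typing import Dict, List


def record_versions(packages: List[str]) -> Dict[str, str]:
    """Collect the Python version and the installed version of each named package."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


# Print a JSON snippet that can be saved next to the analysis notebook.
print(json.dumps(record_versions(["scanpy", "anndata", "numpy"]), indent=2))
```

Saving this output with each run makes it much easier to explain why results changed after an environment update.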

Still Having Issues?

Try mLLMCelltype's AI-powered annotation for more accurate results
