PBMC Multi-omics Demo
Run all 15 skills on PBMC data, the gold-standard benchmark for single-cell and multi-omics analysis. Choose between the built-in synthetic demo data and real, publicly available datasets.
Option 1 — Built-in Synthetic Demo (recommended to start)
No download needed. Generates realistic PBMC data in seconds based on published marker gene profiles.
python3 data/pbmc_demo_generator.py
# Creates 8 synthetic PBMC layers instantly:
#   scRNA-seq:     3,000 cells x 2,000 genes (10 cell types)
#   Bulk RNA-seq:  12 donors, Healthy vs Sepsis
#   ATAC-seq:      2,000 cells x 5,000 peaks
#   Proteomics:    300 plasma proteins x 12 donors
#   Metabolomics:  250 serum features x 12 donors
#   CITE-seq ADT:  1,000 cells x 25 surface proteins
#   Metagenomics:  20 gut microbiome samples
#   Genomics:      500 variants + 8 pharmacogenomic loci
python3 omics_agent.py --demo
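To give a feel for what "based on published marker gene profiles" can mean in practice, here is a minimal sketch of marker-driven count simulation. It is not the code in `data/pbmc_demo_generator.py`, which this page does not show. The marker sets, the rates, and the names `MARKERS`, `poisson`, and `simulate_cell` are all assumptions made for illustration.

```python
import math
import random

# Hypothetical marker sets for illustration only; the real generator's
# profiles and cell types are not shown in this document.
MARKERS = {
    "CD4 T": ["CD3D", "IL7R"],
    "B": ["MS4A1", "CD79A"],
    "Mono Classical": ["CD14", "LYZ"],
}
GENES = sorted({g for genes in MARKERS.values() for g in genes}) + ["ACTB", "GAPDH"]


def poisson(rng, lam):
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_cell(cell_type, rng, base_rate=0.5, marker_rate=8.0):
    """Return {gene: count}; markers of cell_type get a higher mean (assumed rates)."""
    markers = set(MARKERS[cell_type])
    return {
        gene: poisson(rng, marker_rate if gene in markers else base_rate)
        for gene in GENES
    }
```

Calling `simulate_cell("B", random.Random(0))` yields one cell in which `MS4A1` and `CD79A` tend to have higher counts than the other genes. A real generator would likely add library-size variation and dropout on top of this.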
Option 2 — Real Published PBMC Datasets
Download these free, publicly available datasets and run OmicsAgent on real data.
scRNA-seq
10x PBMC 3k (Free, Published, ~50 MB)
The most widely used single-cell benchmark dataset. 2,700 cells after QC, 8 cell types. Used in Seurat and Scanpy tutorials worldwide.
10x PBMC 68k (Free, Published, ~1.2 GB)
Large-scale PBMC dataset used in the original Cell Ranger paper. Ideal for testing scalability and rare cell type detection.
Human Cell Atlas PBMC (Free, Published, ~500 MB)
Multi-donor PBMC atlas with healthy and disease conditions. Good for batch correction and donor variability analysis.
scATAC-seq
10x PBMC scATAC 5k (Free, Published, ~200 MB)
Standard scATAC-seq PBMC benchmark. Includes fragments file, peak matrix, and per-barcode summary. Used in ArchR and Signac tutorials.
CITE-seq (RNA + Surface Proteins)
10x PBMC 10k Multiome CITE-seq (Free, Published, ~800 MB)
Simultaneous RNA and surface protein measurement. Includes CD3, CD4, CD8, CD14, CD19, CD56, PD-1, and more. Ideal for WNN analysis.
Spatial Transcriptomics
10x Visium Human Lymph Node (Free, Published, ~350 MB)
The most widely used Visium benchmark. Includes H&E image, spot coordinates, and gene expression matrix. Germinal centers and T cell zones clearly resolved.
10x Visium Human PBMC (Free, Published, ~280 MB)
Spatial transcriptomics data from PBMCs. Useful for matching spatial and scRNA-seq profiles of the same cell types.
Multi-omics Integration
10x Multiome PBMC (RNA + ATAC) (Free, Published, ~1.5 GB)
The gold standard for multi-omics integration. Same cells profiled for gene expression AND chromatin accessibility simultaneously. Perfect for WNN and MOFA+ tutorials.
Gut Metagenomics
HMP2, Human Microbiome Project 2 (Free, Published, size varies)
The largest published multi-omics gut microbiome study. Includes metagenomics, metatranscriptomics, metabolomics, and proteomics from IBD patients and healthy controls.
How to Use Real Data with OmicsAgent.ai
scRNA-seq (10x PBMC 3k)
# 1. Download and extract
wget https://cf.10xgenomics.com/samples/cell-exp/1.1.0/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz
tar -xzf pbmc3k_filtered_gene_bc_matrices.tar.gz
# 2. Run via chat mode
python3 omics_agent.py --chat
You: Run scRNA-seq QC, normalization, clustering and cell type
annotation on my 10x PBMC data at
filtered_gene_bc_matrices/hg19/
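Before starting the chat session, it can help to confirm that the archive extracted correctly. The sketch below assumes the directory holds the three files that older Cell Ranger outputs use (`matrix.mtx`, `genes.tsv`, `barcodes.tsv`) and reads the matrix dimensions from the Matrix Market size line. `summarize_10x_dir` is a hypothetical helper, not part of OmicsAgent.

```python
from pathlib import Path


def summarize_10x_dir(path):
    """Return (n_genes, n_cells, n_nonzero) from a 10x matrix directory."""
    root = Path(path)
    required = ["matrix.mtx", "genes.tsv", "barcodes.tsv"]
    missing = [name for name in required if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"{root} is missing: {', '.join(missing)}")

    with open(root / "matrix.mtx") as handle:
        for line in handle:
            if line.startswith("%"):
                continue  # Matrix Market banner and comment lines
            # First non-comment line is "rows cols nonzeros";
            # 10x stores genes as rows and cell barcodes as columns.
            n_genes, n_cells, n_nonzero = (int(x) for x in line.split())
            return n_genes, n_cells, n_nonzero
    raise ValueError("matrix.mtx has no size line")
```

For the download above, `summarize_10x_dir("filtered_gene_bc_matrices/hg19")` should report 2,700 cells in the second position if extraction succeeded.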
Visium Spatial (Human Lymph Node)
# 1. Download matrix and spatial files
wget https://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Human_Lymph_Node/V1_Human_Lymph_Node_filtered_feature_bc_matrix.h5
wget https://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Human_Lymph_Node/V1_Human_Lymph_Node_spatial.tar.gz
tar -xzf V1_Human_Lymph_Node_spatial.tar.gz
# 2. Run spatial analysis
python3 omics_agent.py --chat
You: Analyze my Visium human lymph node data, find spatially
variable genes, and identify tissue domains
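A quick sanity check on the extracted `spatial/` folder is to count how many spots lie under tissue. This sketch assumes the Space Ranger 1.x layout, where `tissue_positions_list.csv` has no header and its columns are barcode, in_tissue, array_row, array_col, and two pixel coordinates. Newer Space Ranger releases are understood to use a differently named file with a header row, so the file name may need adjusting. `count_tissue_spots` is a hypothetical helper.

```python
import csv
from pathlib import Path


def count_tissue_spots(spatial_dir, filename="tissue_positions_list.csv"):
    """Return (spots under tissue, total spots) from the Visium positions file."""
    in_tissue = total = 0
    with open(Path(spatial_dir) / filename, newline="") as handle:
        for row in csv.reader(handle):
            # Skip blank lines and any header row (non-numeric in_tissue field).
            if len(row) < 2 or not row[1].isdigit():
                continue
            total += 1
            in_tissue += int(row[1])  # in_tissue is 1 under tissue, else 0
    return in_tissue, total
```

After the `tar -xzf` step above, `count_tissue_spots("spatial")` returns the two counts. A result of zero in-tissue spots usually points to a wrong folder or file format.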
Multiome RNA+ATAC Integration
# 1. Download both modalities
wget https://cf.10xgenomics.com/samples/cell-arc/2.0.0/pbmc_granulocyte_sorted_10k/pbmc_granulocyte_sorted_10k_filtered_feature_bc_matrix.h5
wget https://cf.10xgenomics.com/samples/cell-arc/2.0.0/pbmc_granulocyte_sorted_10k/pbmc_granulocyte_sorted_10k_atac_fragments.tsv.gz
# 2. Run integration
python3 omics_agent.py --chat
You: Run multi-omics integration on my 10x Multiome PBMC data
combining RNA and ATAC modalities with WNN and MOFA+
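The ATAC fragments file is large, so a cheap way to inspect it is to stream it and tally fragments per barcode. The sketch assumes the standard 10x fragments layout: tab-separated columns of chromosome, start, end, barcode, and read support, with optional `#` comment lines at the top. `fragments_per_barcode` and its `max_lines` parameter are hypothetical names for illustration.

```python
import gzip
from collections import Counter


def fragments_per_barcode(path, max_lines=None):
    """Count fragment records per cell barcode in a gzipped fragments file."""
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if max_lines is not None and i >= max_lines:
                break  # optional cap so a quick look does not scan the whole file
            if line.startswith("#") or not line.strip():
                continue  # header comments and blank lines
            fields = line.rstrip("\n").split("\t")
            counts[fields[3]] += 1  # column 4 holds the cell barcode
    return counts
```

Passing `max_lines=1_000_000` with the fragments file downloaded above gives a fast preview. Barcodes seen here can then be compared with those in the RNA matrix, since the integration step relies on the same cells appearing in both modalities.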
Expected Results — Biological Findings
| Skill | Dataset | Key Finding |
|---|---|---|
| scRNA-seq | PBMC 3k/68k | 10 cell types: CD4 T, CD8 T, NK, B, Mono Classical, Mono Non-classical, mDC, Plasmablast, Treg, Exhausted CD8 T |
| scATAC-seq | PBMC 5k ATAC | SPI1/CEBPA motifs drive Monocyte chromatin; TOX/NR4A1 mark Exhausted CD8 T cells |
| CITE-seq | PBMC 10k TotalSeq | CD14+/HLA-DR+ marks Classical Monocytes; PD-1/LAG-3/TIM-3 co-express on Exhausted CD8 T |
| Spatial | Visium Lymph Node | BCL6/AICDA (AID) in Germinal Center; COL1A1 in Capsule; CCR7/CCL19 in T cell zone |
| Integration | 10x Multiome | RNA-ATAC WNN reveals regulatory programs invisible to either modality alone |
| Metagenomics | HMP2/IBD | Faecalibacterium prausnitzii depleted in IBD; Ruminococcus gnavus expanded |
Run sha256sum -c checksums.sha256 to verify that results are identical.
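On systems without `sha256sum`, the same check can be approximated in Python. This sketch assumes `checksums.sha256` uses the usual `sha256sum` format, one `<hex digest>  <filename>` pair per line, and resolves file names relative to the manifest's own directory, which is an assumption about how the project lays out its outputs. `verify_checksums` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def verify_checksums(manifest):
    """Return {filename: bool} for each entry in a sha256sum-style manifest."""
    manifest = Path(manifest)
    results = {}
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        if name.startswith("*"):
            name = name[1:]  # sha256sum marks binary-mode entries with '*'
        digest = hashlib.sha256()
        with open(manifest.parent / name, "rb") as handle:
            # Hash in 1 MiB chunks so large result files are not loaded at once.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        results[name] = digest.hexdigest() == expected.lower()
    return results
```

`all(verify_checksums("checksums.sha256").values())` is then `True` only when every listed file matches its recorded digest.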