Public single-cell RNA-seq databases worth using in 2026
An updated guide to the most useful public single-cell RNA-seq databases in 2026, including archives, atlas portals, and domain-specific resources for data discovery and reuse.
Single-cell RNA-seq has reached a point where the problem is no longer data scarcity. Public datasets are abundant. The harder problem is knowing which resource is appropriate for the biological question, and which one will leave you with weeks of extra processing before you can make use of the data.
That distinction matters. Some resources are archives. They are critical for data deposition, raw file access, and reproducibility, but they are not designed to make cross-study exploration easy. Others are atlas portals or curated discovery layers that standardize metadata, annotations, and interfaces so researchers can search, inspect, compare, and reuse public data more efficiently.
If your goal is to find a deposited dataset, a primary archive is usually the right place to start. If your goal is to benchmark a finding, inspect a cell state, compare tissues, or build a reference set, a curated portal will often get you there faster.
This guide focuses on public resources that remain genuinely useful in 2026 for finding, exploring, downloading, and reusing single-cell transcriptomic data.
A practical way to think about public scRNA-seq resources
It helps to separate public single-cell resources into three broad groups.
1. Primary archives
These are repositories of record. They preserve submitted studies, raw data, processed matrices, and metadata. They are indispensable for reproducibility and reanalysis, but they usually do not solve the downstream problems of metadata harmonization, annotation consistency, or cross-study comparability.
Examples include GEO, SRA, and ArrayExpress.
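When a paper cites an accession, its prefix usually tells you which archive holds the record and at what level (series, sample, run, and so on). The sketch below is a minimal illustration that assumes only the common prefix conventions: GEO `GSE`/`GSM`/`GPL`, INSDC study/sample/experiment/run IDs (`SRP`/`SRS`/`SRX`/`SRR` and their `E…` ENA and `D…` DDBJ equivalents), and ArrayExpress `E-XXXX-` identifiers such as `E-MTAB-`. The function name `route_accession` is hypothetical, the pattern list is not exhaustive, and the accessions in the usage loop are placeholders.

```python
import re
from typing import NamedTuple, Optional


class AccessionInfo(NamedTuple):
    archive: str
    level: str


# Common accession prefixes (illustrative, not exhaustive).
# INSDC study/sample/experiment/run IDs start with S (NCBI SRA), E (ENA) or D (DDBJ).
_PATTERNS = [
    (r"GSE\d+", "GEO", "series"),
    (r"GSM\d+", "GEO", "sample"),
    (r"GPL\d+", "GEO", "platform"),
    (r"E-[A-Z]{4}-\d+", "ArrayExpress", "study"),
    (r"[SED]RP\d+", "INSDC (SRA/ENA/DDBJ)", "study"),
    (r"[SED]RS\d+", "INSDC (SRA/ENA/DDBJ)", "sample"),
    (r"[SED]RX\d+", "INSDC (SRA/ENA/DDBJ)", "experiment"),
    (r"[SED]RR\d+", "INSDC (SRA/ENA/DDBJ)", "run"),
]


def route_accession(accession: str) -> Optional[AccessionInfo]:
    """Return the archive and record level implied by an accession prefix, or None."""
    acc = accession.strip().upper()
    for pattern, archive, level in _PATTERNS:
        if re.fullmatch(pattern, acc):
            return AccessionInfo(archive, level)
    return None  # unknown prefix, e.g. a BioProject ID such as PRJNA...


for acc in ["GSE123456", "SRR1234567", "E-MTAB-1234", "PRJNA12345"]:
    print(acc, route_accession(acc))
```

Routing by prefix only tells you where to look; whether the record carries raw files, processed matrices, or both still has to be checked on the archive itself.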
2. Curated discovery portals
These resources repackage public studies into more analysis-friendly collections. They usually support interactive browsing, standardized metadata, and simpler data reuse. In practice, these are often the best entry point for researchers who want to inspect biology before committing to a heavier workflow.
Examples include CELLxGENE Discover, CELLxGENE Census, Single Cell Expression Atlas, and Single Cell Portal.
3. Domain-specific atlases and specialist databases
These are built around a tissue, disease context, or biological use case. They are often the most informative option when the question is narrow and biologically specific.
Examples include TISCH2 for tumor microenvironment analysis, Allen Brain Cell Types for neurobiology, and Tabula Sapiens for a healthy human multi-organ reference.
Public single-cell RNA-seq databases and portals
| Resource | Best for | What it offers | Main limitation |
|---|---|---|---|
| GEO | Deposited studies and processed expression matrices | Large public functional genomics archive with broad study coverage and easy accession-based retrieval | Metadata quality and processing conventions vary substantially by study |
| SRA | Raw sequencing data | Access to raw sequencing files for complete reprocessing and pipeline-level control | Requires substantial downstream processing and metadata handling before analysis |
| ArrayExpress | Archived functional genomics studies in the EBI ecosystem | Stores study-level metadata and processed data, with links into the ENA ecosystem for raw sequencing files | More useful as an archive than as a harmonized single-cell exploration layer |
| Human Cell Atlas Data Portal | Large-scale human reference data | Community-generated open atlas covering human tissues across many projects and donors | Breadth is high, but downstream reuse still depends on study context and metadata completeness |
| CELLxGENE Discover | Fast interactive exploration of public single-cell data | Searchable curated portal for browsing datasets, genes, tissues, and cell types with a strong user interface | Does not aim to be a universal archive of all published single-cell studies |
| CELLxGENE Census | Programmatic reuse at scale | Computational access layer for curated public single-cell data, designed for slicing and reuse in common analysis environments | Best suited to users comfortable with computational workflows |
| Single Cell Expression Atlas | Uniformly processed cross-study and cross-species comparison | Standardized single-cell resource with consistent processing and ontology-aware metadata | Smaller in scope than raw archives or some large atlas portals |
| Single Cell Portal | Study browsing, sharing, and interactive visualization | Broad Institute portal for exploring public studies and visualizing single-cell data in an accessible way | Standardization depends partly on the submitted study and portal context |
| PanglaoDB | Marker lookup and quick exploration in human and mouse | Longstanding resource for marker-focused browsing of public single-cell data | Less suitable than newer portals for large-scale harmonized comparative analysis |
| TISCH2 | Tumor microenvironment analysis | Curated database focused on tumor microenvironment single-cell datasets across cancer types | Cancer-specific by design, so it is not a general-purpose atlas |
| Allen Brain Cell Types | Brain cell taxonomy and neurobiology | High-value single-cell and single-nucleus reference data for nervous system research | Of limited use outside neurobiology |
| Tabula Sapiens | Healthy human cross-organ reference | Multi-organ healthy human atlas useful as a broad reference for baseline cell states | Better used as a reference atlas than as a continuously expanding public study browser |
| Bgee | Cross-species expression in healthy conditions | Comparative expression resource spanning multiple species with curated healthy wild-type context | Not designed as a general-purpose single-cell study browser |
| Human Protein Atlas, single-cell section | Gene-centric inspection with transcript and protein context | Useful for linking transcript-level observations to broader tissue and protein context | Best for gene-level exploration, not full cohort-scale single-cell analysis |
Which database should you use?
There is no single best database. The right one depends on the job.
If you need the original deposited study, start with GEO, SRA, or ArrayExpress. These are usually the best places to retrieve accession-linked records, raw sequencing files, supplementary metadata, or author-submitted processed matrices.
If you need rapid exploration of public single-cell data, especially at the stage where you are checking whether a tissue, gene, cell type, or disease context is already represented, CELLxGENE Discover, Single Cell Expression Atlas, and Single Cell Portal are often more efficient. They reduce the friction between finding a study and seeing the biology.
If you need a highly specific reference, specialist resources are usually the better choice. TISCH2 is particularly useful for tumor microenvironment work. Allen Brain Cell Types is strong for neurobiology and brain taxonomy. Tabula Sapiens is a useful healthy human baseline. Bgee becomes valuable when the question crosses species and healthy expression context matters.
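If you want this guidance in a reusable form, for example inside a lab notebook or an internal data-discovery script, it can be written down as a small lookup. This is a minimal sketch that simply transcribes the recommendations above; the goal keys and the `suggest_resources` helper are hypothetical names, not part of any of these resources.

```python
# Goal-to-resource mapping transcribed from the guidance above.
# The goal keys are hypothetical labels chosen for this sketch.
RESOURCE_GUIDE: dict[str, list[str]] = {
    "original_study": ["GEO", "SRA", "ArrayExpress"],
    "rapid_exploration": [
        "CELLxGENE Discover",
        "Single Cell Expression Atlas",
        "Single Cell Portal",
    ],
    "tumor_microenvironment": ["TISCH2"],
    "neurobiology": ["Allen Brain Cell Types"],
    "healthy_human_baseline": ["Tabula Sapiens"],
    "cross_species_healthy": ["Bgee"],
}


def suggest_resources(goal: str) -> list[str]:
    """Return a copy of the suggested resources for a goal, or raise on an unknown goal."""
    if goal not in RESOURCE_GUIDE:
        known = ", ".join(sorted(RESOURCE_GUIDE))
        raise ValueError(f"unknown goal {goal!r}; expected one of: {known}")
    return list(RESOURCE_GUIDE[goal])  # copy so callers cannot mutate the table


print(suggest_resources("tumor_microenvironment"))
```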
What still limits public scRNA-seq databases?
The main bottleneck is not scale. It is comparability.
Even when a resource is large and widely used, the biological value still depends on metadata quality, processing assumptions, and annotation consistency. Cell type labels are not always standardized across studies. Disease states may be captured unevenly. Tissue and anatomical descriptors can vary in granularity. Donor-level metadata is often incomplete. These issues become serious as soon as researchers try to compare studies rather than inspect them one by one.
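Cell type labels are a concrete example of the comparability problem: "T cells", "T-cell", and "T_lymphocyte" may all mean the same thing, while "CD14+ Mono" needs a human decision. A minimal sketch of harmonizing free-text labels before comparing studies follows. The synonym table is a hypothetical, hand-written example; curated portals such as CELLxGENE instead map labels to Cell Ontology terms, which is the better target for real work.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical synonym table for illustration only; a real pipeline would map
# labels to Cell Ontology terms rather than to plain strings.
SYNONYMS = {
    "t cell": "T cell", "t cells": "T cell", "t-cell": "T cell", "t lymphocyte": "T cell",
    "b cell": "B cell", "b cells": "B cell", "b lymphocyte": "B cell",
    "nk cell": "natural killer cell", "nk cells": "natural killer cell",
    "natural killer cell": "natural killer cell",
    "monocyte": "monocyte", "monocytes": "monocyte", "mono": "monocyte",
}


def normalize_key(label: str) -> str:
    """Lowercase, trim, and collapse runs of whitespace or underscores into single spaces."""
    return re.sub(r"[\s_]+", " ", label.strip().lower())


def harmonize(labels: list[str]) -> tuple[list[Optional[str]], Counter]:
    """Map labels to canonical names; unmapped labels become None and are counted."""
    mapped: list[Optional[str]] = []
    unmapped: Counter = Counter()
    for label in labels:
        canonical = SYNONYMS.get(normalize_key(label))
        if canonical is None:
            unmapped[label] += 1  # needs manual curation before cross-study comparison
        mapped.append(canonical)
    return mapped, unmapped


study_a = ["T cells", "B_cell", "NK cells"]
study_b = ["T-cell", "B lymphocyte", "CD14+ Mono"]
print(harmonize(study_a))
print(harmonize(study_b))
```

Reporting unmapped labels explicitly, rather than silently dropping them, keeps the curation gap visible.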
In practice, four limitations still show up repeatedly.
Metadata remains uneven
Public availability does not guarantee that a dataset is richly or consistently described. Missing or inconsistent metadata can make cross-study filtering difficult, especially when the question depends on disease state, treatment context, donor attributes, or fine-grained anatomy.
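One practical consequence is that a metadata filter should distinguish "does not match" from "cannot be evaluated". The sketch below assumes each dataset is described by a flat dictionary; the field names (`dataset_id`, `tissue`, `disease`, `donor_id`), the set of placeholder strings treated as missing, and the `filter_by_metadata` helper are all hypothetical.

```python
from typing import Any, Iterable, Mapping

# Placeholder strings treated as missing (an assumption; adjust per source).
MISSING_STRINGS = {"", "na", "n/a", "unknown", "not reported"}


def is_missing(value: Any) -> bool:
    if value is None:
        return True
    return isinstance(value, str) and value.strip().lower() in MISSING_STRINGS


def _norm(value: Any) -> str:
    return str(value).strip().lower()


def filter_by_metadata(
    records: Iterable[Mapping[str, Any]],
    criteria: Mapping[str, Any],
    also_required: Iterable[str] = (),
) -> tuple[list[str], list[tuple[str, list[str]]]]:
    """Return (matching dataset IDs, [(dataset ID, missing fields)] for unevaluable records)."""
    fields = list(criteria) + [f for f in also_required if f not in criteria]
    matched, unevaluable = [], []
    for rec in records:
        missing = [f for f in fields if is_missing(rec.get(f))]
        if missing:
            unevaluable.append((rec.get("dataset_id"), missing))
            continue
        # Case- and whitespace-insensitive comparison of the filter values.
        if all(_norm(rec[k]) == _norm(v) for k, v in criteria.items()):
            matched.append(rec["dataset_id"])
    return matched, unevaluable
```

For example, filtering for normal lung tissue with a required donor ID would set aside datasets whose disease field reads "unknown" instead of silently excluding them, so you can decide whether to chase the missing metadata.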
Uniform reprocessing is not universal
Primary archives preserve the study as submitted. That is necessary, but it does not make datasets directly comparable. Curated portals become more useful when they apply standardized workflows, stable ontologies, and reproducible access layers.
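A common case: one study deposits raw counts, another deposits log-normalized values, and the two cannot be compared directly. The sketch below shows the idea of re-deriving a uniform representation from raw counts, using library-size scaling to 10,000 counts per cell followed by `log1p`, a widely used default (for example, Scanpy's `normalize_total(target_sum=1e4)` followed by `log1p`). It is a pure-Python illustration on a small cells-by-genes list of lists, and the integer check is only a heuristic for "looks like raw counts".

```python
import math
from typing import Sequence


def looks_like_raw_counts(matrix: Sequence[Sequence[float]]) -> bool:
    """Heuristic: raw UMI counts are non-negative integers."""
    return all(v >= 0 and float(v).is_integer() for row in matrix for v in row)


def log_normalize(matrix: Sequence[Sequence[float]], target_sum: float = 1e4) -> list[list[float]]:
    """Scale each cell (row) to target_sum total counts, then apply log1p."""
    if not looks_like_raw_counts(matrix):
        raise ValueError("matrix does not look like raw counts; re-derive it from raw data first")
    out = []
    for row in matrix:
        total = sum(row)
        if total == 0:
            out.append([0.0] * len(row))  # empty cell: nothing to scale
            continue
        out.append([math.log1p(v * target_sum / total) for v in row])
    return out


raw = [[1, 3, 0], [2, 2, 4]]  # two cells, three genes, different sequencing depth
print(log_normalize(raw))
```

The first gene gets the same normalized value in both cells because it accounts for the same fraction of each cell's counts, even though the raw counts differ.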
Breadth and depth rarely live in the same place
Broad portals are useful for discovery. Specialist resources are useful for depth. Very few databases do both equally well. A sensible workflow often moves from a broad discovery layer into a narrower disease or tissue-specific atlas.
Public access does not mean analysis-ready
A dataset may be downloadable and still be difficult to reuse. File formats, incomplete labels, study-specific processing, and fragmented metadata still slow down validation, integration, and comparative analysis.
What the stronger resources now do well
The most useful public resources in 2026 share a few traits.
They expose structured metadata. They support interactive browsing. They make it easier to move from search to biological inspection without rebuilding the analysis stack from scratch. Increasingly, the strongest platforms also separate archival storage from analysis-oriented access, which makes them much more practical for researchers who need to ask and answer biological questions rather than only retrieve files.
This is the direction in which the field has been moving. The difference between a public repository and a usable scientific resource is no longer just scale. It is whether the interface, metadata, and access model support real downstream interpretation.
A practical workflow for researchers
A good workflow usually looks something like this:
- Start broad with a portal such as the Human Cell Atlas Data Portal, CELLxGENE Discover, Single Cell Expression Atlas, or Single Cell Portal to determine whether the biology is represented.
- Move into a specialist database if the question is domain-specific, for example TISCH2 for tumor microenvironment questions or Allen Brain Cell Types for neurobiology.
- Return to GEO, SRA, or ArrayExpress when you need original files, accession-linked provenance, or full reprocessing.
- Only then commit to integration or benchmarking, once the metadata, preprocessing assumptions, and biological context are compatible with the comparison you want to make.
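The final step, confirming compatibility before integration, can be made explicit as a check. The sketch below compares two dataset descriptors. The `DatasetProfile` fields, and the split into blocking differences (which need conversion or reprocessing first) and advisory ones (which a batch-aware integration method may absorb), are assumptions made for illustration, not a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetProfile:
    # Hypothetical descriptor fields; adapt them to the metadata your sources expose.
    name: str
    organism: str       # e.g. "Homo sapiens"
    gene_id_type: str   # e.g. "ensembl" or "symbol"
    matrix_state: str   # e.g. "raw_counts" or "log_normalized"
    assay: str          # e.g. "10x 3' v3"


# Assumed policy: these must agree before integrating without further conversion.
BLOCKING = ("organism", "gene_id_type", "matrix_state")
# Assumed policy: differences worth noting, but often handled by batch correction.
ADVISORY = ("assay",)


def check_compatibility(a: DatasetProfile, b: DatasetProfile) -> tuple[list[str], list[str]]:
    """Return (blocking field mismatches, advisory field mismatches)."""
    blocking = [f for f in BLOCKING if getattr(a, f) != getattr(b, f)]
    advisory = [f for f in ADVISORY if getattr(a, f) != getattr(b, f)]
    return blocking, advisory


a = DatasetProfile("study_a", "Homo sapiens", "ensembl", "raw_counts", "10x 3' v3")
b = DatasetProfile("study_b", "Homo sapiens", "symbol", "raw_counts", "10x 5' v2")
print(check_compatibility(a, b))
```

Here the gene identifier mismatch would need resolving, for example by mapping symbols to Ensembl IDs, before the two studies are integrated.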
Final thoughts
Public single-cell RNA-seq databases have become foundational infrastructure for modern single-cell biology. But they are not interchangeable.
If the goal is reproducibility, start from the archive layer. If the goal is discovery, start from a curated portal. If the goal is a precise biological question, use the specialist atlas built for that problem.
The quality of the downstream insight usually depends less on how many cells a database contains, and more on whether the resource matches the actual analytical question.
See public data in a workflow built for interpretation
Public single-cell data is only useful if you can search it, inspect it, compare it, and turn it into a biological conclusion without losing context along the way.
Nygen gives researchers a practical way to work from both public and private single-cell datasets in one environment. Explore reference data, inspect cell states, compare studies, and move from scattered datasets to interpretable biology in a workflow designed for scientific use.