Public single-cell RNA-seq databases worth using in 2026

Single-cell RNA-seq has reached the point where data scarcity is no longer the problem: public datasets are abundant. The harder problem is knowing which resource fits the biological question, and which will cost you weeks of extra processing before the data is usable.

The distinction between archives and curated portals matters. Archives are critical for data deposition, raw file access, and reproducibility, but they are not designed to make cross-study exploration easy. Atlas portals and curated discovery layers standardize metadata, annotations, and interfaces so researchers can search, inspect, compare, and reuse public data more efficiently.

If your goal is to find a deposited dataset, a primary archive is usually the right place to start. If your goal is to benchmark a finding, inspect a cell state, compare tissues, or build a reference set, a curated portal will often get you there faster.

This guide focuses on public resources that remain genuinely useful in 2026 for finding, exploring, downloading, and reusing single-cell transcriptomic data.

A practical way to think about public scRNA-seq resources

It helps to separate public single-cell resources into three broad groups.

1. Primary archives

These are repositories of record. They preserve submitted studies, raw data, processed matrices, and metadata. They are indispensable for reproducibility and reanalysis, but they usually do not solve the downstream problems of metadata harmonization, annotation consistency, or cross-study comparability.

Examples include GEO, SRA, and ArrayExpress.

2. Curated discovery portals

These resources repackage public studies into more analysis-friendly collections. They usually support interactive browsing, standardized metadata, and simpler data reuse. In practice, these are often the best entry point for researchers who want to inspect biology before committing to a heavier workflow.

Examples include CELLxGENE Discover, CELLxGENE Census, Single Cell Expression Atlas, and Single Cell Portal.

3. Domain-specific atlases and specialist databases

These are built around a tissue, disease context, or biological use case. They are often the most informative option when the question is narrow and biologically specific.

Examples include TISCH2 for tumor microenvironment analysis, Allen Brain Cell Types for neurobiology, and Tabula Sapiens for a healthy human multi-organ reference.

Public single-cell RNA-seq databases and portals

Resource	Best for	What it offers	Main limitation	Link
GEO	Deposited studies and processed expression matrices	Large public functional genomics archive with broad study coverage and easy accession-based retrieval	Metadata quality and processing conventions vary substantially by study	Visit GEO
SRA	Raw sequencing data	Access to raw sequencing files for complete reprocessing and pipeline-level control	Requires substantial downstream processing and metadata handling before analysis	Visit SRA
ArrayExpress	Archived functional genomics studies in the EBI ecosystem	Stores study-level metadata and processed data, with links into the ENA ecosystem for raw sequencing files	More useful as an archive than as a harmonized single-cell exploration layer	Visit ArrayExpress
Human Cell Atlas Data Portal	Large-scale human reference data	Community-generated open atlas covering human tissues across many projects and donors	Breadth is high, but downstream reuse still depends on study context and metadata completeness	Visit HCA Data Portal
CELLxGENE Discover	Fast interactive exploration of public single-cell data	Searchable curated portal for browsing datasets, genes, tissues, and cell types with a strong user interface	Does not aim to be a universal archive of all published single-cell studies	Visit CELLxGENE Discover
CELLxGENE Census	Programmatic reuse at scale	Computational access layer for curated public single-cell data, designed for slicing and reuse in common analysis environments	Best suited to users comfortable with computational workflows	Visit CELLxGENE Census
Single Cell Expression Atlas	Uniformly processed cross-study and cross-species comparison	Standardized single-cell resource with consistent processing and ontology-aware metadata	Smaller in scope than raw archives or some large atlas portals	Visit Single Cell Expression Atlas
Single Cell Portal	Study browsing, sharing, and interactive visualization	Broad Institute portal for exploring public studies and visualizing single-cell data in an accessible way	Standardization depends partly on the submitted study and portal context	Visit Single Cell Portal
PanglaoDB	Marker lookup and quick exploration in human and mouse	Longstanding resource for marker-focused browsing of public single-cell data	Less suitable than newer portals for large-scale harmonized comparative analysis	Visit PanglaoDB
TISCH2	Tumor microenvironment analysis	Curated database focused on tumor microenvironment single-cell datasets across cancer types	Cancer-specific by design, so it is not a general-purpose atlas	Visit TISCH2
Allen Brain Cell Types	Brain cell taxonomy and neurobiology	High-value single-cell and single-nucleus reference data for nervous system research	Limited relevance outside neurobiology use cases	Visit Allen Brain Cell Types
Tabula Sapiens	Healthy human cross-organ reference	Multi-organ healthy human atlas useful as a broad reference for baseline cell states	Better used as a reference atlas than as a continuously expanding public study browser	Visit Tabula Sapiens
Bgee	Cross-species expression in healthy conditions	Comparative expression resource spanning multiple species with curated healthy wild-type context	Not designed as a general-purpose single-cell study browser	Visit Bgee
Human Protein Atlas, single-cell section	Gene-centric inspection with transcript and protein context	Useful for linking transcript-level observations to broader tissue and protein context	Best for gene-level exploration, not full cohort-scale single-cell analysis	Visit Human Protein Atlas

Which database should you use?

There is no single best database. The right one depends on the job.

If you need the original deposited study, start with GEO, SRA, or ArrayExpress. These are usually the best places to retrieve accession-linked records, raw sequencing files, supplementary metadata, or author-submitted processed matrices.
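As a rough illustration of accession-based retrieval, the sketch below guesses which archive an accession belongs to from its prefix. The patterns cover only common accession shapes and are an assumption of this example, not an exhaustive specification; the function name and the accessions in the usage note are placeholders.

```python
import re

# Common accession shapes per archive. These patterns are illustrative
# assumptions covering frequent cases, not a complete specification.
ACCESSION_PATTERNS = {
    # GEO series, samples, platforms, and curated datasets
    "GEO": re.compile(r"^(GSE|GSM|GPL|GDS)\d+$"),
    # Read-archive studies, experiments, samples, and runs
    # (S/E/D prefixes, since the INSDC partners share the scheme)
    "SRA": re.compile(r"^[SED]R[PXSR]\d+$"),
    # ArrayExpress experiments such as E-MTAB-<number>
    "ArrayExpress": re.compile(r"^E-[A-Z]{4}-\d+$"),
}


def classify_accession(accession: str) -> str:
    """Return the archive an accession most likely belongs to, or 'unknown'."""
    cleaned = accession.strip().upper()
    for archive, pattern in ACCESSION_PATTERNS.items():
        if pattern.match(cleaned):
            return archive
    return "unknown"
```

For example, classify_accession("GSE12345") returns "GEO" and classify_accession("E-MTAB-0000") returns "ArrayExpress". Anything that matches none of the patterns comes back as "unknown", so unrecognized identifiers are never silently assigned to an archive.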

If you need rapid exploration of public single-cell data, especially at the stage where you are checking whether a tissue, gene, cell type, or disease context is already represented, CELLxGENE Discover, Single Cell Expression Atlas, and Single Cell Portal are often more efficient. They reduce the friction between finding a study and seeing the biology.

If you need a highly specific reference, specialist resources are usually the better choice. TISCH2 is particularly useful for tumor microenvironment work. Allen Brain Cell Types is strong for neurobiology and brain taxonomy. Tabula Sapiens is a useful healthy human baseline. Bgee becomes valuable when the question crosses species and healthy expression context matters.

What still limits public scRNA-seq databases?

The main bottleneck is not scale. It is comparability.

Even when a resource is large and widely used, the biological value still depends on metadata quality, processing assumptions, and annotation consistency. Cell type labels are not always standardized across studies. Disease states may be captured unevenly. Tissue and anatomical descriptors can vary in granularity. Donor-level metadata is often incomplete. These issues become serious as soon as researchers try to compare studies rather than inspect them one by one.
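To make the label problem concrete, here is a minimal sketch of checking cell type labels across studies before comparing them. The synonym table is hand-written and hypothetical; a real workflow would more likely map labels to ontology terms. The study names and labels in the usage note are invented for illustration.

```python
from collections import defaultdict

# Hypothetical synonym table mapping free-text labels to one canonical name.
# A real workflow would map to ontology terms instead of a hand-written dict.
SYNONYMS = {
    "t cell": "T cell",
    "t-cell": "T cell",
    "t cells": "T cell",
    "b cell": "B cell",
    "b lymphocyte": "B cell",
}


def harmonize_labels(labels_by_study):
    """Group raw labels by canonical name and collect labels with no mapping."""
    canonical = defaultdict(set)  # canonical name -> {(study, raw label)}
    unmapped = set()              # (study, raw label) pairs needing review
    for study, labels in labels_by_study.items():
        for raw in labels:
            # Lowercase and collapse whitespace before the lookup
            key = " ".join(raw.lower().split())
            if key in SYNONYMS:
                canonical[SYNONYMS[key]].add((study, raw))
            else:
                unmapped.add((study, raw))
    return dict(canonical), unmapped
```

Given {"study_a": ["T cell", "B lymphocyte"], "study_b": ["T-cell", "Tregs"]}, both T cell spellings land under one canonical name, while "Tregs" is returned as unmapped. Surfacing unmapped labels for manual review, rather than guessing, is the point of the design.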

In practice, four limitations still show up repeatedly.

Metadata remains uneven

Public availability does not guarantee that a dataset is richly or consistently described. Missing or inconsistent metadata can make cross-study filtering difficult, especially when the question depends on disease state, treatment context, donor attributes, or fine-grained anatomy.
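One way to see how uneven the metadata is before relying on it for filtering is to measure per-field completeness. The sketch below assumes metadata has already been loaded as a list of dictionaries; the field names and the set of "missing" placeholders are assumptions of this example, since each archive and portal uses its own schema.

```python
# Fields a cross-study filter might depend on. The names are assumptions for
# this sketch; real portals and archives use their own schemas.
REQUIRED_FIELDS = ("tissue", "disease", "donor_id", "cell_type")

# Placeholder strings treated as missing (compared case-insensitively).
MISSING_VALUES = {"", "na", "n/a", "unknown", "none"}


def field_completeness(records):
    """Return the fraction of records with a usable value for each field."""
    if not records:
        return {field: 0.0 for field in REQUIRED_FIELDS}
    result = {}
    for field in REQUIRED_FIELDS:
        filled = 0
        for record in records:
            value = record.get(field)
            if value is not None and str(value).strip().lower() not in MISSING_VALUES:
                filled += 1
        result[field] = filled / len(records)
    return result
```

A field that scores well below 1.0 is a warning that filtering on it will silently drop samples, which matters most for disease state, treatment context, and donor attributes.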

Uniform reprocessing is not universal

Primary archives preserve the study as submitted. That is necessary, but it does not make datasets directly comparable. Curated portals become more useful when they apply standardized workflows, stable ontologies, and reproducible access layers.

Breadth and depth rarely live in the same place

Broad portals are useful for discovery. Specialist resources are useful for depth. Very few databases do both equally well. A sensible workflow often moves from a broad discovery layer into a narrower disease or tissue-specific atlas.

Public access does not mean analysis-ready

A dataset may be downloadable and still be difficult to reuse. File formats, incomplete labels, study-specific processing, and fragmented metadata still slow down validation, integration, and comparative analysis.

What the stronger resources now do well

The most useful public resources in 2026 share a few traits.

They expose structured metadata. They support interactive browsing. They make it easier to move from search to biological inspection without rebuilding the analysis stack from scratch. Increasingly, the strongest platforms also separate archival storage from analysis-oriented access, which makes them much more practical for researchers who need to ask and answer biological questions rather than only retrieve files.

This is the direction in which the field has been moving. The difference between a public repository and a usable scientific resource is no longer just scale. It is whether the interface, metadata, and access model support real downstream interpretation.

A practical workflow for researchers

A good workflow usually looks something like this:

1. Start broad with a portal such as Human Cell Atlas, CELLxGENE Discover, Single Cell Expression Atlas, or Single Cell Portal to determine whether the biology is represented.
2. Move into a specialist database if the question is domain-specific, for example TISCH2 for tumor microenvironment questions or Allen Brain Cell Types for neurobiology.
3. Return to GEO, SRA, or ArrayExpress when you need original files, accession-linked provenance, or full reprocessing.
4. Only then commit to integration or benchmarking, once the metadata, preprocessing assumptions, and biological context are compatible with the comparison you want to make.
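The final compatibility check can be sketched as a simple gate that lists where two datasets disagree before any integration is attempted. The DatasetSummary descriptor, its field names, and the choice of fields to check are all hypothetical; which mismatches actually block a comparison depends on the analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetSummary:
    """Hypothetical per-dataset descriptor; field names are assumptions."""
    name: str
    organism: str
    assay: str
    gene_id_type: str
    normalization: str


# Attributes compared before two datasets are treated as directly comparable.
CHECKED_FIELDS = ("organism", "assay", "gene_id_type", "normalization")


def compatibility_issues(a: DatasetSummary, b: DatasetSummary) -> list:
    """List the checked fields on which the two datasets disagree."""
    issues = []
    for field in CHECKED_FIELDS:
        left, right = getattr(a, field), getattr(b, field)
        # Compare case-insensitively so "Homo sapiens" matches "homo sapiens"
        if left.strip().lower() != right.strip().lower():
            issues.append(f"{field}: {a.name}={left!r} vs {b.name}={right!r}")
    return issues
```

An empty list means none of the checked fields differ, not that the datasets are biologically comparable. A non-empty list is a prompt to decide, field by field, whether the difference can be corrected or whether the comparison should be dropped.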

Final thoughts

Public single-cell RNA-seq databases have become foundational infrastructure for modern single-cell biology. But they are not interchangeable.

If the goal is reproducibility, start from the archive layer. If the goal is discovery, start from a curated portal. If the goal is a precise biological question, use the specialist atlas built for that problem.

The quality of the downstream insight usually depends less on how many cells a database contains, and more on whether the resource matches the actual analytical question.

See public data in a workflow built for interpretation

Public single-cell data is only useful if you can search it, inspect it, compare it, and turn it into a biological conclusion without losing context along the way.

Nygen gives researchers a practical way to work from both public and private single-cell datasets in one environment. Explore reference data, inspect cell states, compare studies, and move from scattered datasets to interpretable biology in a workflow designed for scientific use.