The intelligence layer for your single-cell omics data

CyteType deploys specialized AI reviewers to annotate each cluster with ontology-mapped labels, marker-level evidence, functional state resolution, and confidence-scored quality control.

Annotation intelligence at production scale

100,000+ clusters annotated
99.99% completion rate
Days, not weeks, to audit-ready

Every cluster passes through specialized AI reviewers that trace each annotation to marker-level evidence, ontology-mapped labels, and confidence-scored quality control, giving your team defensible calls at production scale.

Where CyteType delivers

When annotation speed, consistency, and traceability break down, programs stall. CyteType surfaces the biological problem first, then provides an evidence-backed path to a defensible call.

Consortium-scale cell atlas annotation

Site-to-site labeling drift makes consortium atlases hard to align and harder to defend. Ontology-mapped calls with explicit evidence and confidence scoring enforce one reviewable language across studies.

What's in a CyteType report

Every section answers a question your biology team will ask.

Ontology-anchored annotation

Each cluster is mapped to a Cell Ontology term with confidence and label match scores, plus a direct CL reference so the definition is explicit and reviewable.
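
As a rough illustration of what such an ontology-anchored call could look like as data, here is a minimal Python sketch. The class name, field names, and the assumption that both scores lie between 0 and 1 are hypothetical and are not CyteType's actual report schema. The `CL:` plus seven digits ID format and the OBO PURL pattern follow the public Cell Ontology conventions.

```python
import re
from dataclasses import dataclass

# Cell Ontology IDs look like "CL:0000084" (prefix plus seven digits).
CL_ID_PATTERN = re.compile(r"^CL:\d{7}$")


@dataclass(frozen=True)
class ClusterAnnotation:
    """Hypothetical record for one annotated cluster (illustrative only)."""

    cluster_id: str
    label: str
    cl_id: str
    confidence: float   # assumed to be in [0, 1]
    label_match: float  # assumed to be in [0, 1]

    def __post_init__(self):
        # Reject anything that is not a well-formed Cell Ontology ID.
        if not CL_ID_PATTERN.match(self.cl_id):
            raise ValueError(f"not a Cell Ontology ID: {self.cl_id!r}")
        # Both scores are assumed to be fractions between 0 and 1.
        for name in ("confidence", "label_match"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")

    @property
    def cl_reference(self) -> str:
        """Resolvable OBO PURL for the ontology term."""
        return "http://purl.obolibrary.org/obo/" + self.cl_id.replace(":", "_")


example = ClusterAnnotation("7", "T cell", "CL:0000084", 0.92, 0.88)
print(example.cl_reference)  # prints http://purl.obolibrary.org/obo/CL_0000084
```

Carrying the ontology ID alongside the free-text label is what lets a reviewer look up the exact term definition instead of relying on the label's wording.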

Validated benchmarks

Multi-agent AI · Full expression profiles · Evidence-grounded reasoning

Up to 388% higher annotation accuracy
16 LLMs tested across model families
Up to 300% improvement over existing methods

[Benchmark table with two views: "Annotation score across methods" and "CyteType across LLMs". Scores are not reproduced here.]

Columns: Overall Similarity Score, GTExV9, HypoMap, Immune Cell Atlas, Mouse Pancreatic, % Missing, Avg Runtime, Confidence, Majority (column groups: Datasets, Resource, Reliability)

Existing methods: SingleR, CellTypist, GPTCellType (GPT-5)

CyteType configured with different LLMs: Claude Sonnet 4 (C), GPT-5 (C), Gemini 2.5 Pro (C), GPT-4.1 (C), Kimi K2 (O), GLM 4.5 (O), LLaMA 4 Maverick (O), DeepSeek R1 (O), Magistral Medium 2506 (O), Grok 4 (C), Qwen3 235B A22B Thinking (O), GPT-OSS 120B (O), Gemini 2.5 Flash (C), Qwen3 235B A22B (O), Minimax M1 (O), Qwen3 30B A3B Thinking (O)

(O) = open-weight LLM, (C) = closed-weight LLM

Performance improves by up to 300% over existing methods, far beyond the 10–20% gains typical across the field. Even open-weight models like DeepSeek R1 and Qwen3 reach 95% of peak performance. The breakthrough lies in structured reasoning, not prompting at scale, and it moves single-cell annotation from guesswork to interpretable, evidence-based classification.

Read the benchmarking study on bioRxiv

How CyteType compares

Pharma teams need more than accuracy from an annotation tool. They need evidence they can defend, terminology that stays consistent across partners, and infrastructure that passes security review.

Tools compared: CyteType, SingleR, scType, CellTypist, scANVI, Azimuth

Can We Deploy This?

On-premise / air-gapped deployment (InfoSec and data governance clearance)
CyteType: Local model hosting; AWS Bedrock supported
SingleR: Runs locally in R
scType: Partial. R scripts local; web tool is external
CellTypist: Runs locally in Python
scANVI: Runs locally; requires GPU
Azimuth: Partial. Docker available; web app is cloud-hosted

Pipeline integration, Python + R (Drop-in to existing Scanpy/Seurat workflows)
CyteType: Both. pip install (AnnData) + CyteTypeR (Seurat)
SingleR: R only. Bioconductor package
scType: R only. R scripts; no native Python
CellTypist: Python only. PyPI package
scANVI: Primarily Python. R via reticulate bridge
Azimuth: Primarily R. Accepts h5ad via web upload

GPU infrastructure required (IT procurement and compute cost implications)
CyteType: No
SingleR: No
scType: No
CellTypist: No
scANVI: Effectively yes. Documented limitation of the method
Azimuth: No

Commercial licensing (Legal and procurement clarity)
CyteType: Explicit commercial license. Open-source for academia; commercial terms for industry
SingleR: Open source (GPL-3); copyleft implications
scType: Open source. GPL v3; copyleft implications
CellTypist: Open source. MIT-style
scANVI: Open source. BSD 3-Clause
Azimuth: Open source. GPL-3; copyleft implications
Three of the five alternatives (SingleR, scType, Azimuth) carry copyleft obligations.

Reproducibility (Regulatory and QC requirement: same input, same output)
CyteType: Deterministic
SingleR: Deterministic
scType: Deterministic
CellTypist: Deterministic
scANVI: Partial. Stochastic training; seed-dependent
Azimuth: Deterministic

Does It Reduce Risk in Our Decision-Making?

Reference data dependency (Operational overhead of curating and maintaining reference datasets per project)
CyteType: Not required. Operates from expression data; accepts marker genes to guide annotation
SingleR: Required. Accepts any custom labelled dataset
scType: Not required. Built-in marker DB; custom markers via XLSX
CellTypist: Required. User-trained custom models supported
scANVI: Required. Accepts any labelled dataset for training
Azimuth: Required. Limited to HuBMAP curated references only

Evidence trail per annotation (Defend calls in target review, regulatory, and cross-functional settings)
CyteType: Full reasoning chain: markers, literature, reviewer assessment

Cell Ontology standardisation (Consistent terminology across projects, sites, and CRO partners)
CyteType: Automatic CL ID assignment per annotation
CellTypist: Partial. CL IDs in encyclopedia; not in annotation output by default

Disease context handling (TME, inflammatory tissue, and disease-state datasets)
CyteType: Adapts reasoning to disease vs. healthy contexts natively
SingleR: Partial. Only if reference contains disease labels
scType: Partial. SNV calling distinguishes malignant vs. healthy
CellTypist: Partial. Only if model trained on disease data
scANVI: Partial. Only if reference contains disease labels
Azimuth: Training data explicitly excludes cancer

Functional state resolution (Distinguish actionable cell states from coarse labels)
CyteType: Activation, exhaustion, polarization with marker-level support
SingleR: Constrained by reference label granularity
scType: Constrained by database categories
CellTypist: Partial. Granularity depends on model used
scANVI: Constrained by training labels
Azimuth: Partial. Multi-level hierarchy (L1/L2) available

Cluster-level interrogation (Query the biology behind each annotation call directly)
CyteType: Chat interface per cluster: ask biological questions, explore evidence

Built to hold up in the real world

LLM-driven annotation fails without reliability, privacy, and scale. CyteType is built for those constraints.

Defensible labels

Ontology IDs, evidence trails, and reviewer rationale on every call.

Production LLM stack

Hundreds of calls per cluster with retries and health-aware fallbacks, built to finish at scale.
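
To make the idea of retries with health-aware fallbacks concrete, here is a minimal Python sketch of the general pattern. It is not CyteType's implementation: the `Backend` class, the `call_with_fallback` function, the failure threshold, and the backoff schedule are all hypothetical choices made for illustration.

```python
import time
from typing import Callable, Sequence, Tuple


class Backend:
    """Hypothetical wrapper around one LLM endpoint with a simple health flag."""

    def __init__(self, name: str, call: Callable[[str], str], max_failures: int = 3):
        self.name = name
        self.call = call
        self.max_failures = max_failures
        self.failures = 0  # consecutive failures seen so far

    @property
    def healthy(self) -> bool:
        return self.failures < self.max_failures


def call_with_fallback(
    prompt: str,
    backends: Sequence[Backend],
    retries: int = 2,
    base_delay: float = 0.5,
) -> Tuple[str, str]:
    """Try each healthy backend in order, retrying with exponential backoff."""
    for backend in backends:
        if not backend.healthy:
            continue  # skip endpoints already marked unhealthy
        for attempt in range(retries + 1):
            try:
                result = backend.call(prompt)
                backend.failures = 0  # a success resets the health counter
                return backend.name, result
            except Exception:
                backend.failures += 1
                if not backend.healthy:
                    break  # stop retrying an endpoint that just became unhealthy
                if attempt < retries:
                    time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("all backends failed or are unhealthy")


def always_down(prompt: str) -> str:
    raise ConnectionError("primary unavailable")


def echo(prompt: str) -> str:
    return f"annotated: {prompt}"


primary = Backend("primary", always_down)
backup = Backend("backup", echo)
name, text = call_with_fallback("cluster 7", [primary, backup], base_delay=0.0)
print(name, text)  # prints: backup annotated: cluster 7
```

After the first request exhausts the primary endpoint, later requests skip it outright, which is what keeps a long run of calls from stalling on one failing provider.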

Enterprise ready

Cloud pilots available now; on-prem deployment for pharma-run LLMs. Zero retention, no training use, isolated storage.

Fits your stack

Scanpy, Seurat, and AnnData supported via the CyteType Python and R packages.

Benchmarked

Tested against CellTypist, SingleR, and GPTCellType across four datasets and sixteen LLMs.

Trusted by researchers from

Memorial Sloan Kettering Cancer Center
Mass General Brigham
Institut Pasteur
University of Oxford
University of Cambridge
Helmholtz Munich

See CyteType on your data

Leave your details and our team will arrange a session where you can bring a dataset and walk away with a full annotation report.