CyteType Documentation

Everything you need to annotate single-cell data with CyteType — from installation to interpreting your report.

Documentation

Guides, references, and walkthroughs for CyteType annotation workflows.

Prerequisites

Before running CyteType, make sure your data meets these requirements.

Data requirements:

  • Gene symbols, not Ensembl IDs, in your feature names
  • Differential expression results computed per cluster
  • Clustering results stored in your object (Leiden, Louvain, or Seurat clusters)
  • Normalized gene expression data (log1p-normalization recommended)
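
As a rough pre-flight check for the gene-symbol requirement, the sketch below flags feature names that look like Ensembl gene IDs. The helper names and the regular expression are illustrative assumptions, not part of the CyteType API. The pattern covers the common `ENSG...` / `ENSMUSG...` style with an optional version suffix.

```python
import re

# Assumed pattern for Ensembl gene IDs: "ENS", an optional species code,
# "G", 11 digits, and an optional ".version" suffix.
ENSEMBL_GENE_ID = re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$")


def looks_like_ensembl_id(name: str) -> bool:
    """Return True if a feature name matches the assumed Ensembl gene ID pattern."""
    return ENSEMBL_GENE_ID.match(name) is not None


def find_ensembl_like_names(feature_names) -> list[str]:
    """Collect feature names that look like Ensembl IDs instead of gene symbols."""
    return [str(n) for n in feature_names if looks_like_ensembl_id(str(n))]


# Hypothetical feature names; with AnnData you would pass adata.var_names.
names = ["CD3E", "MS4A1", "ENSG00000141510", "ENSMUSG00000059552.13", "ENO1"]
flagged = find_ensembl_like_names(names)
print(flagged)
```

If the returned list is non-empty, the feature names probably need to be mapped to gene symbols before annotation.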

Python prerequisites:

  • A preprocessed AnnData object with sc.tl.rank_genes_groups results
  • Python ≥ 3.12
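
A minimal sketch of how these object-level prerequisites could be checked before submitting a job. The function `missing_prerequisites` is a hypothetical helper, not a CyteType function. It only assumes that the object exposes `.obs` and `.uns` the way AnnData does, and it defaults to Scanpy's standard `rank_genes_groups` key.

```python
from types import SimpleNamespace


def missing_prerequisites(adata, group_key: str, rank_key: str = "rank_genes_groups") -> list[str]:
    """List the prerequisites that appear to be missing from an AnnData-like object."""
    problems = []
    # Cluster labels are expected as a column in adata.obs.
    if group_key not in adata.obs:
        problems.append(f"no '{group_key}' column in adata.obs (run clustering first)")
    # Differential expression results are expected as an entry in adata.uns.
    if rank_key not in adata.uns:
        problems.append(f"no '{rank_key}' entry in adata.uns (run sc.tl.rank_genes_groups first)")
    return problems


# Stand-in object for illustration only; a real AnnData exposes the same attributes.
demo = SimpleNamespace(obs={"leiden": ["0", "1", "0"]}, uns={})
print(missing_prerequisites(demo, group_key="leiden"))
```

An empty list means both the cluster column and the differential expression results were found under the given keys.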

R prerequisites:

  • devtools installed for GitHub installation
  • A Seurat object with FindAllMarkers() output
  • R ≥ 4.1.0

Installation

Install the CyteType client for your preferred environment.

Python:

pip install cytetype

R:

install.packages("devtools")
library(devtools)
install_github("NygenAnalytics/CyteTypeR")

Quick Start

A minimal end-to-end run. No API key required for the default configuration.

Python:

import scanpy as sc
from cytetype import CyteType

# Load your preprocessed AnnData (placeholder path)
adata = sc.read_h5ad("path/to/your_data.h5ad")

# adata must have clusters in adata.obs[group_key] and differential expression
# results in adata.uns[rank_key], e.g. computed with
# sc.tl.rank_genes_groups(adata, groupby=group_key, key_added="rank_genes_" + group_key)
group_key = "leiden"

annotator = CyteType(
    adata,
    group_key=group_key,
    rank_key="rank_genes_" + group_key,
    n_top_genes=50,
)

adata = annotator.run(
    study_context="Human PBMC from healthy donor, 10X Genomics 3' scRNA-seq"
)

# Annotations are now in adata.obs
sc.pl.umap(adata, color=f"cytetype_annotation_{group_key}")

R:

library(Seurat)
library(CyteTypeR)

# Find markers (if not already done)
pbmc.markers <- FindAllMarkers(pbmc, only.pos = TRUE)

# Step 1: prepare data
prepped_data <- PrepareCyteTypeR(
  obj = pbmc,
  marker_table = pbmc.markers,
  group_key = "seurat_clusters",
  n_top_genes = 50,
  coordinates_key = "umap"
)

# Step 2: submit and annotate
pbmc <- CyteTypeR(
  obj = pbmc,
  prepped_data = prepped_data,
  study_context = "Human PBMC from healthy donor, 10X Genomics 3' scRNA-seq"
)

# Annotations are now in pbmc@meta.data
DimPlot(pbmc, group.by = "cytetype_seurat_clusters")

A link to your interactive HTML report is printed during the run. Results are also written directly back to your object.
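
To review what was written back to the object, a per-cluster summary of the annotation column can help. The sketch below uses only the standard library. `annotation_summary` is a hypothetical helper, and the cluster IDs and cell type labels are made-up example values rather than real CyteType output.

```python
from collections import Counter, defaultdict


def annotation_summary(clusters, annotations) -> dict:
    """Map each cluster to its annotation label(s) and the number of cells carrying it."""
    counts = defaultdict(Counter)
    for cluster, label in zip(clusters, annotations):
        counts[str(cluster)][str(label)] += 1
    return {cluster: dict(labels) for cluster, labels in counts.items()}


# Hypothetical per-cell values; with AnnData you would pass
# adata.obs[group_key] and adata.obs[f"cytetype_annotation_{group_key}"].
clusters = ["0", "0", "1", "1", "1"]
annotations = ["CD4 T cell", "CD4 T cell", "B cell", "B cell", "B cell"]
summary = annotation_summary(clusters, annotations)
print(summary)
```

Because annotations are assigned per cluster, each cluster would normally map to a single label. More than one label under a cluster suggests the wrong columns were paired.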


Frequently Asked Questions

What is CyteType?
CyteType is a multi-agent AI system for automated annotation of single-cell RNA-seq data. Several agents independently evaluate marker expression, reference similarity, ontology structure, and literature context. Their outputs are merged into a final annotation with confidence scores and traceable reasoning. The full method is described in the CyteType preprint (Ahuja G et al., bioRxiv 2025).

How does CyteType differ from traditional annotation approaches?
Traditional approaches depend on a single reference or a fixed set of marker genes. CyteType instead integrates several biological signals through a structured agent workflow, which makes annotations more robust for rare, transitional, or disease-associated cell populations.

Where do the performance gains come from?
The performance gains come from the workflow rather than from the LLM tier. Each agent contributes a different biological perspective, and a reconciliation step produces a stable, evidence-supported annotation.

Who created CyteType?
CyteType was created by Nygen Analytics, a research-focused biotech company in Sweden working on AI systems for single-cell omics.

Which languages and frameworks are supported?
CyteType is available in Python for AnnData/Scanpy workflows and in R (CyteTypeR) for Seurat.

How do I install CyteType?
Python: pip install cytetype (Python ≥ 3.11). R: devtools::install_github("NygenAnalytics/CyteTypeR").

Which data formats are supported?
CyteType accepts AnnData and Seurat objects. Output from standard Scanpy or Seurat preprocessing workflows can be used without reformatting.

Does CyteType need an internet connection, and how large can a dataset be?
CyteType requires internet connectivity for LLM-based annotation. Jobs of up to approximately 500,000 cells per request are supported. Larger datasets are automatically batched.