Merging and Demultiplexing Datasets

Learn how to easily merge datasets and demultiplex HTO samples, preserving genes and metadata across analyses.

Merging Datasets

On the Nygen Analytics platform, you can create a new merged dataset from multiple datasets in your project.

The merge function is useful, for example, when you want to:

  • analyse multiple file uploads or datasets as one dataset
  • combine public datasets from a publication with your own datasets for analysis

Merging datasets with non-identical gene sets:

When datasets come from different experiments, labs, or time periods, they may contain different sets of genes or features. During the merging process on our platform, we preserve all unique genes from every dataset. For any gene that is present in one dataset but missing from another, we automatically fill in the gaps with zeros.
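The zero-filling behaviour described above can be illustrated with a minimal sketch in plain Python. The toy counts, the gene names, and the `merge_counts` helper are hypothetical and only illustrate the idea; this is not the platform's implementation.

```python
# Hypothetical toy counts, stored as {cell: {gene: count}}.
dataset_a = {
    "a_cell1": {"GeneA": 5, "GeneB": 2},
    "a_cell2": {"GeneA": 0, "GeneB": 7},
}
dataset_b = {
    "b_cell1": {"GeneB": 1, "GeneC": 3},
    "b_cell2": {"GeneB": 4, "GeneC": 0},
}


def merge_counts(*datasets):
    """Merge count tables, keeping every gene and zero-filling the gaps."""
    # Union of the genes seen in any dataset, in sorted order.
    all_genes = sorted(
        {gene for ds in datasets for counts in ds.values() for gene in counts}
    )
    merged = {}
    for ds in datasets:
        for cell, counts in ds.items():
            # A gene that this cell's dataset never measured gets a zero.
            merged[cell] = {gene: counts.get(gene, 0) for gene in all_genes}
    return all_genes, merged


genes, merged = merge_counts(dataset_a, dataset_b)
print(genes)
print(merged["b_cell1"])
```

In this sketch, `GeneA` was never measured in `dataset_b`, so the cells from `dataset_b` carry a zero for it in the merged table. A zero produced this way means "not measured", which is different from "measured and not detected".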

Merging datasets with different modalities:

You can also combine datasets that were generated with different techniques or assays. Some examples:

  • Merging a standard RNA-seq dataset with a CITE-seq dataset
  • Combining an RNA + HTO dataset with an RNA + ADT dataset

As with missing genes, any assay that is absent from a dataset is filled with placeholder zeros in the merged dataset.
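The same idea can be sketched for whole assays. In the plain-Python example below, the dataset layout, the feature names, and the `merge_modalities` helper are all hypothetical; it only illustrates what a zero-filled placeholder assay might look like, not how the platform stores data.

```python
# Hypothetical toy datasets: assay name -> {cell: {feature: count}}.
rna_only = {"RNA": {"x_cell1": {"CD3E": 4}}}
cite_seq = {
    "RNA": {"y_cell1": {"CD3E": 1}},
    "ADT": {"y_cell1": {"CD3_protein": 12}},
}


def merge_modalities(*datasets):
    """Merge multi-assay datasets, zero-filling assays a dataset lacks."""
    assays = sorted({assay for ds in datasets for assay in ds})
    # Union of features per assay across all datasets.
    features = {
        assay: sorted(
            {f for ds in datasets for c in ds.get(assay, {}).values() for f in c}
        )
        for assay in assays
    }
    merged = {assay: {} for assay in assays}
    for ds in datasets:
        # Every cell of this dataset, whichever assay it appears in.
        cells = sorted({cell for table in ds.values() for cell in table})
        for assay in assays:
            table = ds.get(assay, {})
            for cell in cells:
                counts = table.get(cell, {})
                # Missing assay or feature: placeholder zero.
                merged[assay][cell] = {
                    f: counts.get(f, 0) for f in features[assay]
                }
    return merged


merged = merge_modalities(rna_only, cite_seq)
print(merged["ADT"])
```

Here the cell from the RNA-only dataset receives an all-zero ADT profile, so both assays cover the same set of cells after merging.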

Steps

1. From your projects page, use the checkbox to select the datasets you would like to merge.

2. You can add or remove datasets before confirming the merge. The original datasets remain in your project after merging. If you no longer need the individual files that were used for the merged dataset, you can delete them without affecting the merged dataset.

3. The new merged dataset will appear in your Inbox. Merging can take as little as a few minutes; the time depends on the number of cells in your datasets.

Demultiplexing Datasets

For experiments that use cell hashing with hashtag oligos (HTOs), our platform allows you to split a single multiplexed file containing different samples into separate datasets. The demultiplexing process uses the same methodology as Seurat's HTODemux function. You can review the specific implementation in our GitHub repository: https://github.com/parashardhapola/scarf/blob/master/scarf/feat_utils.py#L95
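To give a feel for the final classification step of an HTODemux-style approach, here is a simplified plain-Python sketch. It assumes that a positivity cutoff per HTO is already known; in Seurat's HTODemux these cutoffs are estimated from the data, and that step is omitted here. The counts, the cutoffs, and the function names are hypothetical, and how the platform handles negative and doublet cells when splitting is an assumption in this sketch.

```python
# Hypothetical HTO signal per cell and hypothetical per-HTO cutoffs.
hto_counts = {
    "cell1": {"HTO1": 120, "HTO2": 3},
    "cell2": {"HTO1": 2, "HTO2": 95},
    "cell3": {"HTO1": 110, "HTO2": 90},
    "cell4": {"HTO1": 1, "HTO2": 4},
}
thresholds = {"HTO1": 20, "HTO2": 20}


def classify_cells(hto_counts, thresholds):
    """Label each cell as Negative, Doublet, or its single positive HTO."""
    labels = {}
    for cell, counts in hto_counts.items():
        positive = [h for h, value in counts.items() if value > thresholds[h]]
        if not positive:
            labels[cell] = "Negative"  # no HTO above its cutoff
        elif len(positive) == 1:
            labels[cell] = positive[0]  # singlet, assigned to that sample
        else:
            labels[cell] = "Doublet"  # more than one HTO above its cutoff
    return labels


def split_by_sample(labels):
    """Group singlet cells by sample (assumed handling of other cells)."""
    groups = {}
    for cell, label in labels.items():
        if label not in ("Negative", "Doublet"):
            groups.setdefault(label, []).append(cell)
    return groups


labels = classify_cells(hto_counts, thresholds)
samples = split_by_sample(labels)
print(labels)
print(samples)
```

Each key of `samples` corresponds to one hashtag, and so to one candidate dataset after the split.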

Important to know:

  • The 'Split dataset' option is only available for datasets that include HTOs
  • For detailed instructions on uploading files with HTOs, please refer to the 'Uploading multiple modalities' section in our documentation: Count Matrix Upload Process

Steps

1. If you have uploaded or received a dataset with HTOs, you can demultiplex it on the platform. The original dataset remains available after demultiplexing, and you can delete it after splitting without affecting the new demultiplexed datasets.

2. From the more options button, you can also analyse the dataset without demultiplexing.

3. You will get an overview of how the dataset will be split before demultiplexing starts.

4. Once demultiplexing starts, the new split datasets will be imported into your Inbox. The option to split the dataset will no longer be available on the original.

💡Tip #1: Add metadata before merging, as it carries over, and view source datasets in the ‘Group by’ dropdown.
💡Tip #2: Check for batch effects after merging datasets and refer to the Data Integration & Batch Correction section for guidance.
💡Tip #3: Filtered cells will be included in the merged dataset, so make sure to extract them beforehand if needed.

Yi Su

Bioinformatician