Doublet Detection Tutorial

Authors: Hui Ma, Yiming Yang, Rimte Rocher
Date: 2022-03-09
Notebook Source: doublet_detection.ipynb

Dataset

In this tutorial, we'll use the output of the Pegasus Tutorial to demonstrate how to detect and remove doublets in Pegasus. The dataset consists of human bone marrow single cells from 8 donors.

The dataset is stored at https://storage.googleapis.com/terra-featured-workspaces/Cumulus/MantonBM_result.zarr.zip. You can also use gsutil to download it via its Google bucket URL (gs://terra-featured-workspaces/Cumulus/MantonBM_result.zarr.zip).
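The two addresses above point to the same object: a publicly readable `gs://bucket/object` URL corresponds to `https://storage.googleapis.com/bucket/object`. As a minimal sketch using only the Python standard library, the helpers below convert one form to the other and fetch the file. The names `gs_to_https` and `download` are hypothetical and are not part of Pegasus or gsutil; the download only works for public objects and needs network access.

```python
from urllib.parse import quote
from urllib.request import urlretrieve

GS_URL = "gs://terra-featured-workspaces/Cumulus/MantonBM_result.zarr.zip"


def gs_to_https(gs_url: str) -> str:
    """Map a public gs://bucket/object URL to its storage.googleapis.com form."""
    prefix = "gs://"
    if not gs_url.startswith(prefix):
        raise ValueError(f"not a gs:// URL: {gs_url}")
    bucket, _, obj = gs_url[len(prefix):].partition("/")
    if not bucket or not obj:
        raise ValueError(f"expected gs://bucket/object, got: {gs_url}")
    # Percent-encode the object name but keep '/' as the path separator.
    return f"https://storage.googleapis.com/{bucket}/{quote(obj)}"


def download(gs_url: str, dest: str) -> str:
    """Download the object to `dest` and return the local path (needs network)."""
    path, _headers = urlretrieve(gs_to_https(gs_url), dest)
    return path


if __name__ == "__main__":
    print(gs_to_https(GS_URL))
```

Calling `download(GS_URL, "MantonBM_result.zarr.zip")` would save the file under its original name in the working directory.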

Now load the count matrix:
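A minimal sketch of the loading step, assuming the `pegasus` package is installed and the file has been downloaded to the working directory under its original name. `load_result` is a hypothetical wrapper used only for illustration; the actual read is done by `pg.read_input`.

```python
def load_result(path: str = "MantonBM_result.zarr.zip"):
    """Read the Pegasus result file at `path` and return the loaded data object."""
    # Imported lazily so that defining this helper does not require pegasus.
    import pegasus as pg

    return pg.read_input(path)


# Usage (requires pegasus and the downloaded file):
# data = load_result()
```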


Detect Doublets

In this step, we infer doublets per channel. Set clust_attr = 'anno' to see the doublet density in each cluster and to infer doublet clusters.
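The step above can be sketched as follows. This is an illustration under stated assumptions rather than the notebook's exact cell: the channel column is assumed to be named 'Channel' in data.obs, and `infer_doublets_per_channel` is a hypothetical wrapper around `pg.infer_doublets`.

```python
def infer_doublets_per_channel(data, channel_attr: str = "Channel", clust_attr: str = "anno"):
    """Infer doublets separately for each channel and test clusters in `clust_attr`.

    `channel_attr` is an assumed column name; change it to match your data.obs.
    """
    # Imported lazily so that defining this helper does not require pegasus.
    import pegasus as pg

    pg.infer_doublets(data, channel_attr=channel_attr, clust_attr=clust_attr)
    return data


# Usage (requires pegasus and a loaded `data` object):
# infer_doublets_per_channel(data)
```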

The method used for detecting doublets can be found here.

Here, we plot the cluster annotation and the Scrublet-like doublet score.

We also want to see the doublet percentage of each cluster to decide if there is a doublet cluster.

All clusters have a doublet percentage under $5\%$, so there is no need to mark any doublet cluster here. If a cluster had a doublet percentage above $50\%$, we could consider marking it as a doublet cluster.
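To make the decision rule concrete, here is a small standard-library sketch that computes the doublet percentage per cluster and lists clusters above the $50\%$ threshold. The helper names and the toy labels are made up for illustration; they are not Pegasus functions and not values from the tutorial dataset.

```python
from collections import defaultdict


def doublet_percentage_by_cluster(clusters, is_doublet):
    """Return {cluster: percentage of its cells flagged as doublets}."""
    totals = defaultdict(int)
    doublets = defaultdict(int)
    for cluster, flag in zip(clusters, is_doublet):
        totals[cluster] += 1
        doublets[cluster] += int(flag)
    return {c: 100.0 * doublets[c] / totals[c] for c in totals}


def candidate_doublet_clusters(percentages, threshold=50.0):
    """Return clusters whose doublet percentage is strictly above `threshold`."""
    return sorted(c for c, pct in percentages.items() if pct > threshold)


# Toy labels purely for illustration: cluster "A" has 1 doublet out of 4 cells,
# cluster "B" has 3 doublets out of 4 cells.
clusters = ["A", "A", "A", "A", "B", "B", "B", "B"]
is_doublet = [False, False, False, True, True, True, True, False]

percentages = doublet_percentage_by_cluster(clusters, is_doublet)
print(percentages)
print(candidate_doublet_clusters(percentages))
```

In this toy input, cluster "A" is at 25% and cluster "B" at 75%, so only "B" would be a candidate doublet cluster.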

For example, if we want to mark 'CD14+ Monocyte' and 'CD14+ Monocyte-2' as doublet clusters, use the following code:

pg.mark_doublets(data, dbl_clusts = 'anno:CD14+ Monocyte,CD14+ Monocyte-2')
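The dbl_clusts argument in the call above has the form 'attribute:cluster1,cluster2'. As a sketch, a hypothetical helper (`build_dbl_clusts` is not part of Pegasus) can assemble that string from a list of cluster names. The format is inferred from the example above, and rejecting names that contain a comma is an assumption made here to keep the string unambiguous.

```python
def build_dbl_clusts(clust_attr, cluster_names):
    """Build a string of the form 'attr:name1,name2' from a list of cluster names."""
    names = list(cluster_names)
    if not names:
        raise ValueError("at least one cluster name is required")
    for name in names:
        # Assumption: a comma inside a name would be read as a separator.
        if "," in name:
            raise ValueError(f"cluster name contains a comma: {name!r}")
    return f"{clust_attr}:{','.join(names)}"


dbl_clusts = build_dbl_clusts("anno", ["CD14+ Monocyte", "CD14+ Monocyte-2"])
print(dbl_clusts)
```

The resulting string can then be passed as `pg.mark_doublets(data, dbl_clusts=dbl_clusts)`.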

The mark_doublets function marks doublet clusters (if any) and writes the singlet/doublet assignment to the "demux_type" column of data.obs. The same "demux_type" attribute is also used for the singlet/doublet assignment of cell hashing, nucleus hashing and genetics pooling data (see documentation).

For this demonstration dataset, $724$ of the $35,465$ cells are detected as doublets, a doublet rate of $2.04\%$:
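The rate quoted above follows directly from the two counts. As a quick arithmetic check in plain Python (`doublet_rate` is a hypothetical helper, not a Pegasus function):

```python
def doublet_rate(n_doublets: int, n_cells: int) -> float:
    """Return the doublet rate as a percentage of all cells."""
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    return 100.0 * n_doublets / n_cells


# Counts reported for the demonstration dataset.
n_cells = 35465
n_doublets = 724

rate = doublet_rate(n_doublets, n_cells)
print(f"{n_doublets} doublets among {n_cells} cells: {rate:.2f}%")
```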

The distribution of doublets is easier to see in a UMAP plot:

Remove Doublets and Recluster

Start the reclustering process by re-selecting highly variable genes. A batch effect is observed, so we also use the Harmony algorithm to correct it before reclustering.
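One possible way to chain these steps is sketched below. This is not necessarily the notebook's exact code: it assumes that doublets have already been marked, that the batch column is named 'Channel', and that Louvain clustering is used; `recluster_singlets` is a hypothetical wrapper. Parameter defaults may differ between Pegasus versions, so check the API reference for your installation.

```python
def recluster_singlets(data, batch_attr: str = "Channel"):
    """Keep singlets, re-select highly variable genes, correct batch effect, recluster.

    `batch_attr` is an assumed column name in data.obs; adjust it to your data.
    """
    # Imported lazily so that defining this helper does not require pegasus.
    import pegasus as pg

    # Keep only cells assigned as singlets, then drop the rest.
    pg.qc_metrics(data, select_singlets=True)
    pg.filter_data(data)

    # Re-select highly variable genes, taking the batch attribute into account.
    pg.highly_variable_features(data, batch=batch_attr)

    # PCA followed by Harmony batch correction; run_harmony returns the key
    # of the corrected embedding.
    pg.pca(data)
    rep_key = pg.run_harmony(data, batch=batch_attr)

    # Rebuild the neighbor graph, clusters and UMAP on the corrected embedding.
    pg.neighbors(data, rep=rep_key)
    pg.louvain(data, rep=rep_key)
    pg.umap(data, rep=rep_key)
    return data


# Usage (requires pegasus and a `data` object with doublets already marked):
# recluster_singlets(data)
```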

Re-annotate:

UMAP of the annotation after reclustering: