# Pegasus Plotting Tutorial

Author: Hui Ma, Yiming Yang, Rimte Rocher
Date: 2022-03-09
Notebook Source: plotting_tutorial.ipynb

For this plotting tutorial, we provide the analysis result of a gene-count matrix dataset of human bone marrow from 8 donors. You can download the data from https://storage.googleapis.com/terra-featured-workspaces/Cumulus/MantonBM_result.zarr.zip, or use gsutil to copy it from its Google bucket URL (gs://terra-featured-workspaces/Cumulus/MantonBM_result.zarr.zip).
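The download can also be scripted. The following is a minimal sketch using only the Python standard library; the helper name `download_dataset` and the default destination file name are illustrative choices, not part of Pegasus.

```python
import os
import urllib.request

# Public HTTPS URL of the tutorial dataset, as given in the text above.
DATA_URL = (
    "https://storage.googleapis.com/terra-featured-workspaces/"
    "Cumulus/MantonBM_result.zarr.zip"
)


def download_dataset(url=DATA_URL, dest=None):
    """Download the dataset unless it is already present; return the local path.

    Hypothetical helper for this tutorial, not a Pegasus function.
    """
    if dest is None:
        # Default to the file name at the end of the URL.
        dest = os.path.basename(url)
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest
```

Calling `download_dataset()` once places `MantonBM_result.zarr.zip` in the working directory; later calls return immediately because the file already exists.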

After downloading, load the file with the Pegasus read_input function.
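A minimal sketch of the loading step, assuming the file was saved in the current working directory under its original name. The wrapper `load_result` is a hypothetical convenience that checks for the file before handing it to `pg.read_input`; in a notebook, calling `pg.read_input` directly does the same job.

```python
import os


def load_result(path="MantonBM_result.zarr.zip"):
    """Load the analysis result with Pegasus (hypothetical wrapper)."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"{path} not found; download the dataset first.")
    # Imported here so the helper can be defined even before Pegasus is installed.
    import pegasus as pg

    return pg.read_input(path)
```

With this helper, `data = load_result()` yields the object that is passed as `data` to the plotting functions in the following sections.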

In the following sections, we'll cover Pegasus plotting functions using this dataset. Moreover, for gene plots, the canonical gene markers below will be used:

• B cells and Plasma cells: CD38, JCHAIN, CD79A, MS4A1.
• T cells: TRAC, CD3D, CCR7.
• Cytotoxic T cells: CD8A, CD8B.
• NK cells: NKG7.
• Monocytes: CD14, FCGR3A.
• Erythroid cells: GYPA.
• Hematopoietic Stem cells: CD34, SELL.
• Dendritic cells: HLA-DPA1, CD4.
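For use in code, the markers listed above can be kept in a dictionary keyed by cell type. This is just one convenient layout; the variable names `MARKERS` and `marker_genes` are illustrative.

```python
# Canonical marker genes from the list above, keyed by cell type.
MARKERS = {
    "B cells and Plasma cells": ["CD38", "JCHAIN", "CD79A", "MS4A1"],
    "T cells": ["TRAC", "CD3D", "CCR7"],
    "Cytotoxic T cells": ["CD8A", "CD8B"],
    "NK cells": ["NKG7"],
    "Monocytes": ["CD14", "FCGR3A"],
    "Erythroid cells": ["GYPA"],
    "Hematopoietic Stem cells": ["CD34", "SELL"],
    "Dendritic cells": ["HLA-DPA1", "CD4"],
}

# Flat list of all marker genes, in the order they appear above.
marker_genes = [gene for genes in MARKERS.values() for gene in genes]
```

The flat `marker_genes` list is handy when a plotting function expects a plain list of gene names rather than a grouping.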

## QC Violin Plot

The first step in preprocessing is quality control (QC): identifying and removing low-quality cells and genes.

pg.qcviolin makes the effect of quality control easy to see by drawing violin plots of per-cell QC statistics before and after filtration.

plot_type='gene' shows the number of expressed genes per cell before and after filtration.

plot_type='mito' shows the percentage of mitochondrial gene expression per cell before and after filtration.

The number of UMIs per cell before and after filtration (plot_type='count') is also an important aspect of quality control.
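The three QC views in this section correspond to the `plot_type` values 'gene', 'mito', and 'count'. A minimal sketch that draws all three follows; `plot_qc_violins` is a hypothetical helper, `data` is assumed to be the object returned by `pg.read_input`, and `dpi=100` is an arbitrary choice.

```python
# plot_type values for the three QC views: genes, mitochondrial percentage, UMIs.
QC_PLOT_TYPES = ("gene", "mito", "count")


def plot_qc_violins(data, plot_types=QC_PLOT_TYPES, dpi=100):
    """Draw one before/after QC violin plot per requested plot type.

    Hypothetical helper; `data` is the object returned by `pg.read_input`.
    """
    import pegasus as pg  # imported lazily; requires Pegasus to be installed

    for plot_type in plot_types:
        pg.qcviolin(data, plot_type=plot_type, dpi=dpi)
```

In a notebook, the same result comes from calling `pg.qcviolin(data, plot_type='gene')` and its siblings in separate cells, which keeps each figure next to its explanation.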

## Highly Variable Feature Plot

Highly Variable Genes (HVG) are more likely to convey information discriminating different cell types and states. Thus, rather than considering all genes, people usually focus on selected HVGs for downstream analyses.

Pegasus provides the hvfplot function to generate a scatter plot of genes after HVG selection. This plot only works for Pegasus-flavor HVGs (i.e., flavor='pegasus' in the Pegasus highly_variable_features function).

After selecting 2000 HVGs using the Pegasus selection method, the plot below is generated. Each point stands for one gene. Blue points are the genes selected as HVGs, which account for the majority of the variation in the dataset. By default, the plot labels the top 20 HVGs; you can change this number with the top_n parameter.
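A minimal sketch of the call, assuming HVGs were already selected with the Pegasus flavor in the loaded result. `plot_hvf` is a hypothetical wrapper, and `dpi=200` is an arbitrary choice.

```python
def plot_hvf(data, top_n=20, dpi=200):
    """Plot genes after HVG selection, labeling the `top_n` top HVGs.

    Hypothetical helper; `data` is the object returned by `pg.read_input`.
    """
    import pegasus as pg  # imported lazily; requires Pegasus to be installed

    pg.hvfplot(data, top_n=top_n, dpi=dpi)
```

For example, `plot_hvf(data, top_n=10)` would label only the 10 top HVGs.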

## Composition Plot

A composition plot is a bar plot showing the cell composition (under different conditions) of each cluster. The example in this section shows the composition of different samples in each Louvain cluster.
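A minimal sketch of such a call follows. The attribute names `'louvain_labels'` (cluster labels) and `'Channel'` (sample of origin) are assumptions about how the result file stores these annotations; substitute the names used in your own data. `plot_sample_composition` is a hypothetical wrapper.

```python
def plot_sample_composition(data, cluster_key="louvain_labels", sample_key="Channel"):
    """Bar plot of the sample composition within each cluster.

    Hypothetical helper; the default keys are assumed attribute names.
    """
    import pegasus as pg  # imported lazily; requires Pegasus to be installed

    # First key groups the bars (clusters); second key splits each bar (samples).
    pg.compo_plot(data, cluster_key, sample_key)
```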

Composition plots are useful for quickly assessing library quality and batch effects.

## Scatter Plot

The scatter plot requires at least two parameters:

• data – Gene-count matrix to show.
• basis – Cell embedding to show. Can be 'umap', 'tsne', 'fle', 'net_umap' or 'net_fle'.

For this demonstration, we select annotation and channel as data attributes, and tsne as basis.
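A minimal sketch of this demonstration follows. The attribute names `'anno'` (annotation) and `'Channel'` are assumptions about how the result file names these attributes, and `plot_embedding` is a hypothetical wrapper around `pg.scatter`.

```python
def plot_embedding(data, attrs=("anno", "Channel"), basis="tsne"):
    """Scatter plot of cells on an embedding, one panel per attribute.

    Hypothetical helper; the default attribute names are assumptions.
    """
    import pegasus as pg  # imported lazily; requires Pegasus to be installed

    pg.scatter(data, attrs=list(attrs), basis=basis)
```

Passing `basis='umap'` instead would draw the same attributes on the UMAP embedding, provided that embedding is present in the data.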