import pegasus as pg
For this plotting tutorial, we provide an analysis result of a gene-count matrix dataset of human bone marrow from 8 donors. You can download the data from https://storage.googleapis.com/terra-featured-workspaces/Cumulus/MantonBM_result.zarr.zip, or use gsutil to download it via its Google bucket URL (gs://terra-featured-workspaces/Cumulus/MantonBM_result.zarr.zip).
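If gsutil is not available, the HTTPS link can also be fetched from Python. The following is a minimal sketch using only the standard library; the helper name download_result and its dest_dir parameter are hypothetical and not part of Pegasus.

```python
import shutil
import urllib.request
from pathlib import Path

# HTTPS URL copied from the tutorial text.
DATA_URL = (
    "https://storage.googleapis.com/terra-featured-workspaces/"
    "Cumulus/MantonBM_result.zarr.zip"
)


def download_result(url: str = DATA_URL, dest_dir: str = ".") -> Path:
    """Download `url` into `dest_dir`, skipping the download if the file exists.

    Hypothetical helper for illustration only.
    """
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    if not dest.exists():
        # Stream the response to disk instead of holding it all in memory.
        with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
            shutil.copyfileobj(response, out)
    return dest


# Usage (requires network access):
# path = download_result()
```

The returned path can then be passed to pg.read_input in the next step.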
After downloading, load the file using Pegasus:
data = pg.read_input("MantonBM_result.zarr.zip")
data
2022-03-09 00:07:00,912 - pegasusio.readwrite - INFO - zarr file 'MantonBM_result.zarr.zip' is loaded.
2022-03-09 00:07:00,913 - pegasusio.readwrite - INFO - Function 'read_input' finished in 0.85s.

MultimodalData object with 1 UnimodalData: 'GRCh38-rna'
    It currently binds to UnimodalData object GRCh38-rna

UnimodalData object with n_obs x n_vars = 35465 x 25653
    Genome: GRCh38; Modality: rna
    It contains 2 matrices: 'X', 'raw.X'
    It currently binds to matrix 'X' as X

    obs: 'n_genes', 'Channel', 'gender', 'n_counts', 'percent_mito', 'scale', 'louvain_labels'(cluster), 'anno'
    var: 'featureid', 'n_cells', 'percent_cells', 'robust', 'highly_variable_features', 'mean', 'var', 'hvf_loess', 'hvf_rank'
    obsm: 'X_diffmap', 'X_fle'(basis), 'X_pca', 'X_pca_harmony', 'X_phi', 'X_tsne'(basis), 'X_umap'(basis), 'diffmap_knn_distances'(knn), 'diffmap_knn_indices'(knn), 'pca_harmony_knn_distances'(knn), 'pca_harmony_knn_indices'(knn)
    varm: 'means', 'partial_sum', 'de_res'
    obsp: 'W_diffmap', 'W_pca_harmony'
    uns: 'genome', 'louvain_resolution', 'modality', 'norm_count', 'pca_features', 'stdzn_max_value', 'PCs', 'diffmap_evals', 'ncells', 'stdzn_mean', 'stdzn_std', '_attr2type', 'df_qcplot', 'pca'
In the following sections, we'll cover Pegasus plotting functions using this dataset. Moreover, for gene plots, the canonical gene markers below will be used:
marker_genes = ['CD38', 'JCHAIN', 'FCGR3A', 'HLA-DPA1', 'CD14', 'CD79A', 'MS4A1', 'CD34', 'TRAC', 'CD3D', 'CD8A', 'CD8B', 'GYPA', 'NKG7', 'CD4', 'SELL', 'CCR7']
pg.qcviolin shows the effect of quality control intuitively by presenting violin plots of the cell distributions before and after filtration.
plot_type='gene' shows the number of expressed genes per cell before and after filtration.
pg.qcviolin(data, plot_type='gene', dpi=100)
Quality control statistics on the percentage of mitochondrial genes:
pg.qcviolin(data, plot_type='mito', dpi=100)
The number of UMIs before and after filtration is also an important aspect of quality control.
pg.qcviolin(data, plot_type='count', dpi=100)
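To make the three plot types concrete, here is a toy sketch of the per-cell quantities they summarize, computed on a made-up count matrix. The key names mirror the n_genes, n_counts and percent_mito fields listed in the obs output above. The "MT-" prefix, the thresholds and the qc_metrics helper are assumptions for illustration, not Pegasus internals.

```python
# Toy count matrix: rows are cells, columns are genes (all values are made up).
genes = ["CD3D", "CD14", "MT-CO1", "MT-ND1"]
counts = [
    [5, 0, 1, 0],  # cell 0
    [0, 8, 6, 6],  # cell 1
    [1, 0, 0, 0],  # cell 2
]


def qc_metrics(row, gene_names, mito_prefix="MT-"):
    """Return per-cell QC metrics for one row of counts (hypothetical helper)."""
    n_genes = sum(1 for c in row if c > 0)  # genes with at least one UMI
    n_counts = sum(row)                     # total UMIs in the cell
    mito = sum(c for c, g in zip(row, gene_names) if g.startswith(mito_prefix))
    percent_mito = 100.0 * mito / n_counts if n_counts else 0.0
    return {"n_genes": n_genes, "n_counts": n_counts, "percent_mito": percent_mito}


metrics = [qc_metrics(row, genes) for row in counts]

# Hypothetical filtration thresholds, chosen only to suit the toy data.
MIN_GENES, MAX_MITO = 2, 20.0
kept = [
    m for m in metrics
    if m["n_genes"] >= MIN_GENES and m["percent_mito"] < MAX_MITO
]

print(f"{len(kept)} of {len(metrics)} cells pass the filter")
```

Each violin in the QC plots compares the distribution of one such metric over all cells with its distribution over the cells that pass filtration.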
Highly Variable Genes (HVG) are more likely to convey information discriminating different cell types and states. Thus, rather than considering all genes, people usually focus on selected HVGs for downstream analyses.
Use the pg.hvfplot function to generate a scatterplot of genes upon HVG selection. This plot only works for Pegasus-flavor HVGs (i.e. flavor='pegasus' in Pegasus's HVG selection step).
After selecting 2000 HVGs using the Pegasus selection method, the plot below is generated. Each point stands for one gene. Blue points are the genes selected as HVGs, which account for the majority of variation in the dataset. By default, it prints the labels of the top 20 HVGs. You can change this number via a function argument.
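The idea of one point per gene, with the most variable genes highlighted, can be illustrated with a deliberately simplified sketch. This is not Pegasus's selection procedure: it ranks genes by raw sample variance on made-up values, and the gene names and the top_variable_genes helper are hypothetical.

```python
from statistics import mean, variance

# Made-up expression values for four hypothetical genes across four cells.
expr = {
    "GENE_A": [0.0, 0.1, 0.0, 0.2],
    "GENE_B": [0.0, 3.0, 0.1, 2.8],
    "GENE_C": [1.0, 1.1, 0.9, 1.0],
    "GENE_D": [0.0, 0.0, 2.5, 0.1],
}


def top_variable_genes(expression, top_n):
    """Rank genes by sample variance and return the top_n names plus all stats."""
    # Each gene becomes one (mean, variance) point, as in a mean-variance scatterplot.
    stats = {g: (mean(v), variance(v)) for g, v in expression.items()}
    ranked = sorted(stats, key=lambda g: stats[g][1], reverse=True)
    return ranked[:top_n], stats


selected, stats = top_variable_genes(expr, top_n=2)
for gene, (m, v) in stats.items():
    flag = "HVG" if gene in selected else ""
    print(f"{gene}: mean={m:.2f} var={v:.2f} {flag}")
```

In the real plot, the selected genes are the blue points, and only the top-ranked ones are labeled.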
A composition plot is a bar plot showing the cell composition (under different conditions) of each cluster. The example below shows the composition of different samples in each Louvain cluster:
fig = pg.compo_plot(data, 'louvain_labels', 'Channel', style='frequency')
Composition plots are useful for quickly assessing library quality and batch effects.
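As a rough illustration of what such a plot encodes, the sketch below computes, for each cluster, the fraction of its cells that come from each channel. The labels are made up and the composition helper is hypothetical; exactly how pg.compo_plot normalizes its bars under style='frequency' is an assumption here, not something confirmed by this tutorial.

```python
from collections import Counter, defaultdict

# Made-up per-cell labels: one cluster label and one channel label per cell.
clusters = ["1", "1", "1", "2", "2", "2", "2", "1"]
channels = ["donorA", "donorA", "donorB", "donorB",
            "donorB", "donorB", "donorA", "donorB"]


def composition(cluster_labels, condition_labels):
    """Return {cluster: {condition: fraction of that cluster's cells}}."""
    counts = defaultdict(Counter)
    for cluster, condition in zip(cluster_labels, condition_labels):
        counts[cluster][condition] += 1
    return {
        cluster: {cond: n / sum(c.values()) for cond, n in c.items()}
        for cluster, c in counts.items()
    }


fractions = composition(clusters, channels)
for cluster, parts in sorted(fractions.items()):
    print(cluster, parts)
```

A cluster dominated by a single channel, like cluster 2 in this toy data, is the kind of pattern that may point to a batch effect or a low-quality library.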
For this demonstration, we select annotation and channel as data attributes, and tsne as basis.