import pegasus as pg


# Load the count matrix; evaluating `data` prints a summary of its contents
data = pg.read_input("MantonBM_nonmix_subset.zarr.zip")
data

2022-03-09 00:17:33,599 - pegasusio.readwrite - INFO - zarr file 'MantonBM_nonmix_subset.zarr.zip' is loaded.
2022-03-09 00:17:33,600 - pegasusio.readwrite - INFO - Function 'read_input' finished in 0.51s.

MultimodalData object with 1 UnimodalData: 'GRCh38-rna'
    It currently binds to UnimodalData object GRCh38-rna

UnimodalData object with n_obs x n_vars = 48219 x 36601
    Genome: GRCh38; Modality: rna
    It contains 1 matrix: 'X'
    It currently binds to matrix 'X' as X

    obs: 'n_genes', 'Channel', 'gender'
    var: 'featureid'
    uns: 'genome', 'modality'


data.X

<48219x36601 sparse matrix of type '<class 'numpy.int32'>'
	with 39997174 stored elements in Compressed Sparse Row format>


data.obs.head()


data.obs['Channel'].value_counts()

MantonBM6_HiSeq_1    6748
MantonBM8_HiSeq_1    6092
MantonBM4_HiSeq_1    6068
MantonBM7_HiSeq_1    6025
MantonBM5_HiSeq_1    5963
MantonBM2_HiSeq_1    5930
MantonBM1_HiSeq_1    5837
MantonBM3_HiSeq_1    5556
Name: Channel, dtype: int64


data.var.head()


data.uns['genome']

'GRCh38'


data.uns['modality']

'rna'


# Compute QC metrics; thresholds: 500 to 6000 genes, under 10% mitochondrial (MT-) counts
pg.qc_metrics(data, min_genes=500, max_genes=6000, mito_prefix='MT-', percent_mito=10)


df_qc = pg.get_filter_stats(data)
df_qc


pg.qcviolin(data, plot_type='gene', dpi=100)


pg.qcviolin(data, plot_type='count', dpi=100)


pg.qcviolin(data, plot_type='mito', dpi=100)


# Drop the cell barcodes that fail the QC thresholds set in qc_metrics
pg.filter_data(data)

2022-03-09 00:17:35,420 - pegasusio.qc_utils - INFO - After filtration, 35465 out of 48219 cell barcodes are kept in UnimodalData object GRCh38-rna.
2022-03-09 00:17:35,421 - pegasus.tools.preprocessing - INFO - Function 'filter_data' finished in 0.23s.
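
The thresholds passed to `pg.qc_metrics` can be read as a per-cell predicate. The following is a minimal sketch, not Pegasus's implementation: it assumes a cell is kept when `min_genes <= n_genes < max_genes` and its mitochondrial percentage is below `percent_mito`. The `passes_qc` helper and the toy cells are hypothetical.

```python
def passes_qc(n_genes, percent_mito, min_genes=500, max_genes=6000, max_percent_mito=10.0):
    """Return True if a cell satisfies the assumed QC thresholds."""
    return min_genes <= n_genes < max_genes and percent_mito < max_percent_mito


# Toy cells (hypothetical values, not taken from the dataset)
cells = [
    {"barcode": "cell_a", "n_genes": 816, "percent_mito": 3.1},    # passes
    {"barcode": "cell_b", "n_genes": 420, "percent_mito": 2.0},    # too few genes
    {"barcode": "cell_c", "n_genes": 1704, "percent_mito": 14.5},  # too much mito
    {"barcode": "cell_d", "n_genes": 6000, "percent_mito": 1.0},   # at the upper bound
]

kept = [c["barcode"] for c in cells if passes_qc(c["n_genes"], c["percent_mito"])]
print(kept)
```

Whether the bounds are inclusive or exclusive is an assumption here; check the Pegasus documentation before relying on edge cases.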


pg.identify_robust_genes(data)

2022-03-09 00:17:36,119 - pegasus.tools.preprocessing - INFO - After filtration, 25653/36601 genes are kept. Among 25653 genes, 17516 genes are robust.
2022-03-09 00:17:36,120 - pegasus.tools.preprocessing - INFO - Function 'identify_robust_genes' finished in 0.69s.


data.obs['Channel'].value_counts()

MantonBM2_HiSeq_1    4935
MantonBM6_HiSeq_1    4665
MantonBM8_HiSeq_1    4511
MantonBM7_HiSeq_1    4452
MantonBM1_HiSeq_1    4415
MantonBM3_HiSeq_1    4225
MantonBM4_HiSeq_1    4172
MantonBM5_HiSeq_1    4090
Name: Channel, dtype: int64


pg.log_norm(data)

2022-03-09 00:17:36,657 - pegasus.tools.preprocessing - INFO - Function 'log_norm' finished in 0.52s.
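
As a rough illustration of what log-normalization does to a single cell, the sketch below rescales raw counts to a fixed total and applies log(1 + x). It assumes a target of 100,000 counts per cell and ignores details such as restricting the scaling factor to robust genes. `log_norm_cell` is a hypothetical helper, not a Pegasus function.

```python
import math


def log_norm_cell(counts, norm_count=1e5):
    """Scale one cell's counts to sum to `norm_count`, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    scale = norm_count / total
    return [math.log1p(c * scale) for c in counts]


raw_counts = [1, 0, 3]  # toy cell with three genes (hypothetical values)
normalized = log_norm_cell(raw_counts)
print(normalized)
```

Zero counts stay zero after the transform, which is why the matrix can remain sparse.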


# Work on a copy so the uncorrected trial run leaves `data` untouched
data_trial = data.copy()


pg.highly_variable_features(data_trial)

2022-03-09 00:17:37,460 - pegasus.tools.hvf_selection - INFO - Function 'estimate_feature_statistics' finished in 0.18s.
2022-03-09 00:17:37,494 - pegasus.tools.hvf_selection - INFO - 2000 highly variable features have been selected.
2022-03-09 00:17:37,495 - pegasus.tools.hvf_selection - INFO - Function 'highly_variable_features' finished in 0.21s.


data_trial.var.loc[data_trial.var['highly_variable_features']].sort_values(by='hvf_rank')


pg.hvfplot(data_trial, dpi=200)


pg.pca(data_trial)

2022-03-09 00:17:45,308 - pegasus.tools.preprocessing - INFO - Function 'pca' finished in 6.31s.


coord_pc1 = data_trial.uns['PCs'][:, 0]
coord_pc1

array([ 0.02221761,  0.01772111, -0.00582949, ..., -0.00050337,
        0.04850996,  0.03549923], dtype=float32)


data_trial.var.loc[data_trial.var['highly_variable_features']].index.values

array(['HES4', 'ISG15', 'TNFRSF18', ..., 'RPS4Y2', 'MT-CO1', 'MT-CO3'],
      dtype=object)


data_trial.obsm['X_pca'].shape

(35465, 50)


pg.neighbors(data_trial)

2022-03-09 00:17:50,066 - pegasus.tools.nearest_neighbors - INFO - Function 'get_neighbors' finished in 4.73s.
2022-03-09 00:17:51,043 - pegasus.tools.nearest_neighbors - INFO - Function 'calculate_affinity_matrix' finished in 0.98s.


print(f"Get {data_trial.obsm['pca_knn_indices'].shape[1]} nearest neighbors (excluding itself) for each cell.")
data_trial.obsm['pca_knn_indices']

Get 99 nearest neighbors (excluding itself) for each cell.

array([[30526, 27514, 28825, ..., 18019, 22273, 32915],
       [29723,  2651,  5282, ..., 29922, 14100, 30101],
       [35262, 20170, 30032, ..., 34146,  3880,  2412],
       ...,
       [ 6096, 30824, 18992, ..., 14345, 34251, 20801],
       [34252, 34709, 17569, ..., 32455,  8648,  6047],
       [ 5379, 35401, 31722, ...,  1585, 32585, 16009]])


data_trial.obsm['pca_knn_distances']

array([[ 4.984169 ,  5.486617 ,  5.531816 , ...,  6.568183 ,  6.570032 ,
         6.5835915],
       [ 8.275161 ,  8.679711 ,  8.974715 , ..., 10.486567 , 10.48813  ,
        10.531117 ],
       [ 4.793777 ,  5.1250052,  5.205284 , ...,  6.08579  ,  6.0924664,
         6.0957317],
       ...,
       [ 7.3680816,  7.4144983,  7.4873137, ...,  9.415309 ,  9.418577 ,
         9.421833 ],
       [ 8.759402 ,  8.7638645,  9.546374 , ..., 11.836472 , 11.852649 ,
        11.85586  ],
       [ 7.1296797,  7.2653217,  7.413504 , ..., 11.326677 , 11.330711 ,
        11.331037 ]], dtype=float32)
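
The two arrays above hold, for each cell, the indices of and distances to its nearest neighbors in PCA space, with the cell itself left out. A brute-force sketch of that layout on toy 2-D points follows. It is only an illustration of the data structure: real pipelines typically use a much faster approximate search, and `knn_excluding_self` is a hypothetical helper.

```python
import math


def knn_excluding_self(points, k):
    """Return (indices, distances) of the k nearest other points for each point."""
    indices, distances = [], []
    for i, p in enumerate(points):
        # Rank every other point by Euclidean distance to point i
        ranked = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        distances.append([d for d, _ in ranked[:k]])
        indices.append([j for _, j in ranked[:k]])
    return indices, distances


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]  # toy coordinates
knn_indices, knn_distances = knn_excluding_self(points, k=2)
print(knn_indices[0], knn_distances[0])
```

Row `i` of each result corresponds to cell `i`, and distances within a row are sorted in increasing order, matching the shape of the output shown above.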


pg.louvain(data_trial)

2022-03-09 00:17:52,078 - pegasus.tools.graph_operations - INFO - Function 'construct_graph' finished in 1.01s.
2022-03-09 00:18:12,856 - pegasus.tools.clustering - INFO - Louvain clustering is done. Get 19 clusters.
2022-03-09 00:18:12,888 - pegasus.tools.clustering - INFO - Function 'louvain' finished in 21.82s.


data_trial.obs['louvain_labels'].value_counts()

1     5623
2     4320
3     3594
4     2889
5     2851
6     2660
7     2227
8     1927
9     1625
10    1432
11    1298
12    1032
13     938
14     897
15     550
16     430
17     407
18     386
19     379
Name: louvain_labels, dtype: int64


pg.compo_plot(data_trial, 'louvain_labels', 'Channel')


pg.tsne(data_trial)

Will use momentum during exaggeration phase
Computing input similarities...
Using perplexity, so normalizing input data (to prevent numerical problems)
Using perplexity, not the manually set kernel width.  K (number of nearest neighbors) and sigma (bandwidth) parameters are going to be ignored.
Using ANNOY for knn search, with parameters: n_trees 50 and search_k 4500
Going to allocate memory. N: 35465, K: 90, N*K = 3191850
Building Annoy tree...
Done building tree. Beginning nearest neighbor search... 
parallel (6 threads):
2022-03-09 00:18:42,467 - pegasus.tools.visualization - INFO - Function 'tsne' finished in 28.84s.


pg.scatter(data_trial, attrs=['louvain_labels', 'Channel'], basis='tsne')


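# Select highly variable features with batch awareness, rerun PCA,
# then integrate the batches (channels) with Harmony
pg.highly_variable_features(data, batch='Channel')
pg.pca(data)
pca_key = pg.run_harmony(data)  # representation key for the Harmony-corrected PCs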

2022-03-09 00:18:44,072 - pegasus.tools.hvf_selection - INFO - Function 'estimate_feature_statistics' finished in 0.34s.
2022-03-09 00:18:44,108 - pegasus.tools.hvf_selection - INFO - 2000 highly variable features have been selected.
2022-03-09 00:18:44,109 - pegasus.tools.hvf_selection - INFO - Function 'highly_variable_features' finished in 0.38s.
2022-03-09 00:18:49,597 - pegasus.tools.preprocessing - INFO - Function 'pca' finished in 5.49s.
2022-03-09 00:18:49,971 - pegasus.tools.batch_correction - INFO - Start integration using Harmony.
	Initialization is completed.
	Completed 1 / 10 iteration(s).
	Completed 2 / 10 iteration(s).
	Completed 3 / 10 iteration(s).
	Completed 4 / 10 iteration(s).
	Completed 5 / 10 iteration(s).
	Completed 6 / 10 iteration(s).
	Completed 7 / 10 iteration(s).
	Completed 8 / 10 iteration(s).
Reach convergence after 8 iteration(s).
2022-03-09 00:19:10,092 - pegasus.tools.batch_correction - INFO - Function 'run_harmony' finished in 20.49s.


data.obsm['X_pca_harmony'].shape

(35465, 50)


pg.neighbors(data, rep=pca_key)
pg.louvain(data, rep=pca_key)

2022-03-09 00:19:14,962 - pegasus.tools.nearest_neighbors - INFO - Function 'get_neighbors' finished in 4.86s.
2022-03-09 00:19:15,987 - pegasus.tools.nearest_neighbors - INFO - Function 'calculate_affinity_matrix' finished in 1.03s.
2022-03-09 00:19:17,115 - pegasus.tools.graph_operations - INFO - Function 'construct_graph' finished in 1.13s.
2022-03-09 00:19:35,274 - pegasus.tools.clustering - INFO - Louvain clustering is done. Get 16 clusters.
2022-03-09 00:19:35,310 - pegasus.tools.clustering - INFO - Function 'louvain' finished in 19.32s.


pg.compo_plot(data, 'louvain_labels', 'Channel')


pg.tsne(data, rep=pca_key)

Symmetrizing...
Using the given initialization.
Exaggerating Ps by 12.000000
Input similarities computed (sparsity = 0.003936)!
Learning embedding...
Using FIt-SNE approximation.
Iteration 50 (50 iterations in 0.69 seconds), cost 5.635483
Iteration 100 (50 iterations in 0.68 seconds), cost 5.171618
Iteration 150 (50 iterations in 0.68 seconds), cost 5.038671
Iteration 200 (50 iterations in 0.66 seconds), cost 4.991902
Iteration 250 (50 iterations in 0.65 seconds), cost 4.964881
Unexaggerating Ps by 12.000000
Iteration 300 (50 iterations in 0.66 seconds), cost 3.761440
Iteration 350 (50 iterations in 0.69 seconds), cost 3.349017
Iteration 400 (50 iterations in 0.87 seconds), cost 3.144547
Iteration 450 (50 iterations in 1.15 seconds), cost 3.021911
Iteration 500 (50 iterations in 1.71 seconds), cost 2.937259
Iteration 550 (50 iterations in 1.85 seconds), cost 2.875143
Iteration 600 (50 iterations in 2.43 seconds), cost 2.828351
Iteration 650 (50 iterations in 2.94 seconds), cost 2.791662
2022-03-09 00:20:03,834 - pegasus.tools.visualization - INFO - Function 'tsne' finished in 27.84s.
Iteration 700 (50 iterations in 3.61 seconds), cost 2.764021
Iteration 750 (50 iterations in 4.44 seconds), cost 2.744197
Will use momentum during exaggeration phase
Computing input similarities...
Using perplexity, so normalizing input data (to prevent numerical problems)
Using perplexity, not the manually set kernel width.  K (number of nearest neighbors) and sigma (bandwidth) parameters are going to be ignored.
Using ANNOY for knn search, with parameters: n_trees 50 and search_k 4500
Going to allocate memory. N: 35465, K: 90, N*K = 3191850
Building Annoy tree...
Done building tree. Beginning nearest neighbor search... 
parallel (6 threads):


pg.scatter(data, attrs=['louvain_labels', 'Channel'], basis='tsne')



pg.umap(data, rep=pca_key)

2022-03-09 00:20:05,069 - pegasus.tools.nearest_neighbors - INFO - Found cached kNN results, no calculation is required.
2022-03-09 00:20:05,069 - pegasus.tools.nearest_neighbors - INFO - Function 'get_neighbors' finished in 0.00s.
UMAP(min_dist=0.5, precomputed_knn=(array([[    0, 32454, 33203, ..., 34421, 4585, 33415],
       [    1, 7658, 32973, ..., 33173, 21503, 20769],
       [    2, 13228, 20787, ..., 27422, 28285, 34507],
       ...,
       [35462, 31460, 28289, ..., 11139, 21229, 20250],
       [35463, 34709, 8206, ..., 26634, 1453, 17529],
       [35464, 35401, 8702, ..., 13853, 32612, 24418]]), array([[ 0.       , 5.1344213, 5.189244 , ..., 5.800479 , 5.8348527,
         5.8351173],
       [ 0.       , 7.705064 , 8.047326 , ..., 8.791692 , 8.835943 ,
         8.856787 ],
       [ 0.       , 4.039253 , 4.5694947, ..., 5.1763344, 5.1773973,
         5.2111516],
       ...,
       [ 0.       , 6.6650147, 6.7569413, ..., 7.6235228, 7.67277  ,
         7.6997523],
       [ 0.       , 8.145102 , 9.145088 , ..., 10.019776 , 10.028577 ,
        10.075572 ],
       [ 0.       , 5.8484697, 5.9088964, ..., 7.372049 , 7.3747334,
         7.489392 ]], dtype=float32), <pegasus.tools.visualization.DummyNNDescent object at 0x7fa6626a9dd0>), random_state=0, verbose=True)
Wed Mar  9 00:20:05 2022 Construct fuzzy simplicial set
Wed Mar  9 00:20:07 2022 Construct embedding

Wed Mar  9 00:20:29 2022 Finished embedding
2022-03-09 00:20:29,633 - pegasus.tools.visualization - INFO - Function 'umap' finished in 24.56s.


pg.scatter(data, attrs=['louvain_labels', 'Channel'], basis='umap')


pg.de_analysis(data, cluster='louvain_labels')

2022-03-09 00:20:31,688 - pegasus.tools.diff_expr - INFO - CSR matrix is converted to CSC matrix. Time spent = 0.9306s.
2022-03-09 00:20:50,533 - pegasus.tools.diff_expr - INFO - MWU test and AUROC calculation are finished. Time spent = 18.8446s.
2022-03-09 00:20:50,635 - pegasus.tools.diff_expr - INFO - Sufficient statistics are collected. Time spent = 0.1018s.
2022-03-09 00:20:50,733 - pegasus.tools.diff_expr - INFO - Differential expression analysis is finished.
2022-03-09 00:20:50,734 - pegasus.tools.diff_expr - INFO - Function 'de_analysis' finished in 19.98s.


marker_dict = pg.markers(data)


marker_dict['1']['up'].sort_values(by='log2FC', ascending=False)


pg.volcano(data, cluster_id='1', dpi=200)


pg.write_results_to_excel(marker_dict, "MantonBM_subset.de.xlsx")

2022-03-09 00:21:26,714 - pegasus.tools.diff_expr - INFO - Excel spreadsheet is written.
2022-03-09 00:21:26,846 - pegasus.tools.diff_expr - INFO - Function 'write_results_to_excel' finished in 16.34s.


# Score clusters against the built-in human immune marker set, then name each cluster
celltype_dict = pg.infer_cell_types(data, markers='human_immune')
cluster_names = pg.infer_cluster_names(celltype_dict)


celltype_dict['1']

[name: T cell; score: 1.00; average marker percentage: 65.21%; strong support: (CD3D+,75.12%),(CD3E+,69.03%),(CD3G+,43.72%),(TRAC+,72.99%)]


pg.annotate(data, name='anno', based_on='louvain_labels', anno_dict=cluster_names)
data.obs['anno'].value_counts()

Naive T cell                   6290
CD14+ Monocyte                 5177
B cell                         4332
T helper cell                  3272
Cytotoxic T cell               2953
Natural killer cell            2600
Cytotoxic T cell-2             2596
Cytotoxic T cell-3             1739
Erythroid cells                1629
Hematopoietic stem cell        1441
Pre B cell                      933
CD14+ Monocyte-2                685
CD1C+ dendritic cell            550
CD16+ Monocyte                  469
Plasma cell                     408
Plasmacytoid dendritic cell     391
Name: anno, dtype: int64


pg.scatter(data, attrs='anno', basis='tsne', dpi=100)


pg.scatter(data, attrs='anno', basis='umap', legend_loc='on data', dpi=150)


data

MultimodalData object with 1 UnimodalData: 'GRCh38-rna'
    It currently binds to UnimodalData object GRCh38-rna

UnimodalData object with n_obs x n_vars = 35465 x 25653
    Genome: GRCh38; Modality: rna
    It contains 2 matrices: 'X', 'raw.X'
    It currently binds to matrix 'X' as X

    obs: 'n_genes', 'Channel', 'gender', 'n_counts', 'percent_mito', 'scale', 'louvain_labels'(cluster), 'anno'
    var: 'featureid', 'n_cells', 'percent_cells', 'robust', 'highly_variable_features', 'mean', 'var', 'hvf_loess', 'hvf_rank'
    obsm: 'X_pca', 'X_pca_harmony', 'pca_harmony_knn_indices'(knn), 'pca_harmony_knn_distances'(knn), 'X_tsne'(basis), 'X_umap'(basis)
    varm: 'means', 'partial_sum', 'de_res'
    obsp: 'W_pca_harmony'
    uns: 'genome', 'modality', 'df_qcplot', 'norm_count', 'ncells', 'stdzn_mean', 'stdzn_std', 'stdzn_max_value', '_tmp_fmat_highly_variable_features', 'PCs', 'pca', 'pca_features', '_attr2type', 'louvain_resolution'


# Switch the active matrix to the raw counts
data.select_matrix('raw.X')
data

MultimodalData object with 1 UnimodalData: 'GRCh38-rna'
    It currently binds to UnimodalData object GRCh38-rna

UnimodalData object with n_obs x n_vars = 35465 x 25653
    Genome: GRCh38; Modality: rna
    It contains 2 matrices: 'X', 'raw.X'
    It currently binds to matrix 'raw.X' as X

    obs: 'n_genes', 'Channel', 'gender', 'n_counts', 'percent_mito', 'scale', 'louvain_labels'(cluster), 'anno'
    var: 'featureid', 'n_cells', 'percent_cells', 'robust', 'highly_variable_features', 'mean', 'var', 'hvf_loess', 'hvf_rank'
    obsm: 'X_pca', 'X_pca_harmony', 'pca_harmony_knn_indices'(knn), 'pca_harmony_knn_distances'(knn), 'X_tsne'(basis), 'X_umap'(basis)
    varm: 'means', 'partial_sum', 'de_res'
    obsp: 'W_pca_harmony'
    uns: 'genome', 'modality', 'df_qcplot', 'norm_count', 'ncells', 'stdzn_mean', 'stdzn_std', 'stdzn_max_value', '_tmp_fmat_highly_variable_features', 'PCs', 'pca', 'pca_features', '_attr2type', 'louvain_resolution'


# Switch back to the log-normalized matrix
data.select_matrix('X')


pg.diffmap(data, rep=pca_key)

2022-03-09 00:21:27,656 - pegasus.tools.diffusion_map - INFO - Calculating connected components is done.
2022-03-09 00:21:27,853 - pegasus.tools.diffusion_map - INFO - Calculating normalized affinity matrix is done.
2022-03-09 00:21:37,741 - pegasus.tools.diffusion_map - INFO - Detected knee point at t = 196.
2022-03-09 00:21:37,791 - pegasus.tools.diffusion_map - INFO - Function 'diffmap' finished in 10.19s.


data.obsm['X_diffmap'].shape

(35465, 99)


# Compute the force-directed layout embedding (FLE)
pg.fle(data)

2022-03-09 00:21:41,963 - pegasus.tools.nearest_neighbors - INFO - Function 'get_neighbors' finished in 4.16s.
2022-03-09 00:21:42,683 - pegasus.tools.nearest_neighbors - INFO - Function 'calculate_affinity_matrix' finished in 0.72s.
2022-03-09 00:21:43,119 - pegasus.tools.graph_operations - INFO - Function 'construct_graph' finished in 0.44s.

Mar 09, 2022 12:21:44 AM org.netbeans.modules.masterfs.watcher.Watcher getNotifierForPlatform
INFO: Native file watcher is disabled
Mar 09, 2022 12:21:47 AM org.gephi.io.processor.plugin.DefaultProcessor process
INFO: # Nodes loaded: 35,465
Mar 09, 2022 12:21:47 AM org.gephi.io.processor.plugin.DefaultProcessor process
INFO: # Edges loaded: 1,103,395

100 iterations, change_per_node = 314.55091468548574
200 iterations, change_per_node = 81.4078060528153
300 iterations, change_per_node = 78.68563696426409
400 iterations, change_per_node = 50.381170670906506
500 iterations, change_per_node = 81.88712085418351
600 iterations, change_per_node = 71.7350053093973
700 iterations, change_per_node = 48.22426559362979
800 iterations, change_per_node = 32.936009653696345
900 iterations, change_per_node = 30.228989141851898
1000 iterations, change_per_node = 36.02514597031775
1100 iterations, change_per_node = 32.50508532893281
1200 iterations, change_per_node = 29.160303770087843
1300 iterations, change_per_node = 23.066557518704986
1400 iterations, change_per_node = 20.305789518290034
1500 iterations, change_per_node = 15.440457159618003
1600 iterations, change_per_node = 12.206252136992426
1700 iterations, change_per_node = 13.66054991449248
1800 iterations, change_per_node = 10.094766758544216
1900 iterations, change_per_node = 9.99155675961005
2000 iterations, change_per_node = 12.510247404737516
2100 iterations, change_per_node = 8.93481718667209
2200 iterations, change_per_node = 8.972110610637227
2300 iterations, change_per_node = 11.323951503164169
2400 iterations, change_per_node = 8.525012578530017
2500 iterations, change_per_node = 7.59856696956334
2600 iterations, change_per_node = 5.334817110153111
2700 iterations, change_per_node = 8.873018231546418
2800 iterations, change_per_node = 7.916615856772782
2900 iterations, change_per_node = 6.510263623848311
3000 iterations, change_per_node = 6.0567525331083445
3100 iterations, change_per_node = 8.2483061235931
calc_mwu finished for genes in [8552, 12828).
calc_mwu finished for genes in [17103, 21378).
calc_mwu finished for genes in [0, 4276).
calc_mwu finished for genes in [4276, 8552).
calc_mwu finished for genes in [12828, 17103).
calc_mwu finished for genes in [21378, 25653).
3200 iterations, change_per_node = 4.443139358840217
3300 iterations, change_per_node = 4.180926920956974
3400 iterations, change_per_node = 6.110506508332063
3500 iterations, change_per_node = 6.929826915722645
3600 iterations, change_per_node = 2.992907053411964
3700 iterations, change_per_node = 3.894387445895195
3800 iterations, change_per_node = 6.215661979170708
3900 iterations, change_per_node = 4.829677652345176
4000 iterations, change_per_node = 3.8065980221786
4100 iterations, change_per_node = 3.0219000822165283
4200 iterations, change_per_node = 2.548266521157155
4300 iterations, change_per_node = 2.4068939522729846
4400 iterations, change_per_node = 4.825459054958883
4500 iterations, change_per_node = 4.343794578080158
4600 iterations, change_per_node = 3.4600645196071476
Finished in 4605 iterations, change_per_node = 1.9571089355465505
Time = 363.337s
2022-03-09 00:27:47,776 - pegasus.tools.visualization - INFO - Function 'fle' finished in 369.98s.


pg.scatter(data, attrs='anno', basis='fle')


pg.write_output(data, "MantonBM_result.zarr.zip")

2022-03-09 00:27:48,454 - pegasusio.zarr_utils - WARNING - Detected and removed pre-existing file MantonBM_result.zarr.zip.
2022-03-09 00:27:49,447 - pegasusio.readwrite - INFO - zarr.zip file 'MantonBM_result.zarr.zip' is written.
2022-03-09 00:27:49,448 - pegasusio.readwrite - INFO - Function 'write_output' finished in 1.00s.

Output of data.var.head():

                   featureid
featurekey
MIR1302-2HG  ENSG00000243485
FAM138A      ENSG00000237613
OR4F5        ENSG00000186092
AL627309.1   ENSG00000238009
AL627309.3   ENSG00000239945

Output of df_qc (per-channel filtration statistics):

                   kept  median_n_genes  median_n_umis  median_percent_mito  filt  total  median_n_genes_before  median_n_umis_before  median_percent_mito_before
Channel
MantonBM5_HiSeq_1  4090           770.0         2795.0             3.136190  1873   5963                  650.0                2139.0                    3.399615
MantonBM4_HiSeq_1  4172           790.0         2278.5             3.271181  1896   6068                  672.0                1764.0                    3.519009
MantonBM3_HiSeq_1  4225           779.0         2621.0             3.274451  1331   5556                  715.0                2229.0                    3.449398
MantonBM1_HiSeq_1  4415           790.0         2533.0             3.713331  1422   5837                  723.0                2149.0                    3.855422
MantonBM7_HiSeq_1  4452           745.0         2403.5             3.053718  1573   6025                  679.0                2053.0                    3.177570
MantonBM8_HiSeq_1  4511           735.0         2561.0             3.520510  1581   6092                  671.5                2212.0                    3.706849
MantonBM6_HiSeq_1  4665           852.0         2700.0             3.032258  2083   6748                  741.0                2129.0                    3.345829
MantonBM2_HiSeq_1  4935           801.0         2486.0             3.514056   995   5930                  756.0                2261.5                    3.534756
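
The per-channel statistics above can be cross-checked against the log messages: `kept` plus `filt` should equal `total` for every channel, and the column sums should match the 35465 of 48219 barcodes reported by `filter_data`. The sketch below hard-codes the values from the table; the `qc_rows` name is only for illustration.

```python
# (kept, filt, total) per channel, copied from the filtration statistics table
qc_rows = {
    "MantonBM5_HiSeq_1": (4090, 1873, 5963),
    "MantonBM4_HiSeq_1": (4172, 1896, 6068),
    "MantonBM3_HiSeq_1": (4225, 1331, 5556),
    "MantonBM1_HiSeq_1": (4415, 1422, 5837),
    "MantonBM7_HiSeq_1": (4452, 1573, 6025),
    "MantonBM8_HiSeq_1": (4511, 1581, 6092),
    "MantonBM6_HiSeq_1": (4665, 2083, 6748),
    "MantonBM2_HiSeq_1": (4935, 995, 5930),
}

# Every channel must satisfy kept + filt == total
for channel, (kept, filt, total) in qc_rows.items():
    assert kept + filt == total, channel

total_kept = sum(kept for kept, _, _ in qc_rows.values())
total_cells = sum(total for _, _, total in qc_rows.values())
print(total_kept, total_cells)
```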

Output of data_trial.var.loc[data_trial.var['highly_variable_features']].sort_values(by='hvf_rank'):

                  featureid  n_cells  percent_cells  robust  highly_variable_features      mean       var  hvf_loess  hvf_rank
featurekey
LYZ         ENSG00000090382     8566      24.153391    True                      True  1.526394  8.110589   3.775874         3
S100A9      ENSG00000163220     8182      23.070633    True                      True  1.423049  7.649132   3.657402         5
S100A8      ENSG00000143546     7674      21.638235    True                      True  1.328664  7.228290   3.463029         7
HLA-DRA     ENSG00000204287    14836      41.832793    True                      True  2.242150  7.513039   4.208062        12
GNLY        ENSG00000115523     5196      14.651064    True                      True  0.882395  4.859677   2.504143        13
...                     ...      ...            ...     ...                       ...       ...       ...        ...       ...
TPK1        ENSG00000196511      917       2.585648    True                      True  0.085252  0.289441   0.268601      5554
AL355916.1  ENSG00000232774       99       0.279148    True                      True  0.010097  0.037130   0.032563      5557
MEI1        ENSG00000167077     1813       5.112082    True                      True  0.178823  0.611417   0.574647      5559
AL035701.1  ENSG00000231769      222       0.625969    True                      True  0.021673  0.078620   0.070820      5563
KIAA1324    ENSG00000116299      155       0.437051    True                      True  0.015310  0.054929   0.048948      5564

Output of marker_dict['1']['up'].sort_values(by='log2FC', ascending=False):

            log2Mean  log2Mean_other    log2FC  percentage  percentage_other  percentage_fold_change     auroc        mwu_U  mwu_pval  mwu_qval
featurekey
LTB         6.078218        2.750911  3.327308   88.982513         41.377892            2.150485e+00  0.752276  138050716.5  0.000000  0.000000
TRAC        4.584767        1.860424  2.724343   72.988869         29.610968            2.464927e+00  0.715238  131253949.5  0.000000  0.000000
BCL11B      3.803313        1.143754  2.659559   64.610497         20.260496            3.188989e+00  0.731830  134298667.0  0.000000  0.000000
CD3D        4.584280        2.014685  2.569595   75.119240         32.243359            2.329759e+00  0.701153  128669176.5  0.000000  0.000000
CD3E        4.084703        1.814133  2.270570   69.030205         30.365038            2.273345e+00  0.686962  126064947.0  0.000000  0.000000
...              ...             ...       ...         ...               ...                     ...       ...          ...       ...       ...
AL513008.1  0.001603        0.000000  0.001603    0.031797          0.000000            1.000000e+30  0.500159   91784550.0  0.002321  0.005037
AC141002.1  0.001573        0.000000  0.001573    0.031797          0.000000            1.000000e+30  0.500159   91784550.0  0.002321  0.005037
AP001972.3  0.001547        0.000000  0.001547    0.031797          0.000000            1.000000e+30  0.500159   91784550.0  0.002321  0.005037
AC093797.1  0.001542        0.000000  0.001542    0.031797          0.000000            1.000000e+30  0.500159   91784550.0  0.002321  0.005037
AC023051.1  0.001522        0.000000  0.001522    0.031797          0.000000            1.000000e+30  0.500159   91784550.0  0.002321  0.005037

Pegasus Tutorial

Count Matrix File

Preprocessing

Filtration

Normalization and Logarithmic Transformation

Highly Variable Gene Selection

Principal Component Analysis

Nearest Neighbors

Clustering and Visualization

Batch Correction

Repeat Previous Steps on the Corrected Data

Visualization

tSNE Plot

UMAP Plot

Differential Expression Analysis

Cell Type Annotation

Raw Count vs Log-norm Count

Cell Development Trajectory and Diffusion Map

Save Result to File

Read More...

Output of data.obs.head():

                                    n_genes            Channel  gender
barcodekey
MantonBM1_HiSeq_1-AAACCTGAGCAGGTCA      816  MantonBM1_HiSeq_1  female
MantonBM1_HiSeq_1-AAACCTGCACACTGCG      716  MantonBM1_HiSeq_1  female
MantonBM1_HiSeq_1-AAACCTGCACCGGAAA      554  MantonBM1_HiSeq_1  female
MantonBM1_HiSeq_1-AAACCTGCATAGACTC      967  MantonBM1_HiSeq_1  female
MantonBM1_HiSeq_1-AAACCTGCATCGATGT     1704  MantonBM1_HiSeq_1  female