import pegasus as pg

data = pg.read_input("nestorawa_forcellcycle_expressionMatrix.txt")
data

2022-03-09 00:09:42,432 - pegasusio.readwrite - INFO - tsv file 'nestorawa_forcellcycle_expressionMatrix.txt' is loaded.
2022-03-09 00:09:42,433 - pegasusio.readwrite - INFO - Function 'read_input' finished in 2.59s.

MultimodalData object with 1 UnimodalData: 'unknown-rna'
    It currently binds to UnimodalData object unknown-rna

UnimodalData object with n_obs x n_vars = 773 x 24193
    Genome: unknown; Modality: rna
    It contains 1 matrix: 'X'
    It currently binds to matrix 'X' as X

    uns: 'genome', 'modality'


pg.qc_metrics(data, min_genes=0, max_genes=1e5)
pg.filter_data(data)
pg.identify_robust_genes(data)
pg.log_norm(data, norm_count=1e4)

2022-03-09 00:09:42,563 - pegasusio.qc_utils - INFO - After filtration, 773 out of 773 cell barcodes are kept in UnimodalData object unknown-rna.
2022-03-09 00:09:42,564 - pegasus.tools.preprocessing - INFO - Function 'filter_data' finished in 0.08s.
2022-03-09 00:09:42,716 - pegasus.tools.preprocessing - INFO - After filtration, 24158/24193 genes are kept. Among 24158 genes, 24158 genes are robust.
2022-03-09 00:09:42,717 - pegasus.tools.preprocessing - INFO - Function 'identify_robust_genes' finished in 0.15s.
2022-03-09 00:09:42,827 - pegasus.tools.preprocessing - INFO - Function 'log_norm' finished in 0.11s.


pg.calc_signature_score(data, 'cell_cycle_human')

2022-03-09 00:09:42,864 - pegasus.tools.utils - INFO - Loaded signatures from GMT file /Users/yangy197/GitHub/pegasus/pegasus/data_files/cell_cycle_human.gmt.
2022-03-09 00:09:42,867 - pegasus.tools.signature_score - INFO - Signature G1/S: 42 out of 43 genes are used in signature score calculation.
2022-03-09 00:09:42,889 - pegasus.tools.signature_score - INFO - Signature G2/M: 52 out of 54 genes are used in signature score calculation.
2022-03-09 00:09:42,938 - pegasus.tools.signature_score - INFO - Function 'calc_signature_score' finished in 0.11s.


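# Collect all signature genes from the GMT file. Each line is tab-separated:
# signature name, description, then the gene symbols.
cell_cycle_genes = []
with open("cell_cycle_human.gmt", 'r') as f:
    for line in f:
        cell_cycle_genes += line.strip().split('\t')[2:]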
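
The loop above depends on the GMT layout: one signature per line, tab-separated, with the gene symbols starting at the third field. The sketch below shows the same parsing while keeping the signatures separate. `parse_gmt` is a hypothetical helper, not part of Pegasus, and the two input lines are made-up stand-ins for the real file.

```python
from typing import Dict, Iterable, List


def parse_gmt(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each signature name to its gene list (hypothetical helper)."""
    signatures: Dict[str, List[str]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            # Skip blank or malformed lines that carry no genes.
            continue
        name, _description, *genes = fields
        signatures[name] = [g for g in genes if g]
    return signatures


# Illustrative input only; the real file has many more genes per line.
example_lines = [
    "G1/S\tdemo\tMCM5\tPCNA\n",
    "G2/M\tdemo\tHMGB2\tCDK1\n",
    "\n",
]
sigs = parse_gmt(example_lines)

# Flatten to a single list, as the tutorial loop does.
all_genes = [g for genes in sigs.values() for g in genes]
print(sigs)
print(all_genes)
```

Keeping a dictionary makes it easy to inspect one signature at a time before flattening to the combined list used for subsetting.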


data.obs['predicted_phase'].value_counts()

G0      346
G1/S    248
G2/M    179
Name: predicted_phase, dtype: int64


data_cc_genes = data[:, cell_cycle_genes].copy()
pg.pca(data_cc_genes)
data.obsm['X_pca'] = data_cc_genes.obsm['X_pca']

2022-03-09 00:09:43,115 - pegasus.tools.preprocessing - INFO - Function 'pca' finished in 0.08s.


pg.scatter(data, attrs='predicted_phase', basis='pca', dpi=130)


pca_key = pg.regress_out(data, attrs=['G1/S', 'G2/M'])

2022-03-09 00:09:43,386 - pegasus.tools.preprocessing - INFO - Function 'regress_out' finished in 0.10s.
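
Conceptually, regressing out covariates means fitting a linear model of each coordinate on the covariates and keeping the residuals. The sketch below illustrates that idea with NumPy on synthetic data. It is not Pegasus's implementation; `regress_out_residuals` and all arrays are hypothetical stand-ins.

```python
import numpy as np


def regress_out_residuals(Y, covariates):
    """Return residuals of Y after a least-squares fit on the covariates.

    Hypothetical helper: an intercept column is added, so the residuals are
    also centered.
    """
    Y = np.asarray(Y, dtype=float)
    C = np.asarray(covariates, dtype=float)
    if C.ndim == 1:
        C = C[:, None]
    design = np.column_stack([np.ones(C.shape[0]), C])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return Y - design @ coef


rng = np.random.default_rng(0)
scores = rng.normal(size=(200, 2))  # stand-ins for the G1/S and G2/M scores
loadings = np.array([[2.0, 0.0, 1.0], [0.0, -1.5, 0.5]])
embedding = scores @ loadings + rng.normal(size=(200, 3))

residuals = regress_out_residuals(embedding, scores)

# Largest absolute correlation between any score and any residual coordinate.
max_corr = max(
    abs(np.corrcoef(scores[:, j], residuals[:, k])[0, 1])
    for j in range(scores.shape[1])
    for k in range(residuals.shape[1])
)
print(max_corr)
```

After the fit, the residual coordinates carry no remaining linear association with the scores, which is the property the scatter plot after regression is meant to show.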


pg.scatter(data, attrs=['predicted_phase'], basis=pca_key, dpi=130)


data_alt = data.copy()

data_alt.obs['CC_diff'] = data_alt.obs['G1/S'] - data_alt.obs['G2/M']
pca_key = pg.regress_out(data_alt, attrs=['CC_diff'])
pg.scatter(data_alt, attrs=['predicted_phase'], basis=pca_key, dpi=130)

2022-03-09 00:09:43,655 - pegasus.tools.preprocessing - INFO - Function 'regress_out' finished in 0.09s.
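
Regressing out only the difference between the two scores removes variation along that difference and leaves variation along their sum largely untouched. The sketch below shows this on synthetic data. It assumes independent, equally scaled scores, which real cell-cycle scores need not satisfy; `residualize` is a hypothetical helper.

```python
import numpy as np


def residualize(Y, covariate):
    """Residuals of each column of Y after regressing on one covariate."""
    design = np.column_stack([np.ones(len(covariate)), covariate])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return Y - design @ coef


rng = np.random.default_rng(0)
g1s = rng.normal(size=500)  # synthetic stand-in for the G1/S score
g2m = rng.normal(size=500)  # synthetic stand-in for the G2/M score

cc_sum = g1s + g2m   # overall cycling signal in this toy setup
cc_diff = g1s - g2m  # phase difference, the only covariate regressed out

Y = np.column_stack([cc_sum, cc_diff])
R = residualize(Y, cc_diff)

kept = R[:, 0].var() / cc_sum.var()      # fraction of sum variance kept
removed = R[:, 1].var() / cc_diff.var()  # fraction of diff variance left
print(kept, removed)
```

In this toy setup nearly all of the variance along the sum survives while the variance along the difference drops to numerical zero.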


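import numpy as np
from sklearn.decomposition import PCA

# Re-run a full PCA on the regressed embedding. pca_key was last assigned by the
# regress_out call on data_alt; the same key is used to look up data.obsm here.
X = data.obsm['X_' + pca_key]
pca = PCA(n_components=X.shape[1], random_state=0, svd_solver='full')
X_pca_new = pca.fit_transform(X)
data.obsm['X_pca_new'] = np.ascontiguousarray(X_pca_new)

pg.scatter(data, attrs=['predicted_phase'], basis='pca_new', dpi=130)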


X = data_alt.obsm['X_' + pca_key]
pca = PCA(n_components=X.shape[1], random_state=0, svd_solver='full')
X_pca_new = pca.fit_transform(X)
data_alt.obsm['X_pca_new'] = np.ascontiguousarray(X_pca_new)

pg.scatter(data_alt, attrs=['predicted_phase'], basis='pca_new', dpi=130)
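
A full PCA that keeps every component only centers and rotates its input, so the two cells above change the axes of the regressed embedding, not the distances between cells. The sketch below checks this with NumPy's SVD on synthetic data. It is a stand-in for the scikit-learn call, and `rotate_to_principal_axes` is a hypothetical helper.

```python
import numpy as np


def rotate_to_principal_axes(X):
    """Center X and rotate it onto its principal axes, keeping all of them."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt.T


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))  # correlated columns
Z = rotate_to_principal_axes(X)

variances = Z.var(axis=0)
dist_before = np.linalg.norm(X[0] - X[1])
dist_after = np.linalg.norm(Z[0] - Z[1])
print(variances)
print(dist_before, dist_after)
```

The rotated coordinates are ordered by decreasing variance, which is why the first two axes of the new basis are the natural ones to plot.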

Regress Out Tutorial

Cell-Cycle Scores

Cell Cycle Effects

Regress Out Cell Cycle Effects

Alternate Workflow

Summary

Matching with Seurat and SCANPY Tutorials