import pegasus as pg
import pandas as pd


data = pg.read_input('MantonBM_nonmix_subset.zarr.zip')

2022-04-10 21:52:30,898 - pegasusio.readwrite - INFO - zarr file 'MantonBM_nonmix_subset.zarr.zip' is loaded.
2022-04-10 21:52:30,899 - pegasusio.readwrite - INFO - Function 'read_input' finished in 0.38s.


pseudo = pg.pseudobulk(data, 'Channel', 'gender')
pseudo

2022-04-10 21:52:31,157 - pegasus.tools.pseudobulk - INFO - Function 'pseudobulk' finished in 0.25s.

UnimodalData object with n_obs x n_vars = 8 x 36601
    Genome: Channel; Modality: pseudobulk
    It contains 1 matrix: 'X'
    It currently binds to matrix 'X' as X

    obs: 'gender'
    var: 'featureid'
    uns: 'genome', 'modality'


pg.deseq2(pseudo, '~gender', ('gender', 'female', 'male'))

R[write to console]: estimating size factors

R[write to console]: estimating dispersions

R[write to console]: gene-wise dispersion estimates

R[write to console]: mean-dispersion relationship

R[write to console]: final dispersion estimates

R[write to console]: fitting model and testing

2022-04-10 21:52:47,487 - pegasus.tools.pseudobulk - INFO - Function 'deseq2' finished in 16.32s.


pd.DataFrame(pseudo.varm["deseq2"], index=pseudo.var_names)


markers = pg.pseudo.markers(pseudo)
print(markers)

{'up':                baseMean  log2FoldChange     lfcSE       stat        pvalue  \
featurekey                                                                   
XIST        1910.572444       13.615371  1.043313  13.050131  6.343849e-39   
PPP1R2C       14.765863        7.317893  1.165723   6.277559  3.439292e-10   
LINC00221      9.121555        5.188543  1.230549   4.216444  2.481844e-05   
TSIX          13.654597        3.889226  0.725102   5.363692  8.153793e-08   
KANTR         88.919116        1.248766  0.289454   4.314216  1.601703e-05   
BX890604.2   297.379563        1.233275  0.260909   4.726846  2.280338e-06   
JPX         1129.585486        0.888090  0.198601   4.471729  7.758989e-06   

                    padj  
featurekey                
XIST        5.397347e-35  
PPP1R2C     3.657687e-07  
LINC00221   1.759627e-02  
TSIX        7.708052e-05  
KANTR       1.184982e-02  
BX890604.2  1.940112e-03  
JPX         6.001225e-03  , 'down':                baseMean  log2FoldChange     lfcSE       stat        pvalue  \
featurekey                                                                   
RPS4Y1      6547.060706      -16.138498  1.050588 -15.361401  2.970823e-53   
EIF1AY      1316.973725      -13.825299  1.087960 -12.707548  5.369171e-37   
DDX3Y        451.442937      -12.280059  1.083214 -11.336691  8.634834e-30   
UTY          195.114474      -11.068404  1.074641 -10.299634  7.073023e-25   
PRKY         152.182688      -10.709941  1.084543  -9.875073  5.339433e-23   
KDM5D        132.117709      -10.504537  1.069489  -9.822019  9.051192e-23   
USP9Y        208.307283      -10.440421  1.086045  -9.613253  7.029269e-22   
ZFY           87.153393       -9.905218  1.071056  -9.248081  2.285583e-20   
AC244213.1    67.131335       -9.527208  1.097804  -8.678421  4.013006e-18   
TTTY14        61.185802       -9.402545  1.084608  -8.669075  4.356434e-18   
TMSB4Y        39.206868       -8.755393  1.075498  -8.140781  3.927369e-16   
LINC00278     27.826253       -8.258319  1.099821  -7.508787  5.967764e-14   
AC006157.1    19.899699       -7.768185  1.090473  -7.123686  1.050784e-12   
TTTY10        12.079865       -6.316182  1.116495  -5.657152  1.539057e-08   
AC010889.1     5.644311       -5.964417  1.227109  -4.860542  1.170648e-06   
SCGB3A1      110.430089       -3.424882  0.544527  -6.289648  3.181862e-10   
TPSB2         63.396259       -2.604417  0.560631  -4.645512  3.392345e-06   

                    padj  
featurekey                
RPS4Y1      5.055152e-49  
EIF1AY      3.045394e-33  
DDX3Y       3.673258e-26  
UTY         2.407091e-21  
PRKY        1.514263e-19  
KDM5D       2.200215e-19  
USP9Y       1.495125e-18  
ZFY         4.321275e-17  
AC244213.1  6.739007e-15  
TTTY14      6.739007e-15  
TMSB4Y      5.569009e-13  
LINC00278   7.811343e-11  
AC006157.1  1.277153e-09  
TTTY10      1.540506e-05  
AC010889.1  1.048408e-03  
SCGB3A1     3.609505e-07  
TPSB2       2.748769e-03  }


pg.pseudo.write_results_to_excel(markers, 'test.de.xlsx')

2022-04-10 21:52:47,568 - pegasus.pseudo.convenient - INFO - Excel spreadsheet is written.
2022-04-10 21:52:47,569 - pegasus.pseudo.convenient - INFO - Function 'write_results_to_excel' finished in 0.04s.


pg.pseudo.volcano(pseudo)


pg.fgsea(pseudo, 'log2FoldChange', 'canonical_pathways', 'deseq2', fgsea_key = 'fgsea_deseq2')

2022-04-10 21:52:49,623 - pegasus.tools.utils - INFO - Loaded signatures from GMT file /home/yiming/GitHub/pegasus/pegasus/data_files/c2.cp.v7.5.1.symbols.gmt.
2022-04-10 21:52:53,051 - pegasus.tools.fgsea - INFO - Function 'fgsea' finished in 3.63s.


pg.plot_gsea(pseudo, 'fgsea_deseq2')

	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
featurekey
MIR1302-2HG	0.113807	0.696538	3.533812	0.197107	0.843744	1.00000
FAM138A	0.000000	0.000000	0.000000	0.000000	1.000000	1.00000
OR4F5	0.000000	0.000000	0.000000	0.000000	1.000000	1.00000
AL627309.1	3.131268	0.239777	0.913821	0.262389	0.793022	0.99985
AL627309.3	0.000000	0.000000	0.000000	0.000000	1.000000	1.00000
...	...	...	...	...	...	...
AC141272.1	0.113807	0.696538	3.533812	0.197107	0.843744	1.00000
AC023491.2	0.000000	0.000000	0.000000	0.000000	1.000000	1.00000
AC007325.1	0.000000	0.000000	0.000000	0.000000	1.000000	1.00000
AC007325.4	68.286678	-0.911313	0.724284	-1.258226	0.208310	0.99985
AC007325.2	0.121696	-0.746135	3.533812	-0.211142	0.832777	1.00000

Pegasus Pseudobulk Analysis¶

Dataset¶

Generate pseudobulk matrix¶

Differential Expression (DE) Analysis on Pseudobulk Matrix¶

Get significant DE genes in human-readable format¶

Write DE results to spreadsheet¶

Generate volcano plot¶

Gene Set Enrichment Analysis (GSEA)¶

Generate GSEA plots¶