Installation and Usage — pySCENIC latest documentation
https://pyscenic.readthedocs.io/en/latest/installation.html
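The R version of SCENIC currently cannot use the new-version databases; for details, see the issue
feather v1 or v2 for R package
. Therefore, to use the R version, use the
old-version databases:
cisTarget databases - Feather v1 databases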
Important:
If you are using pySCENIC < 0.12.0 and ctxcore < 0.2.0 you will need to retrieve the databases from the old folder, in feather v1 format.
However, we recommend updating to the latest versions to use feather v2 (smaller and easier to read) and the most recent databases and annotations (v10nr_clust/mc_v10_clust).
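The version rule above can be checked from Python before downloading anything. This is a minimal sketch: it reads the installed versions with the standard `importlib.metadata`, and the helper names `parse_version` and `required_db_format` are hypothetical. It follows the stated rule literally (feather v1 only when both pySCENIC < 0.12.0 and ctxcore < 0.2.0); treating every other combination as feather v2 is an assumption.

```python
"""Pick the cisTarget database format that matches the installed packages (sketch)."""
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '0.12.1' into (0, 12, 1); only leading digits of each part are kept."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:  # pad '0.12' to (0, 12, 0)
        parts.append(0)
    return tuple(parts)


def required_db_format(pyscenic_version, ctxcore_version):
    """Rule from the text: old pySCENIC AND old ctxcore -> feather v1, otherwise v2 (assumed)."""
    old_pyscenic = parse_version(pyscenic_version) < (0, 12, 0)
    old_ctxcore = parse_version(ctxcore_version) < (0, 2, 0)
    return "feather v1" if old_pyscenic and old_ctxcore else "feather v2"


if __name__ == "__main__":
    try:
        print(required_db_format(version("pyscenic"), version("ctxcore")))
    except PackageNotFoundError as err:
        print(f"Package not installed: {err}")
```

Run it inside the `pyscenic` conda environment; in practice pySCENIC 0.12 pulls in a matching ctxcore, so mixed combinations should be rare.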
There are two main types of databases:
- Gene-based databases are meant to be used with (py)SCENIC and for motif enrichment in gene sets with cisTarget.
- Region-based databases are meant to be used with SCENIC+ and for motif enrichment in region sets with cisTarget.
Step 1: Installation
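conda create -n pyscenic python=3.9
conda activate pyscenic
# Install dependencies
conda install -y numpy
conda install -y -c anaconda cytoolz
conda install -y -c conda-forge scanpy
# Install pySCENIC
pip install pyscenic
# Or through a PyPI mirror; a plain-http mirror also needs --trusted-host:
# pip install pyscenic -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com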
Step 2: TF annotation (auxiliary datasets)
https://resources.aertslab.org/cistarget/databases/
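Choose the databases according to your species:
1. Ranking database: gene rankings in feather format.
2. Motif-to-TF annotation database: a TSV text file (.tbl), downloaded in the browser.
3. Transcription factor list: a .txt file, copied from the browser.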
To successfully use this pipeline you also need auxiliary datasets available at the cistargetDBs website:
- Ranking databases: databases ranking the whole genome of your species of interest based on regulatory features (i.e. transcription factors) in feather format.
- Motif-to-TF annotations: a database providing the missing link between an enriched motif and the transcription factor that binds this motif. This pipeline needs a TSV text file where every line represents a particular annotation.
Caution: These ranking databases are 1.1 GB each, so downloading them might take a while. An annotations file is typically 100 MB in size.
- A list of transcription factors is required for the network inference step (GENIE3/GRNBoost2).
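Because the annotation file is one annotation per TSV line, it can be summarized with the standard `csv` module, for example to see how many motifs point to each TF. A minimal sketch, assuming the header contains a `#motif_id` column and a `gene_name` column (check the header line of the .tbl you downloaded and pass other names if they differ); `motifs_per_tf` is a hypothetical helper, not part of pySCENIC.

```python
import csv
from collections import defaultdict


def motifs_per_tf(lines, motif_col="#motif_id", tf_col="gene_name"):
    """Map each TF to the set of motif IDs annotated to it.

    `lines` is any iterable of text lines, e.g. an open .tbl file.
    The column names are assumptions about the .tbl header.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    table = defaultdict(set)
    for row in reader:
        table[row[tf_col]].add(row[motif_col])
    return dict(table)


# Usage (hypothetical file name):
# with open("annotations.tbl", newline="") as fh:
#     table = motifs_per_tf(fh)
#     print(len(table), "TFs with at least one motif annotation")
```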
mkdir all_tf_list && cd all_tf_list
# Fly (Drosophila melanogaster)
wget -c https://resources.aertslab.org/cistarget/tf_lists/allTFs_dmel.txt
# Human (hg38)
wget -c https://resources.aertslab.org/cistarget/tf_lists/allTFs_hg38.txt
# Mouse
wget -c https://resources.aertslab.org/cistarget/tf_lists/allTFs_mm.txt
ls
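Before running GRN inference it can help to see which TFs from the list are actually present in your expression matrix, since TFs absent from the matrix cannot be used (see the note on missing TFs further down). A minimal sketch with hypothetical helpers `load_tf_list` and `tfs_in_matrix`; the matching is case-sensitive, so mouse symbols (`Sox2`) will not match human-style symbols (`SOX2`).

```python
def load_tf_list(lines):
    """Read a TF list such as allTFs_mm.txt: one gene symbol per line, blank lines skipped."""
    return [line.strip() for line in lines if line.strip()]


def tfs_in_matrix(tfs, genes):
    """Split the TF list into TFs present in / missing from the matrix genes (case-sensitive)."""
    gene_set = set(genes)
    present = [tf for tf in tfs if tf in gene_set]
    missing = [tf for tf in tfs if tf not in gene_set]
    return present, missing


# Usage:
# with open("all_tf_list/allTFs_mm.txt") as fh:
#     tfs = load_tf_list(fh)
# present, missing = tfs_in_matrix(tfs, genes)  # genes: column names of your matrix
```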
2. Download the feather files (ranking databases)
Welcome to the cisTarget resources website!
- To download a database for motif enrichment, go to databases.
- To download motif annotations, go to motif2tf.
- To download our cluster-buster implementation, go to programs.
- To download precomputed regions for creating gene-based databases, go to regions.
- To download the lists of transcription factors (TFs) for human, mouse and fly, go to tf_lists.
- To download chip-seq tracks annotations, go to track2tf.
We recommend using the most recent databases and annotations (v10nr_clust).
IMPORTANT: The cisTarget database files are quite big (most of them 1-100 GB). To avoid corrupt or incomplete downloads, files can be downloaded with zsync_curl (which is basically rsync over HTTP(S)). It allows resuming partially downloaded databases and will only download missing chunks or redownload corrupted ones.
# Option 1: download the precompiled zsync_curl (with wget or curl):
wget https://resources.aertslab.org/cistarget/zsync_curl
# curl -O https://resources.aertslab.org/cistarget/zsync_curl
# Make executable:
chmod a+x zsync_curl
# Display full path to zsync_curl:
ZSYNC_CURL="${PWD}/zsync_curl"
echo "${ZSYNC_CURL}"
# Option 2: compile zsync_curl from source and put it on your PATH.
# Display path to zsync_curl:
ZSYNC_CURL='zsync_curl'
echo "${ZSYNC_CURL}"
Homo sapiens - hg38 - refseq_r80 - v9 databases - Gene based
# Specify database (10 kb up- and downstream of the TSS):
feather_database_url='https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc9nr/gene_based/hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.genes_vs_motifs.rankings.feather'
# Download database directly in the background; -c resumes a partial download (with wget or curl):
nohup wget -c "${feather_database_url}" &
# curl -O "${feather_database_url}"
# Specify the second database (500 bp upstream and 100 bp downstream of the TSS):
feather_database_url='https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc9nr/gene_based/hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.genes_vs_motifs.rankings.feather'
# Download database directly (with wget or curl):
nohup wget -c "${feather_database_url}" &
# curl -O "${feather_database_url}"
However, the databases have since been updated! For Homo sapiens - hg38 - refseq_r80, mc_v10_clust is now recommended.
Mus musculus - mm10 - refseq_r80 - Gene based (v9 databases first, then the SCENIC+ mc_v10_clust databases)
mkdir mm10 && cd mm10
# Specify database:
feather_database_url='https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc9nr/gene_based/mm10__refseq-r80__10kb_up_and_down_tss.mc9nr.genes_vs_motifs.rankings.feather'
# feather_database_url='https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc9nr/gene_based/mm10__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.genes_vs_motifs.rankings.feather'
feather_database="${feather_database_url##*/}"
# Download database directly (with wget or curl):
wget "${feather_database_url}"
# curl -O "${feather_database_url}"
# Specify the mc_v10_clust database (10 kbp up / 10 kbp down, full_tx):
feather_database_url='https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc_v10_clust/gene_based/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather'
# A scores version of the same database is also available:
# feather_database_url='https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc_v10_clust/gene_based/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.scores.feather'
feather_database="${feather_database_url##*/}"
# Download database directly (with wget or curl):
wget -c "${feather_database_url}"
# curl -O "${feather_database_url}"
# Second database (500 bp up / 100 bp down, full_tx):
feather_database_url='https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc_v10_clust/gene_based/mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather'
feather_database="${feather_database_url##*/}"
wget -c "${feather_database_url}"
# curl -O "${feather_database_url}"
# Download sha256sum.txt (with wget or curl):
wget https://resources.aertslab.org/cistarget/databases/sha256sum.txt
# curl -O https://resources.aertslab.org/cistarget/databases/sha256sum.txt
# Check if the sha256 checksum matches for the downloaded database:
awk -v feather_database="${feather_database}" '$2 == feather_database' sha256sum.txt | sha256sum -c -
# If you downloaded multiple databases, you can check them all at once
# (--ignore-missing skips entries for databases you did not download):
sha256sum -c --ignore-missing sha256sum.txt
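The same checksum test can be done in Python, which is handy on machines without GNU coreutils. This sketch mirrors the awk + `sha256sum -c` pipeline: it looks up the file name in sha256sum.txt, assuming the usual `<hash>  <filename>` layout (an optional `*` binary marker is stripped; file names with spaces are not handled), and hashes the file in 1 MiB chunks so a 1-100 GB database never has to fit in memory. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def expected_checksum(sha256sum_lines, filename):
    """Return the hex digest listed for `filename`, or None if it is not listed."""
    for line in sha256sum_lines:
        parts = line.split()
        if len(parts) == 2 and parts[1].lstrip("*") == filename:
            return parts[0].lower()
    return None


def sha256_of(path, chunk_size=1 << 20):
    """Hash a (possibly huge) file chunk by chunk."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, sha256sum_lines):
    """True if the file's SHA-256 matches the entry for its base name."""
    expected = expected_checksum(sha256sum_lines, Path(path).name)
    if expected is None:
        raise KeyError(f"{Path(path).name} not found in sha256sum.txt")
    return sha256_of(path) == expected


# Usage:
# with open("sha256sum.txt") as fh:
#     print(verify("mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather", fh))
```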
Latest feather databases (v10):
Homo sapiens - hg38 - refseq_r80 - SCENIC+ databases - Gene based: https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/gene_based/
https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/gene_based/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather
Mus musculus - mm10 - refseq_r80 - SCENIC+ databases - Gene based: https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc_v10_clust/gene_based/
https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc_v10_clust/gene_based/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather
3. Annotation database (.tbl)
Motif2TF annotations: https://resources.aertslab.org/cistarget/motif2tf/
We provide motif annotations for the following species:
- Human (hgnc)
- Mouse (mgi)
- Fly (flybase)
- Chicken
Which annotations to use:
For each species, we provide annotations depending on the motif collection used:
- v8 (only Drosophila): Annotations based on the 2016 cisTarget motif collection. Use these files if you are using the mc8nr databases (only available for Drosophila).
- v9: Annotations based on the 2017 cisTarget motif collection. Use these files if you are using the mc9nr databases.
- v10: Annotations based on the 2022 SCENIC+ motif collection. Use these files if you are using the mc_v10_clust databases.
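The rule above (mc8nr → v8, mc9nr → v9, mc_v10_clust → v10, with v8 only for Drosophila) can be encoded so the annotation always matches the ranking database. A minimal sketch that reads the motif collection from the directory name in the database URL, as in the URLs used earlier, and maps species to the gene nomenclature listed above (hgnc, mgi, flybase; chicken is left out because no nomenclature is named for it). The helper `annotation_for` is hypothetical and does not produce actual .tbl file names.

```python
# Motif collection directory in the database URL -> annotation version (from the list above)
ANNOTATION_VERSION = {"mc8nr": "v8", "mc9nr": "v9", "mc_v10_clust": "v10"}
# Species -> gene nomenclature used by the annotation files (from the list above)
NOMENCLATURE = {"human": "hgnc", "mouse": "mgi", "fly": "flybase"}


def annotation_for(database_url, species):
    """Return (annotation version, nomenclature) matching a ranking database URL."""
    segments = database_url.split("/")
    for collection, annotation_version in ANNOTATION_VERSION.items():
        if collection in segments:
            if collection == "mc8nr" and species != "fly":
                raise ValueError("mc8nr databases are only available for Drosophila")
            return annotation_version, NOMENCLATURE[species]
    raise ValueError(f"no known motif collection in {database_url}")
```

For the mm10 mc_v10_clust URL above this returns `("v10", "mgi")`, i.e. the v10 mouse annotations.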
With that, all three databases (rankings, motif annotations, TF list) are in place.
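Although some transcription factors are missing when running the TF analysis with pySCENIC, the overall TF patterns did not change: the TFs specifically activated in the iCAF and mCAF subclusters matched those in the original paper. https://mp.weixin.qq.com/s/ncSW8EXrpzqD-3b7uXy5Mg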
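Extract the single-cell expression matrix as a CSV file and upload it to the Linux server
First, we performed dimensionality reduction and clustering on the single-cell transcriptome data from the paper "Single-cell RNA sequencing highlights the role of inflammatory cancer-associated fibroblasts in bladder urothelial carcinoma", extracted the fibroblast (fibo) subcluster, and then randomly selected 1000 fibo cells. This expression matrix is used for the downstream analysis.
Filter the matrix in Seurat and write it out as a CSV file; then read it in with Python and package it as a loom file.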
# Note: the matrix must be transposed (cells as rows, genes as columns), otherwise pySCENIC reports an error
write.csv(t(as.matrix(fibo@assays$RNA@counts)), file = "fibo_1000.csv")
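The Python half of this step (read the CSV, package it as loom) might look like the sketch below. It assumes the CSV layout that R's `write.csv` produces here: an empty first header cell followed by gene names, then one row per cell. It uses the `Gene` / `CellID` attribute names, which are the defaults pySCENIC looks for in loom files. Parsing uses only the standard library so the orientation logic is easy to check; for much larger matrices a pandas or scanpy reader would be more memory-friendly. `loompy.create(filename, layers, row_attrs, col_attrs)` is loompy's API; the helper names are hypothetical.

```python
import csv


def read_cells_by_genes_csv(path):
    """Read a cells x genes CSV from write.csv(t(...)) and return a genes x cells matrix."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader)
        genes = header[1:]  # first header cell is R's empty row-name column
        cells, rows = [], []
        for record in reader:
            cells.append(record[0])
            rows.append([float(value) for value in record[1:]])
    # loom stores genes as rows and cells as columns, so transpose back
    genes_by_cells = [list(column) for column in zip(*rows)]
    return genes, cells, genes_by_cells


def write_loom(out_path, genes, cells, genes_by_cells):
    """Write the matrix with pySCENIC's default attribute names (needs numpy + loompy)."""
    import numpy as np
    import loompy

    loompy.create(
        out_path,
        np.array(genes_by_cells, dtype=np.float32),
        {"Gene": np.array(genes)},    # row attributes: one entry per gene
        {"CellID": np.array(cells)},  # column attributes: one entry per cell
    )


# Usage:
# genes, cells, matrix = read_cells_by_genes_csv("fibo_1000.csv")
# write_loom("fibo_1000.loom", genes, cells, matrix)
```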
Singularity/Apptainer
Singularity/Apptainer images can be built from the Docker Hub image as source (use either singularity or apptainer):
# pySCENIC CLI version.
singularity build aertslab-pyscenic-0.12.1.sif docker://aertslab/pyscenic:0.12.1
apptainer build aertslab-pyscenic-0.12.1.sif docker://aertslab/pyscenic:0.12.1
# pySCENIC CLI version + ipython kernel + scanpy.
singularity build aertslab-pyscenic-scanpy-0.12.1-1.9.1.sif docker://aertslab/pyscenic_scanpy:0.12.1_1.9.1
apptainer build aertslab-pyscenic-scanpy-0.12.1-1.9.1.sif docker://aertslab/pyscenic_scanpy:0.12.1_1.9.1
To run the aertslab-pyscenic-0.12.1.sif Singularity container with the pyscenic grn command, you can use the following command:
singularity run -B /data:/data aertslab-pyscenic-0.12.1.sif \
    pyscenic grn \
    --num_workers 6 \
    -o /data/expr_mat.adjacencies.tsv \
    /data/expr_mat.tsv \
    /data/allTFs_hg38.txt
This command assumes that you have the aertslab-pyscenic-0.12.1.sif Singularity container file in your current working directory. If the container file is located elsewhere, provide its full path in the singularity run command.
The -B option bind-mounts the local /data directory on your system to /data inside the container. This allows accessing files and directories under /data from within the container. Because -B is an option of singularity run, it must come before the image name; placed after pyscenic grn it would be passed to pySCENIC instead.
To clarify, /data before the colon (:) represents the directory path on the host system. This is the directory that you want to make accessible within the container.
On the other hand, /data after the colon (:) represents the mount point within the container. This is the directory path where the host directory will be accessible from within the container.
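The host/container mapping can be made concrete with a small path translator. A sketch under the assumption of Singularity's `src[:dest[:opts]]` bind syntax, where a missing destination means "same path as on the host" and the options default to read-write; `parse_bind` and `host_to_container` are hypothetical helpers for reasoning about paths, not part of Singularity.

```python
from pathlib import PurePosixPath


def parse_bind(spec):
    """Split 'src[:dest[:opts]]' into (host path, container path, options)."""
    parts = spec.split(":")
    host = parts[0]
    container = parts[1] if len(parts) > 1 and parts[1] else host  # no dest: same path
    options = parts[2] if len(parts) > 2 else "rw"  # read-write unless 'ro' is given
    return host, container, options


def host_to_container(host_path, spec):
    """Where a host file appears inside the container, or None if it is not under the bind."""
    host, container, _ = parse_bind(spec)
    try:
        relative = PurePosixPath(host_path).relative_to(host)
    except ValueError:
        return None
    return str(PurePosixPath(container) / relative)
```

With `-B /data:/data` both sides are identical, so `/data/expr_mat.tsv` keeps its path; with a bind like `-B /home/me/proj:/data`, the same file would have to be referenced as `/data/expr_mat.tsv` in the pyscenic arguments.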
System-defined bind paths
The system administrator has the ability to define what bind paths will be included automatically inside each container. Some bind paths are automatically derived (e.g. a user's home directory) and some are statically defined (e.g. bind paths in the SingularityCE configuration file). In the default configuration, the system default bind points are $HOME, /sys:/sys, /proc:/proc, /tmp:/tmp, /var/tmp:/var/tmp, /etc/resolv.conf:/etc/resolv.conf, /etc/passwd:/etc/passwd, and $PWD, where the first path before : is the path on the host and the second path is the path in the container.
The pyscenic grn command is executed inside the container with the specified options and arguments. The output file expr_mat.adjacencies.tsv will be written to the /data directory on your system. The /data/expr_mat.tsv and /data/allTFs_hg38.txt files are assumed to be input files located in the local /data directory.
Make sure to adjust the paths and filenames according to your specific setup before running the command.
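If you launch the container from a Python script (for example to loop over several datasets), building the argument list explicitly keeps `-B` before the image name. A minimal sketch: `grn_command` is a hypothetical helper, and the defaults simply repeat the values from the command above; only the `pyscenic grn` flags already shown there are used.

```python
import shlex


def grn_command(image, expr, tfs, output, bind="/data:/data", workers=6, runner="singularity"):
    """Build the 'singularity run ... pyscenic grn ...' argument list.

    `runner` can be 'apptainer' instead; container options (-B) go before the
    image, pySCENIC options go after 'pyscenic grn'.
    """
    return [
        runner, "run", "-B", bind, image,
        "pyscenic", "grn",
        "--num_workers", str(workers),
        "-o", output,
        expr, tfs,
    ]


if __name__ == "__main__":
    cmd = grn_command(
        "aertslab-pyscenic-0.12.1.sif",
        "/data/expr_mat.tsv",
        "/data/allTFs_hg38.txt",
        "/data/expr_mat.adjacencies.tsv",
    )
    print(shlex.join(cmd))  # inspect the command first
    # import subprocess; subprocess.run(cmd, check=True)  # uncomment to execute
```

Passing a list (not a single string) to `subprocess.run` avoids shell quoting problems with paths that contain spaces.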