Installing pySCENIC and downloading the data for single-cell transcription factor analysis: how to download the "Databases ranking the whole genome" for mouse


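R version: rSCENIC, R installation and usage - 生信小木屋

The R version of SCENIC currently cannot use the new-format databases; for details see the issue "feather v1 or v2 for R package". To use the R version you therefore need the old-format databases: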

cisTarget databases - Feather v1 databases

Important note:

If you are using pySCENIC < 0.12.0 and ctxcore < 0.2.0 you will need to retrieve the databases from the old folder, in feather v1 format.

However, we recommend to update to the latest versions to use feather v2 (smaller and easier to read) and use the most recent databases and annotations (v10nr_clust/mc_v10_clust).

There are two main types of databases:

Gene-based databases are meant to be used with (py)SCENIC and for motif enrichment in gene sets with cisTarget.
Region-based databases are meant to be used with SCENIC+ and for motif enrichment in region sets with cisTarget.

Step 1: Installation

conda create -n pyscenic python=3.9
conda activate pyscenic
# Install dependencies
conda install -y numpy
conda install -y -c anaconda cytoolz
conda install -y scanpy
# Install pySCENIC
pip install pyscenic -i http://pypi.douban.com/simple/

Step 2: TF annotation (auxiliary datasets)

https://resources.aertslab.org/cistarget/databases/

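Choose the databases that match your species:

1. Ranking databases in feather format

2. Motif-to-TF annotation database: a TSV text file (.tbl) that can be downloaded from the browser

3. Transcription factor list: a .txt file that can be copied from the browser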

To successfully use this pipeline you also need auxiliary datasets available at the cistargetDBs website:

Databases ranking the whole genome (ranking databases): databases ranking the whole genome of your species of interest based on regulatory features (i.e. transcription factors), in feather format.
Motif to TF annotations (annotation database): a database providing the missing link between an enriched motif and the transcription factor that binds this motif. This pipeline needs a TSV text file where every line represents a particular annotation.

Caution

These ranking databases are 1.1 Gb each so downloading them might take a while. An annotations file is typically 100Mb in size.

A list of transcription factors is required for the network inference step (GENIE3/GRNBoost2).

1. Downloading the TF lists

mkdir all_tf_list && cd all_tf_list
wget -c https://resources.aertslab.org/cistarget/tf_lists/allTFs_dmel.txt
wget -c https://resources.aertslab.org/cistarget/tf_lists/allTFs_hg38.txt
wget -c https://resources.aertslab.org/cistarget/tf_lists/allTFs_mm.txt

2. Downloading the feather ranking databases

Welcome to the cisTarget resources website!

To download a database for motif enrichment, go to databases.
To download motif annotations, go to motif2tf.
To download our cluster-buster implementation, go to programs.
To download precomputed regions for creating gene-based databases, go to regions.
To download the lists of transcription factors (TFs) for human, mouse and fly, go to tf_lists.
To download chip-seq tracks annotations, go to track2tf.

We recommend using the most recent databases and annotations (v10nr_clust).

IMPORTANT: The cisTarget database files are quite big (most of them 1-100GB). To avoid corrupt or incomplete downloads, files can be downloaded with zsync_curl (which is basically rsync over HTTP(S)). It allows resuming already partially downloaded databases and only will download missing or redownload corrupted chunks.

# Download (with wget or curl):
wget https://resources.aertslab.org/cistarget/zsync_curl
# curl -O https://resources.aertslab.org/cistarget/zsync_curl
# Make executable:
chmod a+x zsync_curl
# Display full path to zsync_curl.
ZSYNC_CURL="${PWD}/zsync_curl"
echo "${ZSYNC_CURL}"

# Alternative, if you compiled zsync_curl from source and it is on your PATH:
# Display path to zsync_curl:
# ZSYNC_CURL='zsync_curl'
# echo "${ZSYNC_CURL}"
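Once zsync_curl is available, a resumable download could look like the sketch below. It assumes that each database file has a control file with the same URL plus a ".zsync" suffix; check the resources website before relying on this. The wrapper download_with_zsync and the DRY_RUN switch are hypothetical names introduced only for this illustration.

```shell
#!/usr/bin/env bash
# Sketch: resumable download of one database with zsync_curl.
# download_with_zsync is a hypothetical wrapper; set DRY_RUN=1 to only print.
download_with_zsync() {
    local feather_database_url="$1"
    local zsync_curl="${ZSYNC_CURL:-zsync_curl}"
    # Assumption: the ".zsync" control file sits next to the database file.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "${zsync_curl} ${feather_database_url}.zsync"
    else
        "${zsync_curl}" "${feather_database_url}.zsync"
    fi
}

# Dry run: prints the command instead of starting a multi-gigabyte download.
DRY_RUN=1 download_with_zsync 'https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc_v10_clust/gene_based/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather'
```

Drop DRY_RUN=1 to start the real download; rerunning the same command after an interruption should pick up where it stopped.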

Homo sapiens - hg38 - refseq_r80 - SCENIC+ databases - Gene based

Homo sapiens - hg38

Homo sapiens - hg38 - refseq_r80 - v9 databases - Gene based

# Specify database:
feather_database_url='https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc9nr/gene_based/hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.genes_vs_motifs.rankings.feather'
# Download database directly (with wget or curl):
nohup wget -c "${feather_database_url}" &
# curl -O "${feather_database_url}"

# Repeat for the second database:
feather_database_url='https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc9nr/gene_based/hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.genes_vs_motifs.rankings.feather'
nohup wget -c "${feather_database_url}" &
# curl -O "${feather_database_url}"

However, the databases have since been updated! Homo sapiens - hg38 - refseq_r80

Using mc_v10_clust is recommended.

mm10 Mus musculus - mm10 - refseq_r80 - SCENIC+ databases - Gene based

mkdir mm10 && cd mm10

# Specify database:
feather_database_url='https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc9nr/gene_based/mm10__refseq-r80__10kb_up_and_down_tss.mc9nr.genes_vs_motifs.rankings.feather'
# feather_database_url='https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc9nr/gene_based/mm10__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.genes_vs_motifs.rankings.feather'
feather_database="${feather_database_url##*/}"
# Download database directly (with wget or curl):
wget "${feather_database_url}"
# curl -O "${feather_database_url}"
# Specify database (mc_v10_clust); uncomment the one you want to download:
# feather_database_url='https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc_v10_clust/gene_based/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather'
# feather_database_url='https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc_v10_clust/gene_based/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.scores.feather'
feather_database_url='https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc_v10_clust/gene_based/mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather'
feather_database="${feather_database_url##*/}"
# Download database directly (with wget or curl):
wget "${feather_database_url}"
# curl -O "${feather_database_url}" 
# Download sha256sum.txt (with wget or curl):
wget https://resources.aertslab.org/cistarget/databases/sha256sum.txt
# curl -O https://resources.aertslab.org/cistarget/databases/sha256sum.txt
# Check if sha256 checksum matches for the downloaded database:
awk -v feather_database="${feather_database}" '$2 == feather_database' sha256sum.txt | sha256sum -c -
# If you downloaded multiple databases, you can check them all at once with:
sha256sum -c sha256sum.txt
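The commands above rely on two shell idioms: "${feather_database_url##*/}" strips everything up to the last "/" to get the file name, and the awk filter keeps only the checksum line for that file before handing it to sha256sum -c. The sketch below reproduces both steps offline on throwaway files; every file name and URL in it is made up for the demonstration, and it assumes the checksum list uses the usual two-column "hash, file name" layout.

```shell
#!/usr/bin/env bash
# Sketch: verify one file against a checksum list, without any download.
# All names below are made-up stand-ins for the real database files.
set -euo pipefail

workdir="$(mktemp -d)"
cd "${workdir}"

# "##*/" removes the longest prefix ending in "/", leaving the file name.
demo_url='https://example.org/some/dir/demo.rankings.feather'
demo_file="${demo_url##*/}"
printf 'not a real feather file\n' > "${demo_file}"
printf 'another file\n' > other.feather

# Build a checksum list with one "<hash>  <name>" line per file.
sha256sum "${demo_file}" other.feather > sha256sum.txt

# Keep only the line whose second field is our file, then check it.
check_result="$(awk -v f="${demo_file}" '$2 == f' sha256sum.txt | sha256sum -c -)"
echo "${check_result}"
```

If the file was corrupted or truncated, sha256sum -c reports FAILED for it and exits with a non-zero status, which is the signal to download it again.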

Latest feather databases (v10)

Homo sapiens - hg38 - refseq_r80 - SCENIC+ databases - Gene based
https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/gene_based/
https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/gene_based/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather

Mus musculus - mm10 - refseq_r80 - SCENIC+ databases - Gene based
https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc_v10_clust/gene_based/
https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc_v10_clust/gene_based/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather

3. Annotation database (.tbl)

Motif2TF annotations: https://resources.aertslab.org/cistarget/motif2tf/

We provide motif annotations for the following species:

Human (hgnc)
Mouse (mgi)
Fly (flybase)
Chicken

Which annotation file to use

For each species, we provide annotations depending on the motif collection used:

v8 (only Drosophila): Annotations based on the 2016 cisTarget motif collection. Use these files if you are using the mc8nr databases (only available for Drosophila).
v9: Annotations based on the 2017 cisTarget motif collection. Use these files if you are using the mc9nr databases.
v10: Annotations based on the 2022 SCENIC+ motif collection. Use these files if you are using the mc_v10_clust databases.
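The pairing rule above can be sketched as a small lookup on the database file name. This is only an illustration: annotation_version_for is a hypothetical helper, not part of pySCENIC, and it assumes the motif collection name (mc8nr, mc9nr, v10_clust) appears in the file name, as it does in the URLs used in this post.

```shell
#!/usr/bin/env bash
# Sketch: pick the annotation version that matches a ranking database.
# annotation_version_for is a hypothetical helper, not part of pySCENIC.
annotation_version_for() {
    local database_name="$1"
    case "${database_name}" in
        *mc8nr*)     echo 'v8' ;;    # 2016 collection, Drosophila only
        *mc9nr*)     echo 'v9' ;;    # 2017 collection
        *v10_clust*) echo 'v10' ;;   # 2022 SCENIC+ collection
        *)
            echo "unknown motif collection: ${database_name}" >&2
            return 1
            ;;
    esac
}

annotation_version_for 'mm10__refseq-r80__10kb_up_and_down_tss.mc9nr.genes_vs_motifs.rankings.feather'
annotation_version_for 'mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather'
```

Mixing versions, for example an mc_v10_clust ranking database with v9 annotations, means the motif identifiers in the two files are unlikely to match, so it is worth checking this before starting a long run.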

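Three databases

Although some transcription factors are missing when the analysis is run with pySCENIC, the overall transcription factor pattern does not change: the transcription factors specifically activated in the iCAF and mCAF subpopulations match those reported in the original paper. https://mp.weixin.qq.com/s/ncSW8EXrpzqD-3b7uXy5Mg

Extract the single-cell expression matrix as a CSV file and upload it to the Linux server

First, we ran dimensionality reduction, clustering and cell-type annotation on the single-cell transcriptome data from the paper "Single-cell RNA sequencing highlights the role of inflammatory cancer-associated fibroblasts in bladder urothelial carcinoma", then extracted the fibo subpopulation and randomly picked 1000 fibo cells. The expression matrix of these cells is used for the downstream analysis.

Filter the matrix in Seurat, write it out as a CSV file, read it into Python, and pack it into a loom file.

# Note: the matrix must be transposed (cells as rows, genes as columns), otherwise an error is raised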
write.csv(t(as.matrix(fibo@assays$RNA@counts)),file = "fibo_1000.csv")
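Before uploading the file, it can help to confirm that the transposition worked, i.e. that the CSV has one row per cell and one column per gene. The sketch below counts both on a tiny stand-in file; csv_shape is a hypothetical helper, and it assumes that no field contains a comma inside quotes.

```shell
#!/usr/bin/env bash
# Sketch: report the shape of a cells-by-genes CSV such as fibo_1000.csv.
# csv_shape is a hypothetical helper; it assumes no commas inside quoted fields.
csv_shape() {
    awk -F',' '
        NR == 1 { n_cols = NF - 1 }   # the first column holds the row names
        END     { print (NR - 1) " rows x " n_cols " columns" }
    ' "$1"
}

# Tiny stand-in for the exported matrix: 2 cells (rows) x 3 genes (columns).
demo_csv="$(mktemp)"
printf '"","GeneA","GeneB","GeneC"\n"cell1",0,2,1\n"cell2",5,0,0\n' > "${demo_csv}"
csv_shape "${demo_csv}"
```

For the matrix exported above, the row count should be 1000 (the sampled fibo cells) and the column count should be the number of genes.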

Singularity/Apptainer

Singularity/Apptainer images can be built from the Docker Hub image as source:

# pySCENIC CLI version.
singularity build aertslab-pyscenic-0.12.1.sif docker://aertslab/pyscenic:0.12.1
apptainer build aertslab-pyscenic-0.12.1.sif docker://aertslab/pyscenic:0.12.1
# pySCENIC CLI version + ipython kernel + scanpy.
singularity build aertslab-pyscenic-scanpy-0.12.1-1.9.1.sif docker://aertslab/pyscenic_scanpy:0.12.1_1.9.1
apptainer build aertslab-pyscenic-scanpy-0.12.1-1.9.1.sif docker://aertslab/pyscenic_scanpy:0.12.1_1.9.1

To run the aertslab-pyscenic-0.12.1.sif Singularity container with the pyscenic grn command, you can use the following command:

# -B is an option of "singularity run", so it goes before the image name.
singularity run -B /data:/data aertslab-pyscenic-0.12.1.sif \
    pyscenic grn \
    --num_workers 6 \
    -o /data/expr_mat.adjacencies.tsv \
    /data/expr_mat.tsv \
    /data/allTFs_hg38.txt

This command assumes that you have the aertslab-pyscenic-0.12.1.sif Singularity container file in your current working directory. If the container file is located elsewhere, you need to provide the full path to the container file in the singularity run command.

The -B option binds the /data directory on your system to /data inside the container, so files and directories under /data can be accessed from within the container. Because -B belongs to singularity run and not to pyscenic, it has to be placed before the container image name.

To clarify, /data before the colon (:) represents the directory path on the host system. This is the directory that you want to make accessible within the container.

On the other hand, /data after the colon (:) represents the mount point within the container. This is the directory path where the host directory will be accessible from within the container.
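The host and container sides of a bind path do not have to be the same. The sketch below only prints the resulting command (it does not run Singularity), so the mapping can be inspected first; print_grn_command and the scenic_data directory are made-up names for this illustration.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) a singularity command that binds a host directory
# to /data inside the container. print_grn_command is a hypothetical helper.
print_grn_command() {
    local host_dir="$1"
    local container_dir='/data'
    # Paths after the image name are container paths, so they use /data.
    printf '%s ' singularity run -B "${host_dir}:${container_dir}" \
        aertslab-pyscenic-0.12.1.sif \
        pyscenic grn --num_workers 6 \
        -o "${container_dir}/expr_mat.adjacencies.tsv" \
        "${container_dir}/expr_mat.tsv" \
        "${container_dir}/allTFs_hg38.txt"
    printf '\n'
}

# Example: input files live in a scenic_data folder under the home directory.
print_grn_command "${HOME}/scenic_data"
```

With this mapping, the output file written to /data/expr_mat.adjacencies.tsv inside the container ends up in the scenic_data folder on the host.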

System-defined bind paths

The system administrator has the ability to define what bind paths will be included automatically inside each container. Some bind paths are automatically derived (e.g. a user’s home directory) and some are statically defined (e.g. bind paths in the SingularityCE configuration file). In the default configuration, the system default bind points are $HOME , /sys:/sys , /proc:/proc, /tmp:/tmp, /var/tmp:/var/tmp, /etc/resolv.conf:/etc/resolv.conf, /etc/passwd:/etc/passwd, and $PWD. Where the first path before : is the path from the host and the second path is the path in the container.

The pyscenic grn command is executed inside the container with the specified options and arguments. The output file expr_mat.adjacencies.tsv will be written to the /data directory on your system. The /data/expr_mat.tsv and /data/allTFs_hg38.txt files are assumed to be input files located in the local /data directory.

Make sure to adjust the paths and filenames according to your specific setup before running the command.

A scalable SCENIC workflow for single-cell gene regulatory network analysis

This repository describes how to run a pySCENIC gene regulatory network inference analysis on single-cell data alongside a basic "best practices" expression analysis. It includes:

Standalone Jupyter notebooks for interactive analysis
A Nextflow DSL1 workflow, which provides a semi-automated and streamlined way to run these steps
Details on pySCENIC installation, usage, and downstream analysis

See also the associated publication in Nature Protocols. For an advanced implementation of the steps in this protocol, see the Nextflow DSL2 implementation of pySCENIC, which has a comprehensive and customizable pipeline for expression analysis. It includes additional pySCENIC features (multi-runs, integrated motif-based and track-based regulon pruning, loom file generation).

PBMC 10k dataset (10x Genomics): full SCENIC analysis, plus filtering, clustering, visualization, and SCope-ready loom file creation.
