添加链接
link管理
链接快照平台
  • 输入网页链接,自动生成快照
  • 标签化管理网页链接
  • export2loom: export2loom
  • exportsForArboreto: exportsForArboreto (alias: exportsForGRNBoost)
  • geneFiltering: geneFiltering
  • getDbAnnotations: getDbAnnotations
  • getDbTfs: getDbTfs
  • getRegulonName: getRegulonName
  • getTF: getTF
  • importArboreto: importArboreto
  • importAUCfromText: importAUCfromText (deprecated)
  • importModulesGmt: importModules (deprecated)
  • onlyNonDuplicatedExtended: onlyNonDuplicatedExtended
  • plotEmb_rgb: plotEmb_rgb
  • plotRSS: Plot RSS
  • plotTsne_AUCellApp: plotTsne_AUCellApp
  • plotTsne_AUCellHtml: plotTsne_AUCellHtml
  • plotTsne_compareSettings: plotTsne_compareSettings
  • regulon_plotExpression: regulon_plotExpression
  • regulonsToGeneLists: regulonsToGeneSet
  • runCorrelation: runCorrelation
  • runGenie3: runGenie3
  • runSCENIC_1_coexNetwork2modules: runSCENIC_1_coexNetwork2modules
  • runSCENIC_2_createRegulons: runSCENIC_2_createRegulons
  • runSCENIC_3_scoreCells: runSCENIC_3_scoreCells
  • runSCENIC_4_aucell_binarize: runSCENIC_4_aucell_binarize
  • ScenicOptions-class: Object to store SCENIC settings
  • selectRegulons: selectRegulons
  • tsneAUC: tsneAUC
  • viewMotifs: viewMotifs
  • Browse all...
  • In aertslab/SCENIC: SCENIC (Single Cell rEgulatory Network Inference and Clustering)
    # Suppress loading messages when building the HTML
    suppressPackageStartupMessages({
    library(data.table)
    library(Biobase)
    library(SingleCellExperiment)
    library(GEOquery) # Optional
    # Do not convert strings to factors
    options(stringsAsFactors=FALSE)
    # To build a personalized report, update this working directory:
    # knitr::opts_knit$set(root.dir = 'SCENIC_MouseBrain')
    

    Introduction to SCENIC

    SCENIC is a tool to simultaneously reconstruct gene regulatory networks and identify stable cell states from single-cell RNA-seq data. The gene regulatory network is inferred based on co-expression and DNA motif analysis, and then the network activity is analyzed in each cell to identify the recurrent cellular states.

    More info & citation

    The approach behind SCENIC and its application to several datasets (e.g. usage examples) was presented in the following article:

    Aibar et al. (2017) SCENIC: single-cell regulatory network inference and clustering. Nature Methods. doi: 10.1038/nmeth.4463.

    Please, cite this article if you use SCENIC in your research.

    Requirements

    Species

    The current version of SCENIC supports human, mouse and fly (Drosophila melanogaster).

    To apply SCENIC to other species, it would require manual adjustments on the second step (e.g. create new RcisTarget databases or using a diferent motif-enrichment-analysis tool).

    Input: expression matrix

    The input to SCENIC is the single-cell RNA-seq expression matrix:

    Each column corresponds to a sample (cell) and each row corresponds to a gene.

    The gene ID should be the gene-symbol and stored as rownames (for compatibility with RcisTarget annotation databases).

    Expression units: The preferred expression values are gene-summarized counts. There is currently not a strong recommendation towards using the raw counts, or counts normalized through single-cell specific methods (e.g. Seurat). Other measurements, such as transcripts/counts per million (TPM) and FPKM/RPKM, are also accepted as input. However, note that some authors recommend avoiding within sample normalization (i.e. TPM) for co-expression analysis (first step of SCENIC) because they may induce artificial co-variation (@crow2016). The choice of input expression matrix might have some effect on the co-expression analysis to create the regulons (step 1). The other steps of the workflow are not directly affected by the input expression values: (2) The expression is not taken into account for the motif analysis, and (3) AUCell, which is used for scoring the regulons on the cells, is cell ranking-based (it works as an implicit normalization). Overall, SCENIC is quite robust to this choice, we have applied SCENIC to datasets using raw (logged) UMI counts, normalized UMI counts, and TPM and they all provided reliable results (see @aibar2017).

    Availability (R / Python)

    SCENIC is implemented in R (this package and tutorial) and Python (pySCENIC).

    The Python implementation is significantly faster to run, so we generally recommend using it for most analyses. We provide containers (in Docker and Singularity) including all the required dependencies, and a Nextflow pipeline (VSN, useful to run SCENIC in batch on multiple datasets). This should make it very easy to install and run (py)SCENIC, even for users with limited experience in Python.

    The results are equivalent across versions and provide output .loom files that can be explored in SCope or used as interface between R and Python.

    The rest of this tutorial will continue with the R implementation, but the general concepts behind SCENIC algorithm also apply to the other versions.

    Installation

    The R implementation of SCENIC is based on three R packages:

    r Biocpkg("GENIE3") to infer the co-expression network (faster alternative: GRNBoost2)

    r Biocpkg("RcisTarget") for the analysis of transcription factor binding motifs

    r Biocpkg("AUCell") to identify cells with active gene sets (gene-network) in scRNA-seq data

    Therefore, you will need to install these packages, and some extra dependencies, to run SCENIC:

    if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
    BiocManager::version()
    ## Required
    BiocManager::install(c("AUCell", "RcisTarget"))
    BiocManager::install(c("GENIE3")) # Optional. Can be replaced by GRNBoost
    ## Optional (but highly recommended):
    # To score the network on cells (i.e. run AUCell):
    BiocManager::install(c("zoo", "mixtools", "rbokeh"))
    # For various visualizations and perform t-SNEs:
    BiocManager::install(c("DT", "NMF", "ComplexHeatmap", "R2HTML", "Rtsne"))
    # To support paralell execution in tSNE (not available in Windows):
    BiocManager::install(c("doMC", "doRNG"))
    # To export/visualize in http://scope.aertslab.org
    if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
    devtools::install_github("aertslab/SCopeLoomR", build_vignettes = TRUE)
    

    You are now ready to install SCENIC:

    if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
    devtools::install_github("aertslab/SCENIC") 
    packageVersion("SCENIC")
    

    For older versions of R/Bioconductor

    For older versions of R/Bioconductor, you might need to install a previous version of SCENIC. e.g.:

    # For R version 3.6 and Bioconductor 3.9:  
    devtools::install_github("aertslab/[email protected]")
    

    In case you have trouble installing the packages from Bioconductor (for example, for older versions of R), you may try installing them from Github or directly from the Bioconductor package files:

    # Github:
    devtools::install_github("aertslab/AUCell")
    devtools::install_github("aertslab/RcisTarget")
    devtools::install_github("aertslab/GENIE3")
    # Bioconductor
    # install.packages("https://bioconductor.org/packages/release/bioc/src/contrib/PACKAGENAME.tar.gz", repos=NULL)
    

    Species-specific databases

    In addition to the R-packages, you will also need to download the species-specific databases for RcisTarget (the motif rankings). The links to all the available databases are available in our website. By default, SCENIC uses the databases that score the motifs in the promoter of the genes (up to 500bp upstream the TSS), and in the 20kb around the TSS (+/-10kbp).

    For human:

    dbFiles <- c("https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr/gene_based/hg19-500bp-upstream-7species.mc9nr.feather",
    "https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr/gene_based/hg19-tss-centered-10kb-7species.mc9nr.feather")
    # mc9nr: Motif collection version 9: 24k motifs
    

    For mouse:

    dbFiles <- c("https://resources.aertslab.org/cistarget/databases/mus_musculus/mm9/refseq_r45/mc9nr/gene_based/mm9-500bp-upstream-7species.mc9nr.feather",
    "https://resources.aertslab.org/cistarget/databases/mus_musculus/mm9/refseq_r45/mc9nr/gene_based/mm9-tss-centered-10kb-7species.mc9nr.feather")
    # mc9nr: Motif collection version 9: 24k motifs
    

    For fly:

    dbFiles <- c("https://resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm6/flybase_r6.02/mc8nr/gene_based/dm6-5kb-upstream-full-tx-11species.mc8nr.feather")
    # mc8nr: Motif collection version 8: 20k motifs
    

    The approximate file size for these databases (.feather files) is 1GB. To avoid problems or incomplete downloads (especially if you have a slow connection), we recommend to use zsync_curl.

    If you prefer to download directly from R, you can try the following code:*

    # dir.create("cisTarget_databases"); setwd("cisTarget_databases") # if needed
    options(timeout = max(300, getOption("timeout")))
    for(featherURL in dbFiles)
      download.file(featherURL, destfile=basename(featherURL), mode = "wb") # saved in current dir
    

    To confirm that the databases were downloaded correctly, we recommend to confirm its sha256sum: https://resources.aertslab.org/cistarget/databases/sha256sum.txt

    After these setup steps, SCENIC is ready to run! To start, see the vignette "SCENIC_Running".

    Some tips...

    Template for your own analysis

    You can use the R notebooks of this workflow as template for your own data (i.e. copy the .Rmd file, and edit it in RStudio).

    vignetteFile <- file.path(system.file('doc', package='SCENIC'), "SCENIC_Running.Rmd")
    file.copy(vignetteFile, "SCENIC_myRun.Rmd")
    # or: 
    vignetteFile <- "https://raw.githubusercontent.com/aertslab/SCENIC/master/vignettes/SCENIC_Running.Rmd"
    download.file(vignetteFile, "SCENIC_myRun.Rmd")
    

    Note that some steps of this workflow can take considerable time. To avoid re-running these steps when knitting the vignette (i.e. create the HTML report), we have added eval=FALSE to some code chunks and load() its output in the next. Feel free to adapt these to your needs.

    At any time, you an access the help for any function used in this workflow (i.e. for details on their arguments), and the vignettes of the other steps of the workflow with the following commands:

    ## Get help for a function:
    ?runSCENIC_3_scoreCells
    help(runSCENIC_3_scoreCells) # equivalent
    ## See the available tutorials:
    vignette(package="SCENIC") # list
    vignette("SCENIC_Running") # open
    

    Sample dataset

    Running SCENIC in a real dataset typically takes a few hours. The toy example used in these tutorials (200 cells and <1000 genes) is a subset of a dataset of 3005 cells from the adult mouse brain, including neurons (e.g. pyramidal neurons and interneurons) and glia (oligodendrocytes, astrocytes/ependymal, endothelial/mural and microglia). The expression values are Unique Molecular Identifier counts.

    Zeisel, A., et al. (2015). Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science 347, 1138–1142. doi: 10.1126/science.aaa1934

    The output files from the run on the full dataset are available at http://scenic.aertslab.org/examples/

    Formatting input

    Reminder: SCENIC's main input is a single-cell expression matrix with genes-symbols as row names.

    Below there are several options to download and format this dataset (as example to format/import your own data).

    a) From .loom file

    .loom files can be directly imported into SCENIC through SCopeLoomR.

    e.g. for the 3005 mouse brain cells dataset:

    ## Download:
    download.file("http://loom.linnarssonlab.org/clone/Previously%20Published/Cortex.loom", "Cortex.loom")
    loomPath <- "Cortex.loom"
    

    To load the expression matrix and cell annotation:

    library(SCopeLoomR)
    loom <- open_loom(loomPath, mode="r+") # We recommend to open files as read-only (mode="r"). However, since this is an old-format loom file, it needs write access to update it.
    exprMat <- get_dgem(loom)
    cellInfo <- get_cell_annotation(loom)
    close_loom(loom)
    

    b) From 10X/CellRanger output files

    10X/CellRanger output matrices can be used as input for SCENIC. The tutorials on how to load the CellRanger output into R are available at 10X website (choose the appropriate CellRanger version): https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/output/matrices#r-load-mat

    Some packages, such as Seurat, also provide functions to directly import 10X/CellRanger output:

    singleCellMatrix <- Seurat::Read10X(data.dir="../data/pbmc3k/filtered_gene_bc_matrices/hg19/")
    

    c) From other R objects (e.g. Seurat, SingleCellExperiment)

    Many R packages store the expression data in their own data structures (e.g. Seurat) or Bioconductor classes (e.g. SingleCellExperiment SummarizedExperiment, ExpressionSet).

    Most of these objects have data acessors to retrieve the expression matrix and cell metadata (the function name depends on the object type or package).

    For example, for a SingleCellExperiment object the accessors are counts() and colData():

    library(SingleCellExperiment)
    ## Get data from sce object:
    exprMat <- counts(sce)
    cellInfo <- colData(sce)
    

    To use Seurat clusters as cell annotation (e.g. for visualization):

    cellInfo <- data.frame(seuratCluster=Idents(seuratObject))
    

    d) From GEO

    Example downloading and formatting a dataset from GEO (e.g. the 3005 mouse brain cells dataset is available as GEO accession number GSE60361):

    # dir.create("SCENIC_MouseBrain"); setwd("SCENIC_MouseBrain") # if needed
    # (This may take a few minutes)
    if (!requireNamespace("GEOquery", quietly = TRUE)) BiocManager::install("GEOquery")
    library(GEOquery)
    geoFile <- getGEOSuppFiles("GSE60361", makeDirectory=FALSE)
    gzFile <- grep("Expression", basename(rownames(geoFile)), value=TRUE)
    txtFile <- gsub(".gz", "", gzFile)
    gunzip(gzFile, destname=txtFile, remove=TRUE)
    library(data.table)
    geoData <- fread(txtFile, sep="\t")
    geneNames <- unname(unlist(geoData[,1, with=FALSE]))
    exprMatrix <- as.matrix(geoData[,-1, with=FALSE])
    rm(geoData)
    dim(exprMatrix)
    rownames(exprMatrix) <- geneNames
    exprMatrix <- exprMatrix[unique(rownames(exprMatrix)),]
    exprMatrix[1:5,1:4]
    # Remove file downloaded:
    file.remove(txtFile)
    

    The cell types are available at the author's website. For simplicity, here we will load them from AUCell package:

    cellLabels <- paste(file.path(system.file('examples', package='AUCell')), "mouseBrain_cellLabels.tsv", sep="/")
    cellLabels <- read.table(cellLabels, row.names=1, header=TRUE, sep="\t")
    cellInfo <- as.data.frame(cellLabels)
    colnames(cellInfo) <- "CellType"
    

    Saving into loom

    To run SCENIC in Python we recommend to save/export the expression matrix and cell metadata into a .loom file (for R, any R/Bioconductor object is also OK):

    # setwd("SCENIC_MouseBrain")
    dir.create("data")
    loom <- build_loom("data/mouseBrain.loom", dgem=exprMatrix)
    loom <- add_cell_annotation(loom, cellInfo)
    close_loom(loom)
    

    References