params {
    tools {
        scrublet {
            threshold = [
                "<sample-name>": <custom-threshold>
            ]
        }
    }
}
This method is specific to samples generated by the 10x Genomics single-cell platform. It is based on the expected doublet rate in 10x Genomics samples. The number of doublets called (D) equals the expected doublet rate (which depends on the number of cells) times the number of cells in that 10x Genomics sample. The cells are then ranked by their Scrublet doublet score in descending order, and the top D cells are called as doublets.
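The rank-based calling described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's implementation: the function name is hypothetical, and the default rate of 0.008 per 1,000 cells (scaling linearly with cell count) is an assumption chosen for the example, not a value taken from this documentation.

```python
def call_doublets_by_expected_rate(scores, rate_per_1000_cells=0.008):
    """Flag the top-D cells by doublet score (hypothetical helper).

    rate_per_1000_cells is an ASSUMED value used only for illustration.
    """
    n_cells = len(scores)
    # Expected doublet rate for a sample of this size (ASSUMED linear in cell count).
    expected_rate = rate_per_1000_cells * n_cells / 1000.0
    # D = expected rate x number of cells, capped at the number of cells.
    n_doublets = min(n_cells, round(expected_rate * n_cells))
    # Rank cell indices by score, highest first, and keep the top D.
    ranked = sorted(range(n_cells), key=lambda i: scores[i], reverse=True)
    called = set(ranked[:n_doublets])
    return [i in called for i in range(n_cells)]


scores = [i / 1000 for i in range(1000)]  # toy scores, highest for the last cells
flags = call_doublets_by_expected_rate(scores)
print(sum(flags))  # 8 doublets called for 1,000 cells at the assumed rate
```

Because D depends only on the number of cells, two samples of the same size always get the same number of doublet calls under this scheme, whatever their score distributions look like.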
out/data/*.CELDA_DECONTX_{FILTER,CORRECT}.h5ad
An h5ad file containing either the matrix filtered with one of the provided filters or the matrix corrected (decontaminated) by DecontX.
out/data/celda/*.CELDA__DECONTX.Rds
An Rds file containing the SingleCellExperiment object processed by DecontX.
out/data/celda/*.CELDA__DECONTX.Contamination_Outlier_Table.tsv
A cell-based .tsv file containing data generated by DecontX and additional outlier masks:
decontX_contamination
decontX_clusters
celda_decontx__{doublemad,scater_isOutlier_3MAD,custom_gt_0.5}_predicted_outliers
out/data/celda/*.CELDA__DECONTX.Contamination_Outlier_Thresholds.tsv
A .tsv file containing a table with the different thresholds used to generate the outlier masks.
out/data/celda/*.CELDA__DECONTX.Contamination_Score_Density_with_{doublemad,scater_isOutlier_3MAD,custom_gt_0.5}.pdf
A .pdf plot showing the density of the DecontX contamination score, with the outlier area highlighted for the given outlier threshold.
out/data/celda/*.CELDA__DECONTX.UMAP_Contamination_Score.pdf
A .pdf plot showing the DecontX contamination score on top of a UMAP generated from the decontaminated matrix.
out/data/celda/*.CELDA__DECONTX.UMAP_Clusters.pdf
A .pdf plot showing the clusters generated by DecontX on top of a UMAP generated from the decontaminated matrix.
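To make the outlier masks in the contamination table more concrete, here is a minimal sketch of how two such masks could be derived from the decontX_contamination column. It is an illustration under stated assumptions: the fixed cutoff mirrors the custom_gt_0.5 name, while the MAD-based mask (median plus 3 scaled MADs, upper tail only) is only an approximation of what a 3-MAD outlier rule usually means and is not confirmed to match the pipeline's scater_isOutlier_3MAD mask. The doublemad method is not sketched. The function names and the mad3_upper key are hypothetical.

```python
import statistics


def mad_upper_threshold(values, nmads=3.0, constant=1.4826):
    """Median + nmads * scaled MAD (ASSUMED definition of the upper threshold)."""
    med = statistics.median(values)
    mad = constant * statistics.median([abs(v - med) for v in values])
    return med + nmads * mad


def outlier_masks(contamination, custom_cutoff=0.5):
    """Return one boolean mask per thresholding rule (hypothetical helper)."""
    mad_cutoff = mad_upper_threshold(contamination)
    return {
        # Cells whose contamination fraction exceeds the fixed cutoff.
        "custom_gt_0.5": [c > custom_cutoff for c in contamination],
        # Cells above median + 3 scaled MADs (approximation, see lead-in).
        "mad3_upper": [c > mad_cutoff for c in contamination],
    }


contamination = [0.1, 0.1, 0.2, 0.2, 0.3, 0.9]  # toy decontX_contamination values
masks = outlier_masks(contamination)
print(masks["custom_gt_0.5"])
print(masks["mad3_upper"])
```

A fixed cutoff is easy to reason about but ignores the sample's own distribution, whereas a MAD-based cutoff adapts to it. This is presumably why several masks are reported side by side.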
single_sample_decontx 
Runs the single_sample workflow above together with the DecontX workflow.
The DecontX workflow runs on the input data.
The final processed file from the single_sample pipeline is annotated with the cell-based data generated by DecontX.
See single_sample and decontx for more about the files generated by this pipeline.
single_sample_decontx_scrublet 
Runs the single_sample workflow above together with the DecontX and Scrublet workflows.
The single_sample workflow runs on the input data.
The decontx workflow runs on the input data.
The scrublet workflow runs on the output of the DecontX workflow.
The final processed file from the single_sample pipeline is annotated with the cell-based data generated by DecontX and Scrublet.
See single_sample, decontx and scrublet for more about the files generated by this pipeline.
scenic 
Runs the scenic workflow alone, generating a loom file with only the SCENIC results.
Currently, the required input is a loom file (set by params.tools.scenic.filteredLoom).
scenic_multiruns
Runs the scenic workflow multiple times (set by params.tools.scenic.numRuns), generating a loom file with the aggregated results from the multiple SCENIC runs.
Note that this is not a complete entry point itself, but a configuration option for the scenic module.
Simply adding -profile scenic_multiruns during the config step will activate this analysis option for any of the standard entry points.
cellranger
Runs the cellranger workflow (mkfastq, then count).
Input parameters are specified within the config file:
params.tools.cellranger.mkfastq.csv: path to the CSV samplesheet
params.tools.cellranger.mkfastq.runFolder: path to the Illumina BCL run folder
params.tools.cellranger.count.transcriptome: path to the Cell Ranger compatible transcriptome reference
cellranger_count_metadata
Given the data stored as:
MKFASTQ_ID_SEQ_RUN1
|-- MAKE_FASTQS_CS
`-- outs
    |-- fastq_path
        |-- HFLC5BBXX
            |-- test_sample1
            |   |-- sample1_S1_L001_I1_001.fastq.gz
            |   |-- sample1_S1_L001_R1_001.fastq.gz
            |   |-- sample1_S1_L001_R2_001.fastq.gz
            |   |-- sample1_S1_L002_I1_001.fastq.gz
            |   |-- sample1_S1_L002_R1_001.fastq.gz
            |   |-- sample1_S1_L002_R2_001.fastq.gz
            |   |-- sample1_S1_L003_I1_001.fastq.gz
            |   |-- sample1_S1_L003_R1_001.fastq.gz
            |   |-- sample1_S1_L003_R2_001.fastq.gz
            |-- test_sample2
            |   |-- sample2_S2_L001_I1_001.fastq.gz
            |   |-- sample2_S2_L001_R1_001.fastq.gz
            |   |-- ...
        |-- Reports
        |-- Stats
        |-- Undetermined_S0_L001_I1_001.fastq.gz
        `-- Undetermined_S0_L003_R2_001.fastq.gz
MKFASTQ_ID_SEQ_RUN2
|-- MAKE_FASTQS_CS
`-- outs
    |-- fastq_path
        |-- HFLY8GGLL
            |-- test_sample1
            |   |-- ...
            |-- test_sample2
            |   |-- ...
        |-- ...
and a metadata table:
Minimally Required Metadata Table
Optional columns:
short_uuid: sample_name will be prefixed with this value. It should be the same across sequencing runs of the same biological replicate.
expect_cells: this number is passed to the --expect-cells parameter of cellranger count.
chemistry: this value is passed to the --chemistry parameter of cellranger count.
and a config:
nextflow config \
~/vib-singlecell-nf/vsn-pipelines \
-profile cellranger_count_metadata \
> nextflow.config
and a workflow run command:
nextflow run \
~/vib-singlecell-nf/vsn-pipelines \
-entry cellranger_count_metadata
The workflow will run Cell Ranger count on 2 samples, each using the 2 sequencing runs.
NOTES:
If fastqs_dir_name does not exist, set it to none.
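As an illustration of how the optional metadata columns relate to cellranger count, the sketch below builds an argument list from one metadata row. It is not the pipeline's code: the function name, the "fastqs" key, and the "__" separator between short_uuid and sample_name are assumptions made for the example. Only the --expect-cells and --chemistry mappings come from the text above; --id, --fastqs and --transcriptome are standard cellranger count options.

```python
def cellranger_count_args(row, transcriptome):
    """Build a `cellranger count` argument list from one metadata row (a dict).

    Hypothetical helper; `row["fastqs"]` is an ASSUMED key holding the FASTQ path.
    """
    sample_id = row["sample_name"]
    if row.get("short_uuid"):
        # ASSUMPTION: prefix and sample name are joined with a double underscore.
        sample_id = f'{row["short_uuid"]}__{sample_id}'
    args = [
        "cellranger", "count",
        f"--id={sample_id}",
        f"--transcriptome={transcriptome}",
        f"--fastqs={row['fastqs']}",
    ]
    # Optional columns are only forwarded when they are filled in.
    if row.get("expect_cells"):
        args.append(f"--expect-cells={row['expect_cells']}")
    if row.get("chemistry"):
        args.append(f"--chemistry={row['chemistry']}")
    return args


row = {
    "sample_name": "sample1",
    "short_uuid": "ab12",
    "fastqs": "/data/run1",  # hypothetical path
    "expect_cells": "5000",
    "chemistry": "",
}
print(" ".join(cellranger_count_args(row, "/refs/transcriptome")))
```

Leaving an optional column empty simply drops the corresponding flag, so Cell Ranger falls back to its own defaults.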
demuxlet/freemuxlet
Runs the demuxlet or freemuxlet workflows (dsc-pileup [with prefiltering], then freemuxlet or demuxlet).
Input parameters are specified within the config file:
params.tools.popscle.vcf: path to the VCF file for demultiplexing
params.tools.popscle.freemuxlet.nSamples: number of clusters to extract (should match the number of samples pooled)
params.tools.popscle.demuxlet.field: field in the VCF with genotype information
nemesh
Runs the nemesh pipeline (Drop-seq) on a single sample or multiple samples separately.
Source
bbknn 
Runs the bbknn workflow (sample-specific filtering, merging of individual samples, normalization, log-transformation, HVG selection, PCA, then the batch-effect correction steps: BBKNN, clustering, dimensionality reduction (UMAP only)).
The output is a loom file with the results embedded.
Source: https://github.com/Teichlab/bbknn/blob/master/examples/pancreas.ipynb
Output Files (not exhaustive list)
out/data/*.BBKNN.h5ad
Scanpy-ready h5ad file containing all results. The raw.X slot contains the log-normalized data (if normalization & transformation steps applied) while the X slot contains the log-normalized scaled data.
out/data/*.BBKNN.loom
SCope-ready loom file containing all results.
bbknn_scenic 
Runs the bbknn workflow above, then runs the scenic workflow on the output, generating a comprehensive loom file with the combined results.
This could be very resource intensive, depending on the dataset.
Output Files (not exhaustive list)
out/data/*.BBKNN.h5ad
Scanpy-ready h5ad file containing all results from a bbknn workflow run. The raw.X slot contains the log-normalized data (if normalization & transformation steps applied) while the X slot contains the log-normalized scaled data.
out/data/*.BBKNN_SCENIC.loom
SCope-ready loom file containing all results from a bbknn workflow and a scenic workflow run (e.g.: regulon AUC matrix, regulons, …).
harmony 
Runs the harmony workflow (sample-specific filtering, merging of individual samples, normalization, log-transformation, HVG selection, PCA, batch-effect correction (Harmony), clustering, dimensionality reduction (t-SNE and UMAP)).
The output is a loom file with the results embedded.
Output Files (not exhaustive list)
out/data/*.HARMONY.h5ad
Scanpy-ready h5ad file containing all results. The raw.X slot contains the log-normalized data (if normalization & transformation steps applied) while the X slot contains the log-normalized scaled data.
out/data/*.HARMONY.loom
SCope-ready loom file containing all results.
harmony_scenic 
Runs the harmony workflow above, then runs the scenic workflow on the output, generating a comprehensive loom file with the combined results.
This could be very resource intensive, depending on the dataset.
Output Files (not exhaustive list)
out/data/*.HARMONY.h5ad
Scanpy-ready h5ad file containing all results from a harmony workflow run. The raw.X slot contains the log-normalized data (if normalization & transformation steps applied) while the X slot contains the log-normalized scaled data.
out/data/*.HARMONY_SCENIC.loom
SCope-ready loom file containing all results from a harmony workflow and a scenic workflow run (e.g.: regulon AUC matrix, regulons, …).
mnncorrect 
Runs the mnncorrect workflow (sample-specific filtering, merging of individual samples, normalization, log-transformation, HVG selection, PCA, batch-effect correction (mnnCorrect), clustering, dimensionality reduction (t-SNE and UMAP)).
The output is a loom file with the results embedded.
Output Files (not exhaustive list)
out/data/*.MNNCORRECT.h5ad
Scanpy-ready h5ad file containing all results. The raw.X slot contains the log-normalized data (if normalization & transformation steps applied) while the X slot contains the log-normalized scaled data.
out/data/*.MNNCORRECT.loom
SCope-ready loom file containing all results.
Utility Pipelines
Contrary to the aforementioned pipelines, these are not end-to-end. They are used to perform small incremental processing steps.
cell_annotate
Runs the cell_annotate workflow, which performs a cell-based annotation of the data using a set of provided .tsv metadata files.
The use case below shows 10x Genomics data where different samples are annotated using the obo method. For more information about this cell-based annotation feature, see the Cell-based metadata annotation section.
First, generate the config:
nextflow config \
~/vib-singlecell-nf/vsn-pipelines \
-profile tenx,utils_cell_annotate,singularity \
> nextflow.config
Make sure the following parts of the generated config are properly set:
[...]
    data {
        tenx {
            cellranger_mex = '~/out/counts/*/outs/'
        }
    }
    tools {
        scanpy {
            container = 'vibsinglecellnf/scanpy:1.8.1'
        }
        cell_annotate {
            off = 'h5ad'
            method = 'obo'
            indexColumnName = 'BARCODE'
            cellMetaDataFilePath = "~/out/data/*.best"
            sampleSuffixWithExtension = '_demuxlet.best'
            annotationColumnNames = ['DROPLET.TYPE', 'NUM.SNPS', 'NUM.READS', 'SNG.BEST.GUESS']
        }
        [...]
    }
[...]
Now we can run it with the following command:
nextflow -C nextflow.config \
run ~/vib-singlecell-nf/vsn-pipelines \
-entry cell_annotate
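Conceptually, the annotation step joins each cell's barcode against the index column of the metadata file and copies over the selected columns. The sketch below shows that join with the Python standard library only. It is a simplified illustration, not the workflow's implementation: the function name and the in-memory data layout are hypothetical, and it assumes a single tab-separated metadata table for one sample.

```python
import csv
import io


def annotate_cells(cells, metadata_tsv, index_column, annotation_columns):
    """Copy selected metadata columns onto matching cells (hypothetical helper).

    cells: dict mapping barcode -> dict of existing per-cell annotations.
    metadata_tsv: text of a tab-separated table with a header row.
    """
    reader = csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t")
    lookup = {row[index_column]: row for row in reader}
    annotated = {}
    for barcode, obs in cells.items():
        row = lookup.get(barcode, {})
        # Cells without a metadata entry get None for every annotation column.
        new_values = {col: row.get(col) for col in annotation_columns}
        annotated[barcode] = {**obs, **new_values}
    return annotated


# Toy stand-in for a demuxlet .best file, reduced to three columns.
best_file = (
    "BARCODE\tDROPLET.TYPE\tNUM.SNPS\n"
    "AAAC-1\tSNG\t120\n"
    "AAAG-1\tDBL\t95\n"
)
cells = {"AAAC-1": {}, "AAAT-1": {}}
annotated = annotate_cells(cells, best_file, "BARCODE", ["DROPLET.TYPE", "NUM.SNPS"])
print(annotated["AAAC-1"])
```

Here indexColumnName corresponds to index_column and annotationColumnNames to annotation_columns. Values stay strings in this sketch, so numeric columns would need an explicit conversion.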
cell_annotate_filter 
Runs the cell_annotate_filter workflow, which performs a cell-based annotation of the data using a set of provided .tsv metadata files, followed by a cell-based filtering.
The use case below shows 10x Genomics data where different samples are annotated using the obo method. For more information about these features, see the Cell-based metadata annotation section and the Cell-based metadata filtering section.
First, generate the config:
nextflow config \
~/vib-singlecell-nf/vsn-pipelines \
-profile tenx,utils_cell_annotate,utils_cell_filter,singularity \
> nextflow.config
Make sure the following parts of the generated config are properly set:
[...]
    data {
        tenx {
            cellranger_mex = '~/out/counts/*/outs/'
        }
    }
    tools {
        scanpy {
            container = 'vibsinglecellnf/scanpy:1.8.1'
        }
        cell_annotate {
            off = 'h5ad'
            method = 'obo'
            indexColumnName = 'BARCODE'
            cellMetaDataFilePath = "~/out/data/*.best"
            sampleSuffixWithExtension = '_demuxlet.best'
            annotationColumnNames = ['DROPLET.TYPE', 'NUM.SNPS', 'NUM.READS', 'SNG.BEST.GUESS']
        }
        cell_filter {
            off = 'h5ad'
            method = 'internal'
            filters = [
                [
                    id:'NO_DOUBLETS',
                    sampleColumnName:'sample_id',
                    filterColumnName:'DROPLET.TYPE',
                    valuesToKeepFromFilterColumn: ['SNG']
                ]
            ]
        }
        [...]
    }
[...]
Now we can run it with the following command:
nextflow -C nextflow.config \
run ~/vib-singlecell-nf/vsn-pipelines \
-entry cell_filter
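The effect of a filter such as NO_DOUBLETS can be pictured as keeping only the cells whose value in the filter column is in the list of values to keep. The following is a minimal sketch of that logic, not the workflow's code: the function name and data layout are hypothetical, the dictionary keys simply reuse the names from the config above, and sampleColumnName is carried along but not used in this simplified version.

```python
def filter_cells(cells, filter_spec):
    """Keep cells whose filter-column value is in the allowed set (hypothetical helper).

    cells: dict mapping barcode -> dict of per-cell annotations.
    filter_spec: dict using the key names shown in the cell_filter config.
    """
    column = filter_spec["filterColumnName"]
    keep = set(filter_spec["valuesToKeepFromFilterColumn"])
    # Cells lacking the annotation are dropped, since None is never in `keep`.
    return {bc: obs for bc, obs in cells.items() if obs.get(column) in keep}


no_doublets = {
    "id": "NO_DOUBLETS",
    "sampleColumnName": "sample_id",  # unused in this sketch
    "filterColumnName": "DROPLET.TYPE",
    "valuesToKeepFromFilterColumn": ["SNG"],
}
cells = {
    "AAAC-1": {"sample_id": "s1", "DROPLET.TYPE": "SNG"},
    "AAAG-1": {"sample_id": "s1", "DROPLET.TYPE": "DBL"},
    "AAAT-1": {"sample_id": "s1"},
}
kept = filter_cells(cells, no_doublets)
print(sorted(kept))
```

In this sketch a cell with no annotation is dropped along with the doublets. Whether the actual workflow treats unannotated cells the same way is not stated here.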
sra
Runs the sra workflow, which downloads all (or a user-defined selection of) FASTQ files from a particular SRA project and renames them with proper, human-readable names.
First, generate the config:
nextflow config \
~/vib-singlecell-nf/vsn-pipelines \
-profile sra,singularity \
> nextflow.config
NOTES:
The download of SRA files is limited to 20 Gb by default. If this limit needs to be increased, set params.tools.sratoolkit.maxSize accordingly. The limit can be ‘removed’ by setting the parameter to an arbitrarily high number (e.g.: 9999999999999).
If you’re a VSC user, you might want to add the vsc profile.
The final output (FASTQ files) will be available in out/data/sra.
If you’re downloading 10x Genomics scATAC-seq data, make sure to set params.tools.sratoolkit.includeTechnicalReads = true and properly set params.utils.sra_normalize_fastqs.fastq_read_suffixes. For example, when downloading the scATAC-seq samples of SRP254409, fastq_read_suffixes would be set to ["R1", "R2", "I1", "I2"].
Now we can run it with the following command:
nextflow -C nextflow.config \
run ~/vib-singlecell-nf/vsn-pipelines \
-entry sra
$ nextflow -C nextflow.config run ~/vib-singlecell-nf/vsn-pipelines -entry sra
N E X T F L O W ~ version 21.04.3
Launching `~/vib-singlecell-nf/vsn-pipelines/main.nf` [sleepy_goldstine] - revision: ba1dedbf51
executor > local (23)
[12/25b9d4] process > sra:DOWNLOAD_FROM_SRA:SRA_TO_METADATA (1) [100%] 1 of 1
[e2/d5a429] process > sra:DOWNLOAD_FROM_SRA:SRATOOLKIT__DOWNLOAD_FASTQS:DOWNLOAD_FASTQS_FROM_SRA_ACC_ID (4) [ 33%] 3 of 9
[30/cba7a0] process > sra:DOWNLOAD_FROM_SRA:SRATOOLKIT__DOWNLOAD_FASTQS:FIX_AND_COMPRESS_SRA_FASTQ (3) [100%] 3 of 3
[76/97ce6e] process > sra:DOWNLOAD_FROM_SRA:NORMALIZE_SRA_FASTQS (3) [100%] 3 of 3
[8c/3125c4] process > sra:PUBLISH:SC__PUBLISH (11) [100%] 12 of 12