Hi everyone,
Posting here as I can't quite seem to find a definite answer.
I would like to assess cell-level pathway activity using AUCell on my integrated dataset. The vignette says to use the 'data' slot, which for me is a 3000 x 10024 (genes x cells) expression matrix. However, this expression matrix consists of scaled data (using 3000 highly variable genes).
I am wondering whether using this data as the input for AUCell will limit my results?
Should I be using all genes for every cell instead? If so, how do I obtain a normalised expression matrix containing every gene for every cell from my integrated Seurat object without having to rerun the analysis and subsequently losing the clustering?
Thanks for your help and advice in advance.
Hi there!
I have the same question. How did it go? I ran my analysis on both the integrated data and a subset, and AUCell seems to give different results when I view them on my UMAP plot with the same scale. I would really appreciate any feedback on this analysis.
Thank you
The Seurat data slot doesn't hold "scaled" data; that's what the scale.data slot is for. See https://github.com/satijalab/seurat/wiki/Assay#slots for more details. My guess is that you are referring to "normalized" data.
The above point aside, it sounds like you are specifically looking at the "integrated" assay, which only includes the highly variable genes used for integration. However, AUCell doesn't care about the integration, and as noted in the tutorial: "Since the scoring method is ranking-based, AUCell is independent of the gene expression units and the normalization procedure."
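As a quick illustration of that ranking point, here is a minimal plain-Python sketch (not AUCell itself). The gene names, counts, and the scale factor of 10,000 are made-up assumptions; it only shows that ordering the genes within one cell gives the same result before and after a library-size plus log transformation.

```python
import math

def rank_genes(expr):
    """Return gene names ordered from highest to lowest expression in one cell."""
    # Ties are broken by gene name so the ordering is deterministic.
    return sorted(expr, key=lambda g: (-expr[g], g))

# Hypothetical raw counts for a single cell (made-up values).
counts = {"GeneA": 0, "GeneB": 12, "GeneC": 3, "GeneD": 40, "GeneE": 1}

# Library-size normalisation followed by log1p; the scale factor is an assumption.
total = sum(counts.values())
lognorm = {g: math.log1p(c / total * 10_000) for g, c in counts.items()}

raw_ranking = rank_genes(counts)
norm_ranking = rank_genes(lognorm)

print(raw_ranking)
print(raw_ranking == norm_ranking)
```

The two orderings match because both steps are monotonic within a cell. A per-gene transformation across cells, such as z-scoring, is not guaranteed to preserve the within-cell order.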
Given the way AUCell uses aucMaxRank for scoring gene set activation, I think having more genes in your input will be helpful, not to mention that gene set enrichment typically benefits from information from both highly variable and less variable genes. But I am interested to hear what others have to say on the matter.
All this to say, you can reasonably use the raw counts (i.e. assay "RNA" and slot "counts") from your integrated object for AUCell input.
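To put rough numbers on why the input size matters, here is a small plain-Python sketch (again not AUCell code). It assumes the documented AUCell default of using the top 5% of ranked genes as aucMaxRank. The 20,000-gene figure, the gene names, and both helper functions are hypothetical and only for illustration.

```python
import math

def default_auc_max_rank(n_genes, percent=5):
    """Number of top-ranked genes that contribute to the AUC at a given percentage."""
    return math.ceil(n_genes * percent / 100)

def gene_set_coverage(gene_set, genes_in_matrix):
    """Fraction of a gene set that is actually present in the expression matrix."""
    present = set(gene_set) & set(genes_in_matrix)
    return len(present) / len(set(gene_set))

# Only the top of each cell's ranking is scored, so the window scales with the input.
print(default_auc_max_rank(3000))    # 150 with a 3000-gene HVG matrix
print(default_auc_max_rank(20000))   # 1000 with a hypothetical 20,000-gene matrix

# Hypothetical pathway where only half of the genes are among the HVGs.
pathway = ["GeneA", "GeneB", "GeneC", "GeneD"]
hvg_matrix_genes = ["GeneA", "GeneC", "GeneX", "GeneY"]
print(gene_set_coverage(pathway, hvg_matrix_genes))
```

With only the highly variable genes, both the scoring window and the number of pathway genes that can be detected shrink, which is the practical argument for passing the full RNA assay.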
Thanks for your reply, jv!
Yes, apologies, I wrote this post in a bit of a frantic rush. I meant to say that I am looking at the integrated data object, as I have already performed clustering and UMAP(ing) and wanted to overlay the GSEA results on top of the UMAP. I did notice that sentence in the tutorial but was questioning the methodology: if I feed it the scaled data (3000 HVGs for each cell), then it can only perform its independent procedure on those genes.
I was thinking the same thing, but a lot of people seem to frown upon using raw data for certain analyses, so I thought it was a good idea to ask on here. Thanks for your input, I think I will try this and let you know how it goes!