
Hi everyone,

Posting here as I can't seem to find a definitive answer.

I would like to assess cell-level pathway activity using AUCell on my integrated dataset. The vignette says to use the 'data' slot, which for me is a 3000 x 10024 (genes x cells) expression matrix. However, this expression matrix consists of scaled data (using 3000 highly variable genes).

I am wondering whether using this data as the input for AUCell will limit my results. Should I be using all genes for every cell instead? If so, how do I obtain a normalised expression matrix containing every gene for every cell from my integrated Seurat object without having to rerun the analysis and subsequently losing the clustering?

Thanks for your help and advice in advance.

Hi there!

I have the same question. How did it go? I ran my analysis on both the integrated data and a subset of it, and AUCell seems to give different results when I look at them on my UMAP plot under the same scale. I would really appreciate any feedback about this analysis.

Thank you

The Seurat data slot doesn't hold "scaled" data; that's what the scale.data slot is for. See https://github.com/satijalab/seurat/wiki/Assay#slots for more details. My guess is that you are referring to "normalized" data.

The above point aside, it sounds like you are specifically looking at the "integrated" assay, which only includes the highly variable genes used for integration. However, AUCell doesn't care about the integration and, as noted in the tutorial:

Since the scoring method is ranking-based, AUCell is independent of the gene expression units and the normalization procedure.
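To make that quote concrete, here is a minimal sketch in plain Python (not AUCell itself) showing that a per-cell library-size normalization followed by log1p leaves the within-cell gene ranking unchanged. The function names, the toy counts and the scale factor of 10,000 are assumptions made up for illustration, and ties are broken by gene index here purely to keep the toy deterministic. Per-gene z-scoring across cells, as stored in scale.data, is a different kind of transformation and is not covered by this sketch.

```python
import math


def ranks_desc(values):
    """Rank genes within one cell (1 = highest value); ties broken by gene index."""
    order = sorted(range(len(values)), key=lambda i: (-values[i], i))
    ranks = [0] * len(values)
    for rank, gene_idx in enumerate(order, start=1):
        ranks[gene_idx] = rank
    return ranks


def log_normalize(counts, scale_factor=10_000):
    """Library-size normalization followed by log1p, applied to a single cell."""
    total = sum(counts)
    return [math.log1p(c / total * scale_factor) for c in counts]


# Toy raw counts for six hypothetical genes in one cell.
raw_counts = [0, 12, 3, 7, 0, 1]
normalized = log_normalize(raw_counts)

raw_ranks = ranks_desc(raw_counts)
norm_ranks = ranks_desc(normalized)
same_ranking = raw_ranks == norm_ranks

print(raw_ranks)
print(norm_ranks)
print(same_ranking)
```

Because the transformation is monotonic within a cell, both rankings come out identical, which is why raw counts and log-normalized values should behave the same for a purely rank-based score.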

Given the way AUCell uses aucMax for scoring gene set activation, I think having more genes in your input will be helpful, not to mention that gene set enrichment typically benefits from information from both highly variable genes and less variable genes. But I am interested to hear what others have to say on the matter.
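As a rough illustration of why the gene universe matters, here is a simplified recovery-curve score in plain Python. It is only a toy in the spirit of a rank-based AUC, not AUCell's actual implementation: the function recovery_auc, the gene names, and the choice of max_rank as half of the available genes are all assumptions made up for this example.

```python
def recovery_auc(ranking, gene_set, max_rank):
    """Toy area under the recovery curve within the top max_rank genes, scaled to [0, 1]."""
    # Gene-set members absent from the ranking cannot contribute to the score.
    present = gene_set & set(ranking)
    if not present:
        return 0.0
    hits = 0
    area = 0
    for gene in ranking[:max_rank]:
        if gene in present:
            hits += 1
        area += hits
    # Best case: all present gene-set members sit at the very top of the ranking.
    best = sum(min(k, len(present)) for k in range(1, max_rank + 1))
    return area / best


# One cell's genes ordered from highest to lowest expression (hypothetical names).
full_ranking = ["A", "X", "B", "Y", "C", "Z", "W", "V", "U", "T"]
gene_set = {"A", "B", "C"}

# Pretend only these genes were kept as highly variable.
hvgs = {"A", "X", "Y", "Z"}
hvg_ranking = [g for g in full_ranking if g in hvgs]

full_score = recovery_auc(full_ranking, gene_set, max_rank=len(full_ranking) // 2)
hvg_score = recovery_auc(hvg_ranking, gene_set, max_rank=len(hvg_ranking) // 2)

print(full_score)
print(hvg_score)
```

In this toy the two scores differ, and in the restricted case the gene set is effectively reduced to a single gene because B and C are not in the matrix at all. The direction of the change is specific to this example; the point is only that the score depends on which genes are available to be ranked.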

All this to say, you can reasonably use the raw counts (i.e. assay "RNA" and slot "counts") from your integrated object for AUCell input.

Thanks for your reply, jv!

Yes, apologies, I wrote this post in a bit of a frantic rush. I meant to say that I am looking at the integrated data object, as I have already performed clustering and UMAP(ing) and wanted to overlay the GSEA results on top of the UMAP. I did notice that sentence in the tutorial but was questioning the methodology: if I feed it the scaled data (3000 HVGs for each cell), then it can only perform its independent procedure on those genes.

I was thinking the same thing, but a lot of people seem to frown upon using raw data for certain analyses, so I thought it was a good idea to ask on here. Thanks for your input, I think I will try this and let you know how it goes!