I am attempting to use the PCAtools R package. I have imported my own data matrix (pca.matrix) and metadata (metadata) files. When just running this code, everything works great and I get plots:
p <- pca(pca.matrix, removeVar = 0.1)
-- removing the lower 10% of variables based on variance
screeplot(p)
biplot(p)
When I try to link and check the metadata, all seems to be working:
pca.matrix <-pca.matrix[,which(colnames(pca.matrix) %in% rownames(metadata))]
all(colnames(pca.matrix) == rownames(metadata))
[1] TRUE
However, when I try to run the PCA with the metadata, I get the following:
p <- pca(pca.matrix, metadata = metadata, removeVar = 0.1)
Error in pca(pca.matrix, metadata = metadata, removeVar = 0.1) :
'colnames(mat)' is not identical to 'rownames(metadata)'
Shouldn't it be trying to match up 'colnames(pca.matrix)' with 'rownames(metadata)'?
What is 'colnames(mat)'? I feel like I'm totally missing some key information.
Any help would be great! Thank you!
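A note on the error text: `mat` is the name of the first argument of `pca()`, so `colnames(mat)` refers to whatever matrix was passed in, here `pca.matrix`. The wording "is not identical" suggests the check is stricter than an element-wise `==`, which can return TRUE in cases where `identical()` does not (differing attributes or types, for example). Below is a minimal base-R sketch, using made-up toy objects rather than the real data, of how both objects might be aligned so that the stricter comparison holds. The toy values and the `shared` variable are assumptions for illustration only.

```r
# Toy stand-ins for the real objects (hypothetical data, illustration only)
pca.matrix <- matrix(rnorm(12), nrow = 3,
                     dimnames = list(paste0("gene", 1:3),
                                     c("s1", "s2", "s3", "s4")))
metadata <- data.frame(group = c("a", "b", "a", "b", "a"),
                       row.names = c("s3", "s1", "s2", "s5", "s4"))

# Quick diagnostics: lengths and names present in only one of the two objects
length(colnames(pca.matrix)); length(rownames(metadata))
setdiff(rownames(metadata), colnames(pca.matrix))
setdiff(colnames(pca.matrix), rownames(metadata))

# Keep only the shared samples, in the matrix's column order, in BOTH objects
shared <- intersect(colnames(pca.matrix), rownames(metadata))
pca.matrix <- pca.matrix[, shared, drop = FALSE]
metadata <- metadata[shared, , drop = FALSE]

# Stricter than all(a == b): also compares length, type and attributes
identical(colnames(pca.matrix), rownames(metadata))
```

If `identical()` still returns FALSE on the real data after this kind of alignment, comparing `str(colnames(pca.matrix))` with `str(rownames(metadata))` may show what differs.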
Hi Kevin,
I am also having a similar issue:
counts_data <- read.csv('count_values_featurecounts.csv', row.names = 1)
head(counts_data)
colnames(counts_data)
[1] "Sample1_1387_S76_FE_1B" "sample2_1391_S80_FE_1B" "sample3_1388_S77_MM9_2B"
[4] "sample4_1389_S78_MM9_1B" "sample5_1390_S79_BIP_1C" "sample6_1392_S81_BIP_2B"
# read in sample info
colData <- read.csv('metadata.csv', row.names = 1)
rownames(colData)
[1] "Sample1_1387_S76_FE_1B" "sample2_1391_S80_FE_1B" "sample3_1388_S77_MM9_2B"
[4] "sample4_1389_S78_MM9_1B" "sample5_1390_S79_BIP_1C" "sample6_1392_S81_BIP_2B"
# make sure the row names in colData match the column names in counts_data (smaller dataset %in% larger dataset)
all(rownames(colData) %in% colnames(counts_data))
[1] TRUE
# are they in the same order?
all(colnames(counts_data1) == rownames(colData))
[1] TRUE
# Step 2: construct a DESeqDataSet object ----------
dds <- DESeqDataSetFromMatrix(countData = counts_data,
colData = colData,
design = ~ treatment)
Warning message:
In DESeqDataSet(se, design = design, ignoreRank) :
some variables in design formula are characters, converting to factors
class: DESeqDataSet
dim: 5406 6
metadata(1): version
assays(1): counts
rownames(5406): R65_hybrid_00001 R65_hybrid_00002 ... R65_hybrid_05462_gene
R65_hybrid_05463_gene
rowData names(0):
colnames(6): Sample1_1387_S76_FE_1B sample2_1391_S80_FE_1B ... sample5_1390_S79_BIP_1C
sample6_1392_S81_BIP_2B
colData names(1): treatment
# transform the data to variance-stabilised expression levels
vst <- vst(dds)
-- note: fitType='parametric', but the dispersion trend was not well captured by the
function: y = a/x + b, and a local regression fit was automatically substituted.
specify fitType='local' or 'mean' to avoid this message next time.
# generate a PCA plot
p <- pca(vst,c("iron","mm9","bip") )
Error in t.default(mat) : argument is not a matrix
head(assay(vsd), 3)
Sample1_1387_S76_FE_1B sample2_1391_S80_FE_1B sample3_1388_S77_MM9_2B
R65_hybrid_00001 12.040811 11.832936 11.711007
R65_hybrid_00002 3.163354 1.903476 2.956567
R65_hybrid_00003 11.630781 11.263561 11.434657
sample4_1389_S78_MM9_1B sample5_1390_S79_BIP_1C sample6_1392_S81_BIP_2B
R65_hybrid_00001 11.079648 12.55149 13.050876
R65_hybrid_00002 3.186474 3.59161 3.289681
R65_hybrid_00003 10.113185 12.55919 12.039214
plotPCA(vsd, intgroup = c("iron","mm9","bip"))
using ntop=500 top features by variance
Error in .local(object, ...) :
the argument 'intgroup' should specify columns of colData(dds)
Please stop posting comments in the answer field, and format your code. Use the 10101 button: select the code, then press the button to apply code formatting.
This has nothing to do with the top-level question. intgroup must name columns of the colData of dds. So dds$iron must exist, and mm9 and bip must be levels of iron.
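To illustrate the point about intgroup: the DESeqDataSet printed above reports `colData names(1): treatment`, so `treatment` appears to be the only column available. A rough base-R sketch of that check follows, using a plain data frame as a stand-in for `colData(dds)`. The `col_data` object and its treatment values are assumptions made up for illustration, not the poster's actual metadata.

```r
# Hypothetical stand-in for as.data.frame(colData(dds)); the thread's colData
# has a single column named "treatment" (values below are invented)
col_data <- data.frame(
  treatment = c("FE", "FE", "MM9", "MM9", "BIP", "BIP"),
  row.names = paste0("sample", 1:6)
)

# What was passed as intgroup in the failing call
requested <- c("iron", "mm9", "bip")

# Names that are NOT columns of the sample table; any entry here would
# trigger the "should specify columns of colData" error
missing_cols <- setdiff(requested, colnames(col_data))
missing_cols

# A grouping variable that does exist as a column
valid <- intersect("treatment", colnames(col_data))
valid
```

With the real objects, the corresponding call would presumably be `plotPCA(vsd, intgroup = "treatment")`. Separately, the earlier `argument is not a matrix` error likely arises because `pca()` expects a plain matrix such as `assay(vst)` rather than the transformed DESeq2 object itself, with the sample table passed via `metadata`.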