15 Minutes
GeneKey object to the DE results table res_WT and store the new table in an object named res_WT_anno.
Hint: Look at the documentation for either the merge function or the tidyverse join function for ideas on how to combine the GeneKey and res_WT tables.
## check the biomaRt key we created to remember the structure
head(GeneKey)
## ensembl_gene_id external_gene_name
## 1 ENSMUSG00000000001 Gnai3
## 2 ENSMUSG00000000028 Cdc45
## 3 ENSMUSG00000000031 H19
## 4 ENSMUSG00000000037 Scml2
## 5 ENSMUSG00000000049 Apoh
## 6 ENSMUSG00000000056 Narf
First, create a new table called res_WT_anno that includes a column with the ENSEMBL ids named genes, using the mutate function. Then use the left_join function to combine the GeneKey table with the res_WT DE results.
res_WT_anno <- as.data.frame(res_WT) %>%
  mutate(genes = row.names(res_WT)) %>%
  left_join(GeneKey, by = c("genes" = "ensembl_gene_id")) %>%
  relocate(c("genes", "external_gene_name")) # optionally, re-order columns to make output more readable
head(res_WT_anno)
## genes external_gene_name baseMean log2FoldChange lfcSE stat
## 1 ENSMUSG00000000001 Gnai3 6255.632164 -0.014024321 0.09301588 -0.15077341
## 2 ENSMUSG00000000028 Cdc45 1337.874474 0.522421732 0.13599465 3.84148730
## 3 ENSMUSG00000000031 H19 3.773571 -1.156597290 1.60080258 -0.72251088
## 4 ENSMUSG00000000037 Scml2 27.563275 -0.279611867 0.47655119 -0.58674046
## 5 ENSMUSG00000000049 Apoh 2.256350 4.010718329 1.88912289 2.12305846
## 6 ENSMUSG00000000056 Narf 2194.251314 -0.008544091 0.17722204 -0.04821122
## pvalue padj
## 1 0.8801544640 0.960702559
## 2 0.0001222911 0.003948693
## 3 0.4699804359 NA
## 4 0.5573780290 0.804647138
## 5 0.0337489531 NA
## 6 0.9615479086 0.987888734
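To see what left_join does with the by argument in isolation, here is a minimal sketch on toy data. The tables toy_res and toy_key and all of their values are hypothetical stand-ins, not the workshop data, and the sketch assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical stand-in for the DE results, with the ids already in a column
toy_res <- data.frame(
  genes = c("ID_A", "ID_B", "ID_C"),
  log2FoldChange = c(1.2, -0.4, 0.1)
)

# Hypothetical stand-in for GeneKey; note that ID_C has no entry
toy_key <- data.frame(
  ensembl_gene_id = c("ID_A", "ID_B"),
  external_gene_name = c("GeneA", "GeneB")
)

# by = c("left_name" = "right_name") matches key columns that have different names
toy_anno <- left_join(toy_res, toy_key, by = c("genes" = "ensembl_gene_id"))
toy_anno
```

Every row of the left table is kept, and the key column keeps its left-hand name (genes). The row for ID_C has no match in the key, so its external_gene_name is NA.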
Alternatively, if you are more familiar with base functions:
res_WT_anno <- res_WT # copy table
# add the ENSEMBL ids (row names) as a column named genes
res_WT_anno <- cbind(genes = row.names(res_WT_anno), res_WT_anno[, 1:6])
res_WT_anno <- as.data.frame(res_WT_anno)
# combine the two tables using the merge function (similar to join from `tidyverse`)
res_WT_anno <- merge(GeneKey, res_WT_anno, by.x = "ensembl_gene_id", by.y = "genes", all.x = FALSE, all.y = TRUE)
head(res_WT_anno)
## ensembl_gene_id external_gene_name baseMean log2FoldChange lfcSE stat
## 1 ENSMUSG00000000001 Gnai3 6255.632164 -0.014024321 0.09301588 -0.15077341
## 2 ENSMUSG00000000028 Cdc45 1337.874474 0.522421732 0.13599465 3.84148730
## 3 ENSMUSG00000000031 H19 3.773571 -1.156597290 1.60080258 -0.72251088
## 4 ENSMUSG00000000037 Scml2 27.563275 -0.279611867 0.47655119 -0.58674046
## 5 ENSMUSG00000000049 Apoh 2.256350 4.010718329 1.88912289 2.12305846
## 6 ENSMUSG00000000056 Narf 2194.251314 -0.008544091 0.17722204 -0.04821122
## pvalue padj
## 1 0.8801544640 0.960702559
## 2 0.0001222911 0.003948693
## 3 0.4699804359 NA
## 4 0.5573780290 0.804647138
## 5 0.0337489531 NA
## 6 0.9615479086 0.987888734
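The all.x and all.y arguments decide which unmatched rows survive the merge. The sketch below uses hypothetical toy tables (toy_key and toy_res are not part of the workshop data) to compare the setting used above with the default.

```r
# Hypothetical stand-in for GeneKey; ID_D is not in the results
toy_key <- data.frame(
  ensembl_gene_id = c("ID_A", "ID_B", "ID_D"),
  external_gene_name = c("GeneA", "GeneB", "GeneD")
)

# Hypothetical stand-in for the DE results; ID_C is not in the key
toy_res <- data.frame(
  genes = c("ID_C", "ID_A", "ID_B"),
  log2FoldChange = c(0.1, 1.2, -0.4)
)

# Same settings as above: keep every results row, drop key-only rows
keep_results <- merge(toy_key, toy_res,
                      by.x = "ensembl_gene_id", by.y = "genes",
                      all.x = FALSE, all.y = TRUE)

# Defaults (all.x = FALSE, all.y = FALSE): only ids present in both tables
matched_only <- merge(toy_key, toy_res,
                      by.x = "ensembl_gene_id", by.y = "genes")

keep_results
matched_only
```

With all.y = TRUE, ID_C stays in the table with an NA gene name, while ID_D (present only in the key) is dropped. With the defaults, ID_C would be lost from the results. The key column takes its name from the first table, which is why the merged output above is headed ensembl_gene_id and not genes.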
Notice that not every ENSEMBL gene id was matched to a gene symbol (external_gene_name). While we are able to annotate our results, this is a helpful reminder that gene symbol is often not a good unique identifier.
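One way to check both points is to count the rows without a symbol and list any symbols that occur more than once. The sketch below runs on a small hypothetical table (toy_anno and its values are made up for illustration); the same lines could be applied to res_WT_anno. It treats both NA and empty strings as missing, on the assumption that either may appear depending on how the key was built.

```r
# Hypothetical annotated table: one id without a symbol, one symbol used twice
toy_anno <- data.frame(
  genes = c("ID_A", "ID_B", "ID_C", "ID_D"),
  external_gene_name = c("GeneA", NA, "GeneX", "GeneX")
)

symbols <- toy_anno$external_gene_name

# Rows with no usable gene symbol (NA or empty string)
is_missing <- is.na(symbols) | symbols == ""
n_missing <- sum(is_missing)

# Symbols shared by more than one ENSEMBL id
present <- symbols[!is_missing]
dup_symbols <- unique(present[duplicated(present)])

n_missing
dup_symbols
```

Here n_missing is 1 and dup_symbols contains GeneX, which shows why the ENSEMBL id is the safer column to use as a unique key.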