1 [Breakout Exercise 1] - Heatmap of top genes


15 Minutes


It can be useful to visualize to visualize how our samples organize along with the expression pattern of individual genes. We will create a heatmap of the top most variable genes using the pheatmap function.


1.0.1 Instructions:


  • One group member should share their screen in the breakout room. If nobody volunteers, a helper may randomly select someone.
  • The group members should discuss the exercise and work together to find a solution.
  • If there is time after a solution is found, allow all members to complete the exercise.


  • First, create a object named topVarGenes that stores the top 50 most variable genes from our rlog normalized data (note that each row of rld corresponds to a gene).
  • Work from the inside to the outside when stringing together commands and test out each part to ensure the commands you enter are having the expected behavior.

Hint:

The command rowVars(assay(rld)) will return the row variance values, the command base::order will return an index of the input re-arranged into accending or decending order, and the head command or index referencing can be used to select the firstor last part of several data types.


Click for solution

Helpers - it may be helpful to guide learners through testing the steps needed to select the top 50 most variable genes. The below example is one possible solution.

topVarGenes = rowVars(assay(rld))
head(topVarGenes)
## [1] 0.008608561 1.734191622 0.037923016 0.114943172 0.019207105 0.149559602
orderedVars = order(topVarGenes)
head(orderedVars)
## [1] 16161 18665 18695 18933 19009 19034
sortedVars = sort(topVarGenes)
head(sortedVars)
## [1] 0.0005250999 0.0008457552 0.0008457552 0.0008457552 0.0008457552 0.0008457552
tail(sortedVars)
## [1] 11.10653 14.23099 14.42485 14.67978 15.49938 15.74445
?order
orderedVars = order(topVarGenes, decreasing = TRUE)
head(topVarGenes[orderedVars])
## [1] 15.74445 15.49938 14.67978 14.42485 14.23099 11.10653
orderedMat = assay(rld)[orderedVars, ]
head(orderedMat)
##                    Sample_116498 Sample_116499 Sample_116500 Sample_116501 Sample_116502
## ENSMUSG00000069917      7.386969      7.592002      7.421841      7.658291      9.096951
## ENSMUSG00000052305      7.545034      7.866700      7.553335      7.999982      9.158207
## ENSMUSG00000073940      5.660694      5.948358      5.642725      6.041644      6.784668
## ENSMUSG00000006574      5.924567      6.076577      6.030227      5.926847      7.300111
## ENSMUSG00000069919      6.208442      6.407137      6.469322      6.382475      7.472736
## ENSMUSG00000032496      4.325376      4.451341      4.396189      4.708752      5.339877
##                    Sample_116503 Sample_116504 Sample_116505 Sample_116506 Sample_116507
## ENSMUSG00000069917      8.744320      16.24134      17.75598      17.41340     10.595285
## ENSMUSG00000052305      8.866311      16.38642      17.87966      17.49616     10.639776
## ENSMUSG00000073940      6.828258      14.15300      15.68239      15.25716      8.527179
## ENSMUSG00000006574      6.957638      14.03874      15.81059      15.51719      8.937274
## ENSMUSG00000069919      7.135661      14.57034      15.92302      15.70450      9.120502
## ENSMUSG00000032496      4.540770      11.33823      12.53432      12.74284      6.293183
##                    Sample_116508 Sample_116509
## ENSMUSG00000069917      9.845426     11.396878
## ENSMUSG00000052305     10.043313     11.651572
## ENSMUSG00000073940      8.288975      9.536873
## ENSMUSG00000006574      8.833479     10.522213
## ENSMUSG00000069919      8.615545     10.586671
## ENSMUSG00000032496      5.519366      8.619501
# 
orderedMat = orderedMat[1:50, ]

## Alternatively
orderedMat = head(orderedMat, 50)


  • Next, use pheatmap to plot the selected topVarGenes from rld. If you have time, try to scale by row to enhance any clusters of genes with similar expression across samples.


Hint:

We can use assay(rld) to access the matrix of normalized data and we can use bracket notation ([rows,]) and the list of rows stored in topVarGene to only provide data for the 50 genes of interest.


Click for solution
# plot with no arguments
pheatmap(orderedMat)

# plots with row scaled and assigned to 'p' to make writing to file easier
p <- pheatmap(assay(rld)[topVarGenes,], scale = "row")


Note, as mentioned in this Biostars post, scaling by row in pheatmap will automatically scale the data to Z-scores.


Finally, to output the plot to file, use the pdf() function:

pdf(file = paste0(plotPath, 'Top_heatmap_rlog_', Comparison, '.pdf'), onefile = TRUE)
p
dev.off()
## quartz_off_screen 
##                 2
LS0tCnRpdGxlOiAiRGF5IDMgLSBNb2R1bGUgMDkgSGVhdG1hcCIKYXV0aG9yOiAiVU0gQmlvaW5mb3JtYXRpY3MgQ29yZSIKZGF0ZTogImByIFN5cy5EYXRlKClgIgpvdXRwdXQ6CiAgICAgICAgaHRtbF9kb2N1bWVudDoKICAgICAgICAgICAgaW5jbHVkZXM6CiAgICAgICAgICAgICAgICBpbl9oZWFkZXI6IGhlYWRlci5odG1sCiAgICAgICAgICAgIHRoZW1lOiBwYXBlcgogICAgICAgICAgICB0b2M6IHRydWUKICAgICAgICAgICAgdG9jX2RlcHRoOiA0CiAgICAgICAgICAgIHRvY19mbG9hdDogdHJ1ZQogICAgICAgICAgICBudW1iZXJfc2VjdGlvbnM6IHRydWUKICAgICAgICAgICAgZmlnX2NhcHRpb246IHRydWUKICAgICAgICAgICAgbWFya2Rvd246IEdGTQogICAgICAgICAgICBjb2RlX2Rvd25sb2FkOiB0cnVlCi0tLQo8c3R5bGUgdHlwZT0idGV4dC9jc3MiPgpib2R5eyAvKiBOb3JtYWwgICovCiAgICAgIGZvbnQtc2l6ZTogMTRwdDsKICB9CnByZSB7CiAgZm9udC1zaXplOiAxMnB0Cn0KY29kZS5yewogIGZvbnQtc2l6ZTogMTJwdDsKfQo8L3N0eWxlPgoKPCEtLS0gQWxsb3cgdGhlIHBhZ2UgdG8gYmUgd2lkZXIgLS0tPgo8c3R5bGU+CiAgICBib2R5IC5tYWluLWNvbnRhaW5lciB7CiAgICAgICAgbWF4LXdpZHRoOiAxMjAwcHg7CiAgICB9Cjwvc3R5bGU+Cgo8YnI+CgojIFtCcmVha291dCBFeGVyY2lzZSAxXSAtIEhlYXRtYXAgb2YgdG9wIGdlbmVzCgo8YnI+CgoKYGBge3IgTW9kdWxlcywgZXZhbD1UUlVFLCBlY2hvPUZBTFNFLCBtZXNzYWdlPUZBTFNFLCB3YXJuaW5nPUZBTFNFfQpsaWJyYXJ5KERFU2VxMikKbGlicmFyeShnZ3Bsb3QyKQpsaWJyYXJ5KHRpZHlyKQpsaWJyYXJ5KGRwbHlyKQpsaWJyYXJ5KG1hdHJpeFN0YXRzKQpsaWJyYXJ5KGdncmVwZWwpCmxpYnJhcnkocGhlYXRtYXApCmxpYnJhcnkoUkNvbG9yQnJld2VyKQpsb2FkKCJyZGF0YS9SdW5uaW5nRGF0YS5SRGF0YSIpCmBgYAoKCioqMTUgTWludXRlcyoqCgo8YnI+CgpJdCBjYW4gYmUgdXNlZnVsIHRvIHZpc3VhbGl6ZSB0byB2aXN1YWxpemUgaG93IG91ciBzYW1wbGVzIG9yZ2FuaXplIGFsb25nIHdpdGggdGhlIGV4cHJlc3Npb24gcGF0dGVybiBvZiBpbmRpdmlkdWFsIGdlbmVzLiBXZSB3aWxsIGNyZWF0ZSBhIGhlYXRtYXAgb2YgdGhlIHRvcCBtb3N0IHZhcmlhYmxlIGdlbmVzIHVzaW5nIHRoZSBgcGhlYXRtYXBgIGZ1bmN0aW9uLgoKPGJyPgoKIyMjIEluc3RydWN0aW9uczoKCjxicj4KCi0gT25lIGdyb3VwIG1lbWJlciBzaG91bGQgc2hhcmUgdGhlaXIgc2NyZWVuIGluIHRoZSBicmVha291dCByb29tLiBJZiBub2JvZHkgdm9sdW50ZWVycywgYSBoZWxwZXIgbWF5IHJhbmRvbWx5IHNlbGVjdCBzb21lb25lLgotIFRoZSBncm91cCBtZW1iZXJzIHNob3VsZCBkaXNjdXNzIHRoZSBleGVyY2lzZSBhbmQgd29yayB0b2dldGhlciB0byBmaW5kIGEgc29sdXRpb24uCi0gSWYgdGhlcmUgaXMgdGltZSBhZnRlciBhIHNvbHV0aW9uIGlzIGZvdW5kLCBhbGxvdyBhbGwgbWVtYmVycyB0byBjb21wbGV0ZSB0aGUgZXhlcmNpc2UuCgo8YnI+CgotIEZpcnN0LCBjcmVhdGUgYSBvYmplY3QgbmFtZWQgYHRvcFZhckdlbmVzYCB0aGF0IHN0b3JlcyB0aGUgdG9wICoqNTAqKiBtb3N0IHZhcmlhYmxlIGdlbmVzIGZyb20gb3VyIHJsb2cgbm9ybWFsaXplZCBkYXRhIChub3RlIHRoYXQgZWFjaCByb3cgb2YgYHJsZGAgY29ycmVzcG9uZHMgdG8gYSBnZW5lKS4KLSBXb3JrIGZyb20gdGhlIGluc2lkZSB0byB0aGUgb3V0c2lkZSB3aGVuIHN0cmluZ2luZyB0b2dldGhlciBjb21tYW5kcyBhbmQgdGVzdCBvdXQgZWFjaCBwYXJ0IHRvIGVuc3VyZSB0aGUgY29tbWFuZHMgeW91IGVudGVyIGFyZSBoYXZpbmcgdGhlIGV4cGVjdGVkIGJlaGF2aW9yLgoKPiBIaW50Ogo+IAo+IFRoZSBjb21tYW5kIGByb3dWYXJzKGFzc2F5KHJsZCkpYCB3aWxsIHJldHVybiB0aGUgcm93IHZhcmlhbmNlIHZhbHVlcywgdGhlIGNvbW1hbmQgYGJhc2U6Om9yZGVyYCB3aWxsIHJldHVybiBhbiBpbmRleCBvZiB0aGUgaW5wdXQgcmUtYXJyYW5nZWQgaW50byBhY2NlbmRpbmcgb3IgZGVjZW5kaW5nIG9yZGVyLCBhbmQgdGhlIGBoZWFkYCBjb21tYW5kIG9yIGluZGV4IHJlZmVyZW5jaW5nIGNhbiBiZSB1c2VkIHRvIHNlbGVjdCB0aGUgZmlyc3RvciBsYXN0IHBhcnQgb2Ygc2V2ZXJhbCBkYXRhIHR5cGVzLiAKCgo8YnI+Cgo8ZGV0YWlscz4KICAgIDxzdW1tYXJ5PipDbGljayBmb3Igc29sdXRpb24qPC9zdW1tYXJ5PgogICAgSGVscGVycyAtIGl0IG1heSBiZSBoZWxwZnVsIHRvIGd1aWRlIGxlYXJuZXJzIHRocm91Z2ggdGVzdGluZyB0aGUgc3RlcHMgbmVlZGVkIHRvIHNlbGVjdCB0aGUgdG9wIDUwIG1vc3QgdmFyaWFibGUgZ2VuZXMuIFRoZSBiZWxvdyBleGFtcGxlIGlzIG9uZSBwb3NzaWJsZSBzb2x1dGlvbi4KYGBge3IgVGVzdFNhbXBsZUhlYXRtYXB9CnRvcFZhckdlbmVzID0gcm93VmFycyhhc3NheShybGQpKQpoZWFkKHRvcFZhckdlbmVzKQoKb3JkZXJlZFZhcnMgPSBvcmRlcih0b3BWYXJHZW5lcykKaGVhZChvcmRlcmVkVmFycykKCnNvcnRlZFZhcnMgPSBzb3J0KHRvcFZhckdlbmVzKQpoZWFkKHNvcnRlZFZhcnMpCnRhaWwoc29ydGVkVmFycykKCj9vcmRlcgpvcmRlcmVkVmFycyA9IG9yZGVyKHRvcFZhckdlbmVzLCBkZWNyZWFzaW5nID0gVFJVRSkKaGVhZCh0b3BWYXJHZW5lc1tvcmRlcmVkVmFyc10pCgpvcmRlcmVkTWF0ID0gYXNzYXkocmxkKVtvcmRlcmVkVmFycywgXQpoZWFkKG9yZGVyZWRNYXQpCgojIApvcmRlcmVkTWF0ID0gb3JkZXJlZE1hdFsxOjUwLCBdCgojIyBBbHRlcm5hdGl2ZWx5Cm9yZGVyZWRNYXQgPSBoZWFkKG9yZGVyZWRNYXQsIDUwKQoKYGBgCjwvZGV0YWlscz4KCjxicj4KCgotIE5leHQsIHVzZSBgcGhlYXRtYXBgIHRvIHBsb3QgdGhlIHNlbGVjdGVkIGB0b3BWYXJHZW5lc2AgZnJvbSBgcmxkYC4gSWYgeW91IGhhdmUgdGltZSwgdHJ5IHRvIHNjYWxlIGJ5IHJvdyB0byBlbmhhbmNlIGFueSBjbHVzdGVycyBvZiBnZW5lcyB3aXRoIHNpbWlsYXIgZXhwcmVzc2lvbiBhY3Jvc3Mgc2FtcGxlcy4gIAoKPGJyPiAKCj4gSGludDoKPgo+IFdlIGNhbiB1c2UgYGFzc2F5KHJsZClgIHRvIGFjY2VzcyB0aGUgbWF0cml4IG9mIG5vcm1hbGl6ZWQgZGF0YSBhbmQgd2UgY2FuIHVzZSBicmFja2V0IG5vdGF0aW9uIChbcm93cyxdKSBhbmQgdGhlIGxpc3Qgb2Ygcm93cyBzdG9yZWQgaW4gYHRvcFZhckdlbmVgIHRvIG9ubHkgcHJvdmlkZSBkYXRhIGZvciB0aGUgNTAgZ2VuZXMgb2YgaW50ZXJlc3QuCj4gCgo8YnI+Cgo8ZGV0YWlscz4KICAgIDxzdW1tYXJ5PipDbGljayBmb3Igc29sdXRpb24qPC9zdW1tYXJ5PgpgYGB7ciBUZXN0UGhlYXRtYXB9CiMgcGxvdCB3aXRoIG5vIGFyZ3VtZW50cwpwaGVhdG1hcChvcmRlcmVkTWF0KQoKIyBwbG90cyB3aXRoIHJvdyBzY2FsZWQgYW5kIGFzc2lnbmVkIHRvICdwJyB0byBtYWtlIHdyaXRpbmcgdG8gZmlsZSBlYXNpZXIKcCA8LSBwaGVhdG1hcChhc3NheShybGQpW3RvcFZhckdlbmVzLF0sIHNjYWxlID0gInJvdyIpCmBgYAo8L2RldGFpbHM+Cgo8YnI+CgoqTm90ZSwgYXMgbWVudGlvbmVkIGluIHRoaXMgW0Jpb3N0YXJzIHBvc3RdKGh0dHBzOi8vc3VwcG9ydC5iaW9jb25kdWN0b3Iub3JnL3AvMTMwMzM2LyksIHNjYWxpbmcgYnkgcm93IGluIGBwaGVhdG1hcGAgd2lsbCBhdXRvbWF0aWNhbGx5IHNjYWxlIHRoZSBkYXRhIHRvIFotc2NvcmVzLioKCjxicj4KCkZpbmFsbHksIHRvICoqb3V0cHV0KiogdGhlIHBsb3QgdG8gZmlsZSwgdXNlIHRoZSBgcGRmKClgIGZ1bmN0aW9uOgpgYGB7ciBFeGVyY2lzZU91dHB1dH0KcGRmKGZpbGUgPSBwYXN0ZTAocGxvdFBhdGgsICdUb3BfaGVhdG1hcF9ybG9nXycsIENvbXBhcmlzb24sICcucGRmJyksIG9uZWZpbGUgPSBUUlVFKQpwCmRldi5vZmYoKQpgYGAKCgoKICAKYGBge3IgV3JpdGVPdXQuUkRhdGEsIGV2YWw9RkFMU0UsIGVjaG89RkFMU0UsIG1lc3NhZ2U9RkFMU0UsIHdhcm5pbmc9RkFMU0V9CiNIaWRkZW4gY29kZSBibG9jayB0byB3cml0ZSBvdXQgZGF0YSBmb3Iga25pdHRpbmcKc2F2ZS5pbWFnZShmaWxlID0gInJkYXRhL1J1bm5pbmdEYXRhLlJEYXRhIikKYGBgCgoK