Consider this plot:
How would you interpret it?
What does it show in terms of the biology of the model system? In terms of the research question?
A research question may involve specific genes, but it is invariably rooted in biological systems. Functional analysis can bridge this gap.
Functional analysis (aka enrichment analysis, pathway enrichment analysis, …) identifies patterns in your results and compares them to known biological patterns.
Imagine we constructed this plot based on the volcano plot above:
![Curiously simple enrichment plot of five reference gene sets]()
Which pathways are enriched? Are they statistically significant?
Let’s consider a single reference gene set in a very small universe.
Visualizing enrichment
![Venn diagrams]()
((query set) background set) = background proportion
((query set) reference set) = reference proportion
Diagram with gene sets: query set, reference set
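The set relationships in the diagram can be sketched with plain Python sets. This is a minimal illustration, not the workshop's own code: all gene identifiers are made up, and the two proportions follow one reading of the labels above (the share of query genes that fall in the reference set, and the share of background genes that fall in the reference set).

```python
# Hypothetical gene identifiers; none of these come from the workshop data.
background = {f"gene{i}" for i in range(1, 21)}  # every gene that was tested (20 genes)
query = {"gene1", "gene2", "gene3", "gene4", "gene5", "gene6"}  # e.g. significant DE genes
reference = {"gene1", "gene2", "gene3", "gene4", "gene10", "gene11"}  # one reference gene set

# Only genes that were actually tested can contribute to the comparison.
query &= background
reference &= background

# Genes that are both in the query set and in the reference gene set.
overlap = query & reference

# Assumed reading: share of query genes that belong to the reference set.
reference_proportion = len(overlap) / len(query)

# Assumed reading: share of all background genes that belong to the reference set.
background_proportion = len(reference) / len(background)

print(f"overlap: {sorted(overlap)}")
print(f"reference proportion:  {reference_proportion:.3f}")
print(f"background proportion: {background_proportion:.3f}")
```

Restricting both sets to the background first matters in practice, because reference gene sets often contain genes that were never measured in the experiment.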
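2x2 table of gene counts (N = a + b + c + d):

|                 | in gene set (gs) | not in gene set |
|-----------------|------------------|-----------------|
| significant     | a                | b               |
| not significant | c                | d               |

Proportion of significant genes in the category = a / (a + b)
Proportion of background genes in the gs = (a + c) / N
Fold enrichment = (significant in gs) / (gs in background) = observed / expected = 2.667
The pathway is ~2.67× overrepresented among your significant genes relative to the background.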
Fold enrichment > 1: enrichment; < 1: depletion.
Fisher’s exact test / hypergeometric test: p = 0.014
Odds ratio: OR = odds(gene is in gs | significant) / odds(gene is in gs | not significant)
OR = 5: the odds of being in the gs are 5 times higher for significant genes than for non-significant genes.
OR > 1: positive association; OR < 1: negative association.
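To make these quantities concrete, here is a small sketch that computes fold enrichment, a one-sided hypergeometric p-value (the upper tail, equivalent to a one-sided Fisher's exact test) and the odds ratio from the four cells of a 2x2 table. The function name `enrichment_stats` and the counts are hypothetical, chosen only for illustration. They are not the counts behind the values quoted above, so the output will differ from 2.667, 0.014 and 5.

```python
from math import comb


def enrichment_stats(a, b, c, d):
    """Return (fold enrichment, one-sided p-value, odds ratio) for a 2x2 table.

    a: significant, in gene set        b: significant, not in gene set
    c: not significant, in gene set    d: not significant, not in gene set
    """
    n_total = a + b + c + d   # N, the size of the background
    n_sig = a + b             # number of significant genes
    n_set = a + c             # number of background genes in the gene set

    # Observed proportion among significant genes / expected proportion in background.
    fold = (a / n_sig) / (n_set / n_total)

    # Hypergeometric upper tail: probability of seeing a or more gene-set members
    # when drawing n_sig genes at random from the background.
    upper = min(n_sig, n_set)
    tail = sum(comb(n_set, k) * comb(n_total - n_set, n_sig - k)
               for k in range(a, upper + 1))
    p_value = tail / comb(n_total, n_sig)

    # Odds ratio (a/b) / (c/d); infinite if b or c is zero.
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return fold, p_value, odds_ratio


# Hypothetical counts for a very small universe of 20 genes.
fold, p_value, odds_ratio = enrichment_stats(a=4, b=2, c=2, d=12)
print(f"fold enrichment = {fold:.3f}, p = {p_value:.4f}, OR = {odds_ratio:.1f}")
```

Fold enrichment and the odds ratio answer slightly different questions (a ratio of proportions versus a ratio of odds), so they generally give different numbers for the same table.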
Let’s define gene set.
Now expand to many gene sets.
(And now with FDR correction)
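When many gene sets are tested, each raw p-value needs a multiple-testing adjustment. The sketch below uses the Benjamini-Hochberg procedure, a common choice for FDR control; the source does not say which FDR method is used, so treat this as an assumption. The gene set names and p-values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five reference gene sets.
raw_p = {"set_A": 0.001, "set_B": 0.012, "set_C": 0.04, "set_D": 0.2, "set_E": 0.6}

names = list(raw_p)
adj = dict(zip(names, benjamini_hochberg([raw_p[n] for n in names])))

# Gene sets that pass an (assumed) FDR threshold of 0.05.
significant_sets = [n for n in names if adj[n] < 0.05]

for n in names:
    print(f"{n}: p = {raw_p[n]:.3f}, adjusted p = {adj[n]:.3f}")
print("significant at FDR < 0.05:", significant_sets)
```

In this made-up example, `set_C` has a raw p-value below 0.05 but no longer passes after adjustment, which is the typical effect of FDR correction when several gene sets are tested at once.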
Similar steps / Similar inputs / Similar outputs
Our focus / path in this workshop
An example bulk RNA-Seq experiment.
Note that we are focusing on gene expression, but functional analysis can be applied to many experiment designs and analytes including non-coding RNA, protein expression, metabolites, and DNA methylation.
Adding biological context can connect observed DE patterns with the research question.
Adding biological context can reduce complexity: Functional analysis starts with hundreds to thousands of genes and emits dozens of biological processes or functional modules. This can greatly simplify interpretation.
Aggregating the effects of individual genes into processes or functional modules reveals coordinated changes.
Functional analysis results are simpler to generalize and compare across individual DE experiments.