workshop wayfinder

Objectives

Introduce “point and click” option for running functional enrichment
Run over-representation analysis (ORA) using WebGestalt’s browser interface
Understand results and outputs for ORA on bulk RNA-seq data

Getting started with functional enrichments

We’ve discussed some of the motivations and general types of approaches for performing functional enrichment analysis, but what tools can we use to perform these kinds of analyses?

While there are many tools available, WebGestalt (WEB-based GEne SeT AnaLysis Toolkit) is an approachable option because its web-based interface doesn’t require any programming knowledge. It also offers several methods for enrichment analysis, supports data from a range of organisms, and was recently updated by its authors to expand the range of supported analytes.

We’ll start by reviewing WebGestalt’s browser interface, focusing on RNA as our analyte, and familiarize ourselves with the available options before submitting our own functional enrichment analysis.

WebGestalt interface

If we navigate to the WebGestalt homepage, we can see that it has several sections. At the top is the main navigation menu, which includes links to the Manual, Citation, User Forum, and the 2019 version of the tool.

WebGestalt homepage overview

Below the menu is the main section, the “Basic parameters” box, whose left side includes prompts for:

“Method of Interest” - allows selection of the approach to use for functional enrichment, such as Over-Representation Analysis.
“Organism of Interest” - options include human, mouse, and rat, plus several other model organisms.
“Functional Database” - allows selection of functional/biological knowledge database that will be compared to the input data.

There is also an area on the right with example inputs for the supported analyte/functional database combinations, which can be useful for understanding the format and other attributes the inputs require. Since we’ll be running the tool together, we’ll skip the example inputs section for now.

WebGestalt homepage overview

If we scroll down slightly, we can see a box for providing inputs to WebGestalt, with prompts on the left side of the page for:

“Analyte Type” - allows selection of what was measured in the experiment; for the workshop we’ll only be using data from experiments that fall into the Gene/Protein category.
“Upload ID List” - option to upload a file with input genes that will be queried.
“Input ID List” - option to paste in a list of input genes that will be queried.
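Whether uploaded or pasted, the input is simply a list of gene identifiers. As a rough sketch, assuming WebGestalt expects one ID per line (the example inputs on the homepage are the place to confirm the exact format), the Python below tidies a pasted list by splitting on whitespace or commas and dropping blanks and duplicates. The function name normalize_id_list and the example Ensembl IDs are illustrative only.

```python
import re


def normalize_id_list(raw_text):
    """Split pasted text into IDs, dropping blanks and duplicates (order kept).

    Assumption: IDs are separated by newlines, commas, tabs, or spaces.
    """
    seen = set()
    ids = []
    for token in re.split(r"[\s,]+", raw_text):
        if token and token not in seen:
            seen.add(token)
            ids.append(token)
    return ids


# Illustrative mouse Ensembl IDs with a comma, a duplicate, and trailing blank lines.
pasted = "ENSMUSG00000000001, ENSMUSG00000000028\nENSMUSG00000000001\n\n"
print("\n".join(normalize_id_list(pasted)))
```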

On the right side are prompts for:

“Select Reference Set” - options to select general reference/background gene set options.
“ID type for uploaded reference list” - allows selection of the ID type for a user-provided reference/background gene set.
“Upload User Reference Set File” - allows users to upload a file containing a custom reference/background gene set.
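The reference set matters because ORA asks whether a category is over-represented among the input genes relative to that background. A minimal sketch, assuming the commonly used one-sided hypergeometric test (our assumption about the model, not a description of WebGestalt’s internals): with N genes in the reference set, K of them annotated to a category, and n input genes of which k fall in that category, the p-value is P(X >= k). The function name ora_pvalue and all counts below are hypothetical.

```python
from math import comb


def ora_pvalue(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k) for over-representation.

    N: genes in the reference set
    K: reference genes annotated to the category
    n: input (e.g. DE) genes, all assumed to be in the reference set
    k: input genes annotated to the category
    """
    upper = min(n, K)
    # Sum exact integer counts first, then divide once, to avoid tail rounding.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts: same category size and overlap, two reference set sizes.
p_genome = ora_pvalue(k=10, n=200, K=100, N=20000)
p_small = ora_pvalue(k=10, n=200, K=100, N=5000)
print(f"N=20000: p={p_genome:.2e}  N=5000: p={p_small:.2e}")
```

Holding the category size and overlap fixed, the smaller reference set raises the expected overlap (n * K / N goes from 1 to 4), so the same 10 hits look less surprising and the p-value is larger. This is why the choice of reference set can change which categories come out as enriched.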

Below that, there is an area labeled “Advanced parameters” that allows changes to some default options, such as the multiple hypothesis correction method and the significance cutoff, but we’ll skip that section for now as well.
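Because ORA tests many categories at once, the raw p-values are adjusted for multiple testing before the significance cutoff is applied. As a sketch of one widely used correction, the Benjamini-Hochberg false discovery rate procedure (chosen here as an example; the Advanced parameters menu shows which methods WebGestalt actually offers), the p-value ranked r out of m is scaled by m / r, then made monotone from the largest rank down and capped at 1. The function name and example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1 also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four categories.
raw = [0.01, 0.04, 0.03, 0.20]
print([round(q, 4) for q in benjamini_hochberg(raw)])
```

With a hypothetical cutoff of 0.05 on the adjusted values, only the first category would pass here, even though three of the four raw p-values are below 0.05.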

WebGestalt browser demonstration

Together we’ll walk through the steps to run an over-representation analysis (ORA) for bulk RNA-seq results from our RNA-seq demystified workshop.

Input data

The comparison between deficient and control mice using DESeq2 generated statistics for each gene; the table of results (de_deficient_vs_control_annotated.csv) had the following columns:

id: The ENSEMBL gene identifier.
symbol: The gene symbol.
baseMean: The average expression of the gene across all samples.
log2FoldChange: The log2 fold change in expression between the deficient and control samples.
lfcSE: The standard error of the log2 fold change.
stat: The test statistic for the differential expression test.
pvalue: The p-value for the differential expression test.

A key attribute of the output table from the original analysis is that it includes statistics for every gene tested in the comparison, not just those that are differentially expressed.

Behind the scenes, we applied the same log2FoldChange and pvalue thresholds as the original analysis to identify DE genes. We then created a list of identifiers for those DE genes and wrote that list to a file. We’ll re-create this file together in the next section, but for simplicity we’ll use this pre-made file of DE genes as the input to the web browser version of WebGestalt.
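The filtering step can be sketched in Python with the standard csv module. This is not the workshop’s actual code: the cutoffs (|log2FoldChange| >= 1.5 and pvalue < 0.05) are placeholders for the original analysis’s thresholds, using the id column as the identifier is an assumption, and the toy rows are invented purely to exercise the functions. Rows with an NA pvalue, which DESeq2 can report for some genes, are skipped. Because the table covers every tested gene, the sketch also collects all tested IDs, which could serve as a custom reference set.

```python
import csv
import io

# Placeholder cutoffs (hypothetical): substitute the thresholds from the original analysis.
LFC_CUTOFF = 1.5
P_CUTOFF = 0.05


def select_de_ids(csv_handle, id_column="id"):
    """Return (DE gene IDs, all tested gene IDs) from a DESeq2-style results table."""
    de_ids, all_ids = [], []
    for row in csv.DictReader(csv_handle):
        gene_id = row[id_column]
        all_ids.append(gene_id)
        # DESeq2 can report NA p-values for some genes; they cannot pass the cutoff.
        if row["pvalue"] in ("", "NA"):
            continue
        lfc = float(row["log2FoldChange"])
        if abs(lfc) >= LFC_CUTOFF and float(row["pvalue"]) < P_CUTOFF:
            de_ids.append(gene_id)
    return de_ids, all_ids


def write_id_list(ids, handle):
    """Write one ID per line (assumed plain-text input format for WebGestalt)."""
    for gene_id in ids:
        handle.write(gene_id + "\n")


# Invented toy rows, only to exercise the functions (not real results).
toy_table = io.StringIO(
    "id,symbol,baseMean,log2FoldChange,lfcSE,stat,pvalue\n"
    "gene_1,symA,500,2.1,0.3,7.0,0.0001\n"
    "gene_2,symB,300,-1.8,0.4,-4.5,0.002\n"
    "gene_3,symC,800,0.4,0.2,2.0,0.04\n"
    "gene_4,symD,0,NA,NA,NA,NA\n"
)
de_ids, all_ids = select_de_ids(toy_table)

# For a real table, open the CSV with newline="" and pass an open .txt file here instead.
out = io.StringIO()
write_id_list(de_ids, out)
print(out.getvalue(), end="")
```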

Running WebGestalt with our bulk RNA-seq results

WebGestalt basic parameters
First, we’ll navigate back to the Basic parameters section at the top of the WebGestalt browser interface. The default Method of Interest is “Over-Representation Analysis”, which happens to be the type of functional enrichment we want to run right now, so we’ll keep that default option.