You can run RStudio and many bioinformatics R libraries using the University of Michigan’s Great Lakes High Performance Computing (HPC) cluster. By the end of this guide you will be able to:
You can run many bioinformatic analyses locally on your workstation. However, running these on Great Lakes is a nice approach because:
That said, using Great Lakes is different from using your laptop/workstation. A few key ideas:
If you have problems/questions, please don’t hesitate to email us at: bioinformatics-workshops@umich.edu
When emailing, it will speed things along if you include:
You need a UM core-imaged workstation and a web browser.
You need to be on UM campus or connected to the UM VPN or Michigan Medicine VPN using Cisco Secure Client. (Setting this up will also require Duo 2-factor authentication.)
You need a user account on Great Lakes. The first time, you need to request this from ARC; it can take 1-2 days to get set up:
You will need enough storage space. How much depends on the experimental design and what analyses you want to do. Functional analyses don’t take up much disk space, so 2GB is plenty for the workshop activities.
As a new user, you can execute this tutorial from your home directory. As your experiments and analyses grow, you can check your available space from the command line (home-quota) and request more storage from ARC as necessary.
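home-quota is the Great Lakes tool for checking your quota. If you also want to know how much space one particular directory uses (for example, to compare the workshop directory against the roughly 2GB mentioned above), the standard du command can tell you. The sketch below wraps du in two small helpers. The names dir_size_mb and dir_under_gb and the default workshop path are illustrative assumptions, not ARC tools. They report usage only, not your quota.

```shell
# Hypothetical helper: print the disk space used by a directory, in megabytes.
# Uses only standard du; it does NOT report your quota (use home-quota for that).
dir_size_mb() {
  # Default to the workshop directory created later in this guide (assumption).
  local dir="${1:-$HOME/intro_functional_analysis_workshop}"
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
  # du -s summarizes the whole tree; -m reports sizes in 1 MiB units.
  # cut keeps only the number and drops the path du prints after a tab.
  du -sm "$dir" | cut -f1
}

# Hypothetical helper: succeed (exit 0) if a directory uses less than N gigabytes.
dir_under_gb() {
  local dir="$1" limit_gb="$2" used_mb
  used_mb=$(dir_size_mb "$dir") || return 2
  [ "$used_mb" -lt $((limit_gb * 1024)) ]
}

# Example (on Great Lakes, after downloading the workshop inputs):
#   dir_size_mb
#   dir_under_gb ~/intro_functional_analysis_workshop 2 && echo "under 2GB"
```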
Once you have everything listed above, you will use Open OnDemand (OOD) to launch an RStudio session.
If off-campus, connect to the VPN using Cisco Secure Client.
In your browser, go to Great Lakes Open OnDemand: greatlakes.arc-ts.umich.edu and log in with your uniqname and password.
In the menu at the top of the screen, click Interactive Apps and select RStudio.
The previous step will display a launch configuration page with several fields.
Enter launch configuration values below in the corresponding fields:
Field | Value |
---|---|
R Version | Rtidyverse/4.4.0 |
RStudio version | RStudio/2024.04.1 |
Slurm account | (This is in the email from ARC) |
Partition | standard |
Number of hours | 4 (Enough to get started, adjust if you need more time) |
Number of cores | 4 |
Memory (GB) | 16 (Enough to get started, ok to boost if you ever run out) |
Module commands | load Bioinformatics Rutils-BFX |
After you’ve entered the values above, click Launch at the bottom of the page. (Conveniently, the values you entered will now be your defaults for launching an RStudio session.) The screen will update to show that Great Lakes is preparing your session.
When the session is ready (usually a few seconds later), the screen will update:
This will open a new browser tab that contains your RStudio session:
You may see a prompt asking permission to use the clipboard.
Click Allow.
If you need to set or reset the clipboard settings, click the site information icon to the left of the URL in your browser’s address bar (in both Edge and Chrome).
On Great Lakes, your RStudio session runs on a Slurm compute node.
You can download the inputs used in the workshop directly to this session. In the RStudio window, click the Terminal tab. The tab will be blank except for a prompt that looks something like this:
```
[YOUR_UNIQNAME@glXXXX ~]$
```
Paste the following block into the Terminal prompt and hit Enter/Return. It will take a minute or two to download and unpack the inputs.
```shell
# download functional analysis workshop inputs ---------------------------------
mkdir -p intro_functional_analysis_workshop/IFUN_R
cd intro_functional_analysis_workshop/IFUN_R
# Use curl to download
# We'll use shell variables to avoid extremely long command lines
source_url="https://umich-brcf-bioinf-workshop.s3.us-east-1.amazonaws.com"
source_file="IFUN/workshop_ifun_inputs-20250917.tgz"
# --fail makes curl report an HTTP error instead of saving the error page as the tarball
curl --fail -o workshop_ifun_inputs.tgz "${source_url}/${source_file}"
# tar unpacks the tarball into directories
tar xzvf workshop_ifun_inputs.tgz
# Since we have unpacked the tarball, we can remove it
rm workshop_ifun_inputs.tgz
```
You can use ls or tree to show the contents of the inputs directory:
```shell
tree inputs
```
```
inputs
├── bulk_de_results
│   └── de_deficient_vs_control_annotated.csv
└── single_cell_de_results
    ├── de_pseudo_pericyte_D21_vs_D7.csv
    ├── de_pseudo_pericyte_D7_vs_D0.csv
    ├── de_standard_pericyte_D21_vs_D7.csv
    └── de_standard_pericyte_D7_vs_D0.csv
```
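The tree listing above shows the files the workshop expects. If you want to confirm that unpacking produced every one of them, you can use a short check like the sketch below. The function name check_inputs is hypothetical, and the file list is copied from the tree output in this guide, so adjust it if the workshop inputs change. It flags files that are missing or empty, which is what you would typically see after a failed download.

```shell
# Hypothetical helper: report expected workshop input files that are missing or empty.
# The file list mirrors the tree output shown in this guide (assumption: it is complete).
check_inputs() {
  local base="${1:-inputs}" missing=0 total=0 f
  for f in \
    bulk_de_results/de_deficient_vs_control_annotated.csv \
    single_cell_de_results/de_pseudo_pericyte_D21_vs_D7.csv \
    single_cell_de_results/de_pseudo_pericyte_D7_vs_D0.csv \
    single_cell_de_results/de_standard_pericyte_D21_vs_D7.csv \
    single_cell_de_results/de_standard_pericyte_D7_vs_D0.csv; do
    total=$((total + 1))
    # -s is true only if the file exists and is non-empty
    if [ ! -s "$base/$f" ]; then
      echo "MISSING or empty: $base/$f"
      missing=$((missing + 1))
    fi
  done
  if [ "$missing" -eq 0 ]; then
    echo "All $total input files found."
  fi
  # Return non-zero if anything is missing, so the check can be used in scripts
  [ "$missing" -eq 0 ]
}

# Example, run from intro_functional_analysis_workshop/IFUN_R:
#   check_inputs inputs
```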
Now you can review the workshop lessons and execute analyses on these input data.
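Before running heavier analyses, you may want to confirm which resources your session actually received. Inside a job, Slurm normally sets environment variables such as SLURM_JOB_ID, SLURM_JOB_ACCOUNT, SLURM_JOB_PARTITION, SLURM_CPUS_ON_NODE and SLURM_MEM_PER_NODE. The sketch below prints them from the RStudio Terminal. It assumes these variables are passed through to the RStudio terminal, which may not be true in every setup, so read "unset" as "unknown" rather than as an error. The function name show_slurm_resources is hypothetical.

```shell
# Hypothetical helper: summarize the Slurm allocation behind this session.
# Reads standard Slurm job environment variables and prints "unset" for any
# variable that is not visible in the current shell environment.
show_slurm_resources() {
  local var val
  for var in SLURM_JOB_ID SLURM_JOB_ACCOUNT SLURM_JOB_PARTITION \
             SLURM_CPUS_ON_NODE SLURM_MEM_PER_NODE; do
    # printenv exits non-zero when the variable is not set; treat that as empty
    val=$(printenv "$var" || true)
    [ -n "$val" ] || val="unset"
    printf '%-20s %s\n' "$var" "$val"
  done
}

# Example: show_slurm_resources
# With the launch values in this guide you might expect SLURM_CPUS_ON_NODE=4 and
# SLURM_MEM_PER_NODE (in megabytes) near 16384; if memory was requested per CPU
# instead, SLURM_MEM_PER_NODE may be unset.
```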
There are several ways to move files to or from Great Lakes.
Open OnDemand (OOD) lets you browse your files and move small files (e.g. scripts or plots) between your workstation and Great Lakes using your web browser.
1.1 In your workstation’s browser, open the OOD Dashboard. Along the top menu, click Files. (If your browser window is very narrow, the menu items are hidden behind a “hamburger” menu icon.) In the dropdown menu, click Home Directory.
1.2 OOD will display the contents of your home directory. You can click on a directory to see its contents.
1.3 To view a plot graphic, you can click on the hamburger and then select View. This will open the plot in a new browser tab.
1.4 You can download one or more files by selecting their checkboxes and clicking the Download button. (Note: if you select a directory and click Download, OOD will download the contents as a single zipped file.)
To process RNA-Seq FASTQs or run Cell Ranger or Seurat on your own files, we recommend transferring the FASTQ files or Cell Ranger outputs to Great Lakes using Globus.
Globus is a fast, secure, and fault-tolerant way to move files of any size.
Globus is much better than OOD for transferring larger files like FASTQ files, Cell Ranger outputs, or saved Seurat data objects.
Details on how to set up and use Globus are outside the scope of this guide, but we recommend these links:
For larger files/directories, we strongly recommend you use Globus.
That said, if you are more comfortable with command line tools, you can transfer files using the secure copy command, scp. scp works much like cp, but it copies files across a network.
To transfer from your workstation to Great Lakes:
3.1 From your workstation terminal or command window, cd into the directory that contains your data.
3.2 Adjust the scp command below to match the correct source directory and uniqname, then hit Enter/Return to execute.
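```shell
# Copy the SOURCE_DIR directory (and everything in it) from your workstation
# into your Great Lakes home directory
# -r copies directories recursively
# -p preserves modification times, access times, and file modes
# The trailing colon with no path after it means "my home directory" on Great Lakes
scp -pr SOURCE_DIR YOUR_UNIQNAME@greatlakes-xfer.arc-ts.umich.edu:
```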
3.3 The first time you run this command, you may see a prompt like the following; type yes and hit Enter/Return to continue.
```
The authenticity of host '...' can't be established.
ECDSA key fingerprint is SHA256:....
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
```
The command will print a warning (e.g. Warning: Permanently added ‘SERVER_ADDRESS’ to the list of known hosts). This is fine.
3.4 When prompted, type your UM password followed by Enter/Return.
3.5 You can also transfer files from Great Lakes to your workstation. From your workstation terminal or command window, adjust the scp command below to match the correct source file and uniqname, then hit Enter/Return to execute.
```shell
# Copy the SOURCE_FILE from Great Lakes to your current workstation dir
scp YOUR_UNIQNAME@greatlakes-xfer.arc-ts.umich.edu:PATH/TO/SOURCE_FILE .
```
3.6 When prompted, type your UM password followed by Enter/Return.
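After copying a directory with scp -pr, you may want an independent, end-to-end check that the copy on Great Lakes matches the original. One common approach is to build a checksum manifest on the source side and verify it on the destination with md5sum -c. The sketch below assumes GNU coreutils (md5sum, sort -z, xargs -r) are available, which is typical on Linux, including Great Lakes. macOS and Windows workstations ship different tools, so treat it as Linux-oriented. The function name make_manifest is hypothetical.

```shell
# Hypothetical helper: write an md5 checksum manifest for every file under a directory.
# Paths in the manifest start with the directory's own name (e.g. SOURCE_DIR/file.txt),
# so it can be checked from the parent directory wherever the copy lands.
# Write the manifest OUTSIDE the directory being hashed.
make_manifest() {
  local dir="$1" out="$2"
  # Subshell keeps the cd from changing the caller's working directory;
  # the redirection to "$out" is resolved in the caller's directory.
  (
    cd "$(dirname "$dir")" &&
      find "$(basename "$dir")" -type f -print0 | sort -z | xargs -0 -r md5sum
  ) > "$out"
}

# Example workflow (SOURCE_DIR and YOUR_UNIQNAME are placeholders, as in the steps above):
#   On your workstation, from the directory that contains SOURCE_DIR:
#     make_manifest SOURCE_DIR SOURCE_DIR.md5
#     scp SOURCE_DIR.md5 YOUR_UNIQNAME@greatlakes-xfer.arc-ts.umich.edu:
#   Then in a Great Lakes terminal, from your home directory:
#     md5sum -c SOURCE_DIR.md5
```

md5sum -c prints one OK or FAILED line per file, so any mismatch is easy to spot.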
Many config values have to be specified when launching an interactive job from Open OnDemand (OOD). Different jobs require different configs, but quite often a type of job (e.g. functional analysis, bulk RNA-Seq, scRNA-Seq, Cell Ranger, etc.) will use the same values each time. OOD allows you to save launch templates to simplify starting common jobs.
If off-campus, connect to the VPN using Cisco Secure Client.
In your browser, go to Great Lakes Open OnDemand: greatlakes.arc-ts.umich.edu and log in with your uniqname and password.
In the menu at the top of the screen, click Interactive Apps and select RStudio.
Enter launch configuration values below in the corresponding fields:
Field | Value |
---|---|
R Version | Rtidyverse/4.4.0 |
RStudio version | RStudio/2024.04.1 |
Slurm account | (This is in the email from ARC) |
Partition | standard |
Number of hours | 4 (Enough to get started, adjust if you need more time) |
Number of cores | 4 |
Memory (GB) | 16 (Enough to get started, ok to boost if you ever run out) |
Module commands | load Bioinformatics Rutils-BFX |
Scroll to the bottom and click the Save settings checkbox. This will show a dialog that lets you name this template. Enter RStudio-Functional-Analysis and click Save.
Now click Save settings and close.
Saved templates are listed at the top of My Interactive Sessions.