Aligning All Samples Exercise (Breakout):
15 Minutes
We just learned how to use RSEM and STAR; now we need to align the
rest of our samples to the reference genome. In this breakout exercise,
we'll apply the concepts we learned earlier.
Instructions:
- One group member should share their screen in the breakout room. If
nobody volunteers, a helper may randomly select someone.
- The group members should discuss the exercise and work together to
find a solution.
- After a solution is found, allow time for all members to complete
the exercise.
- Review what we learned about running RSEM + STAR and construct an
appropriate command for aligning one of our samples.
- Use a bash variable in our alignment command to quickly and easily
align each of the remaining samples.
- View the output, and verify that we have the files we need.
Solution - Aligning All Samples Exercise
One solution is to define a bash variable for the sample, use that
variable in the alignment command, and then redefine the variable before
repeating the command for each remaining sample.
# Define a variable $SAMPLE
SAMPLE=SRR7777896
rsem-calculate-expression --star --num-threads 1 --star-gzipped-read-file \
--star-output-genome-bam --keep-intermediate-files \
out_trimmed/${SAMPLE}_R1.trimmed.fastq.gz \
../refs/GRCm38.102.chr19reduced \
out_rsem/${SAMPLE}
# Reassign the variable to the next sample
SAMPLE=SRR7777897
# Use the up arrow key to recall and rerun the same rsem-calculate-expression command, now for SRR7777897
SAMPLE=SRR7777898
# Use the up arrow key to recall and rerun the same rsem-calculate-expression command, now for SRR7777898
SAMPLE=SRR7777899
# Use the up arrow key to recall and rerun the same rsem-calculate-expression command, now for SRR7777899
SAMPLE=SRR7777900
# Use the up arrow key to recall and rerun the same rsem-calculate-expression command, now for SRR7777900
Another solution is to create a for-loop with our bash variable and
alignment command. E.g.
for SAMPLE in SRR7777896 SRR7777897 SRR7777898 SRR7777899 SRR7777900
do
    rsem-calculate-expression --star --num-threads 1 --star-gzipped-read-file \
    --star-output-genome-bam --keep-intermediate-files \
    out_trimmed/${SAMPLE}_R1.trimmed.fastq.gz \
    ../refs/GRCm38.102.chr19reduced \
    out_rsem/${SAMPLE}
done
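Once the alignments have finished, the last step of the exercise is to verify the output. The sketch below is one possible way to do that with the same kind of loop. It assumes RSEM's usual output naming (a `.genes.results` and an `.isoforms.results` file per sample, plus a `.STAR.genome.bam` file because `--star-output-genome-bam` was used); the `MISSING` counter and the `SUFFIX` variable are names introduced here for illustration only. Adjust the list of suffixes if your run produces different files.

```shell
# Check that each sample has the expected, non-empty RSEM output files.
# The suffix list is an assumption based on RSEM's usual output naming.
MISSING=0
for SAMPLE in SRR7777896 SRR7777897 SRR7777898 SRR7777899 SRR7777900
do
    for SUFFIX in genes.results isoforms.results STAR.genome.bam
    do
        # -s is true only if the file exists and is not empty
        if [ -s "out_rsem/${SAMPLE}.${SUFFIX}" ]
        then
            echo "OK       out_rsem/${SAMPLE}.${SUFFIX}"
        else
            echo "MISSING  out_rsem/${SAMPLE}.${SUFFIX}"
            MISSING=$((MISSING + 1))
        fi
    done
done
echo "${MISSING} expected file(s) missing or empty"
```

A simple `ls -l out_rsem/` gives the same information by eye; the loop just makes it harder to overlook a sample that failed.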
Helper Hint: If suggesting a for-loop approach, it can be helpful to
build up a “dry-run” command as a test case first, so that learners can
see what their code will do before they run it. Echoing the filenames
first is a good starting point.
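A dry run along the lines of the hint might look like the sketch below. It is the same loop as in the for-loop solution, with `echo` placed in front of the command so that each fully expanded command is printed rather than executed. Nothing is aligned and no files are written.

```shell
# Dry run: print each alignment command instead of running it.
for SAMPLE in SRR7777896 SRR7777897 SRR7777898 SRR7777899 SRR7777900
do
    echo rsem-calculate-expression --star --num-threads 1 --star-gzipped-read-file \
        --star-output-genome-bam --keep-intermediate-files \
        out_trimmed/${SAMPLE}_R1.trimmed.fastq.gz \
        ../refs/GRCm38.102.chr19reduced \
        out_rsem/${SAMPLE}
done
```

If the printed commands show the expected input and output names for every sample, removing the `echo` turns the dry run into the real alignment loop.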