Cutadapt All Samples Exercise
15 Minutes
Now that we’ve learned the basics of running Cutadapt, we need to trim the rest of our samples. Recall that in the Computational Foundations course we learned about bash variables. Let’s try an exercise where we use a bash variable to trim each of our remaining FASTQ files.
Instructions:
- Work independently in the main room, posting any questions that arise to Slack.
- Recommendations for writing your own code:
  - Read function documentation
  - Test out ideas - it’s okay to make mistakes and generate errors
  - Use a search engine to look up errors or recommended solutions using keywords
- We’ll review possible solutions as a group after time is up.
- Review Cutadapt’s help page and choose the proper arguments for our Cutadapt command(s).
- Use a bash variable along with Cutadapt to trim all remaining FASTQ files.
- Confirm that we have all of our expected output files.
Hint: Using a bash variable allows us to quickly change some arguments in a repeated command, e.g.:
noun="World"
echo "Hello, $noun!"
noun="Class"
echo "Hello, $noun!"
Solution - Cutadapt All Samples Exercise
One solution is to define a bash variable for the sample name, use that variable in a Cutadapt command, and then redefine the variable and rerun the same Cutadapt command for each remaining sample.
# Define a variable $SAMPLE
SAMPLE=sample_B
# Create a command using the variable $SAMPLE
cutadapt -q 30 -m 20 -o out_trimmed/${SAMPLE}_R1.trimmed.fastq.gz ../reads/${SAMPLE}_R1.fastq.gz
# Redefine the variable and run the command for each additional sample
SAMPLE=sample_C
cutadapt -q 30 -m 20 -o out_trimmed/${SAMPLE}_R1.trimmed.fastq.gz ../reads/${SAMPLE}_R1.fastq.gz
SAMPLE=sample_D
cutadapt -q 30 -m 20 -o out_trimmed/${SAMPLE}_R1.trimmed.fastq.gz ../reads/${SAMPLE}_R1.fastq.gz
SAMPLE=sample_E
cutadapt -q 30 -m 20 -o out_trimmed/${SAMPLE}_R1.trimmed.fastq.gz ../reads/${SAMPLE}_R1.fastq.gz
SAMPLE=sample_F
cutadapt -q 30 -m 20 -o out_trimmed/${SAMPLE}_R1.trimmed.fastq.gz ../reads/${SAMPLE}_R1.fastq.gz
Another solution is to write a for-loop that sets our bash variable to each sample name in turn and runs the Cutadapt command once per sample. For example:
for SAMPLE in sample_B sample_C sample_D sample_E sample_F
do
    cutadapt -q 30 -m 20 -o out_trimmed/${SAMPLE}_R1.trimmed.fastq.gz ../reads/${SAMPLE}_R1.fastq.gz
done
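To confirm that we have all of our expected output files (the last step in the instructions), a check along these lines may help. This is a minimal sketch that assumes the sample names and output paths used in the solutions above. The variables FILE and MISSING are names introduced here for illustration. If sample_A was trimmed earlier, it could be added to the list.

```shell
# Count expected trimmed files that are absent or empty
MISSING=0
for SAMPLE in sample_B sample_C sample_D sample_E sample_F
do
    # Assumed output path, following the naming used in the solutions above
    FILE=out_trimmed/${SAMPLE}_R1.trimmed.fastq.gz
    # -s is true when the file exists and has a size greater than zero
    if [ -s "$FILE" ]
    then
        echo "OK: $FILE"
    else
        echo "MISSING: $FILE"
        MISSING=$((MISSING + 1))
    fi
done
echo "Missing files: $MISSING"
```

A simple `ls out_trimmed` also works for a handful of samples. The loop is mainly useful when the sample list grows too long to check by eye.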
Helper Hint: If suggesting a for-loop approach, it can be helpful to build up a “dry-run” command as a test case, so that learners can see what their code will do before they run it. Echoing filenames first might be a good suggestion.
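One way to do such a dry run, sketched here with the sample names and paths from the for-loop solution, is to put echo in front of the Cutadapt command. The loop then prints each command with the variable filled in instead of running it, so the file names can be checked first.

```shell
# Dry run: print each Cutadapt command rather than executing it
for SAMPLE in sample_B sample_C sample_D sample_E sample_F
do
    echo cutadapt -q 30 -m 20 -o out_trimmed/${SAMPLE}_R1.trimmed.fastq.gz ../reads/${SAMPLE}_R1.fastq.gz
done
```

Once the printed commands look right, removing echo runs the actual trimming.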