Pre-lecture materials
Read ahead
Acknowledgements
Material for this lecture was borrowed and adapted from
Learning objectives
Pipes
Capturing output from commands
In this section, we will continue to explore how to redirect output that would normally be printed to the terminal, either writing it to a file or passing it to another command.
Let’s count the lines in one of the files, cubane.pdb, using the wc command (word count) with the -l flag, which reports only the number of lines:
wc -l proteins/cubane.pdb
20 proteins/cubane.pdb
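This is useful information, but all of that output gets printed to the screen and then it’s gone. Let’s try saving the output to a file with the redirection operator > (command > [file]). It sends a command’s output to the named file instead of the screen, creating the file if it does not exist and overwriting it if it does.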
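Here is a minimal sketch of > in action. In case proteins/cubane.pdb is not at hand, the sketch builds a hypothetical 20-line stand-in inside a temporary directory, so no real files are touched:

```bash
# Work in a fresh temporary directory so the demo touches no real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for proteins/cubane.pdb: a 20-line file.
seq 1 20 > cubane.pdb

# Without redirection, wc prints its result to the terminal.
wc -l cubane.pdb

# With >, the same output is written to lengths.txt instead of the screen
# (lengths.txt is created if missing and overwritten if it exists).
wc -l cubane.pdb > lengths.txt

# Print the saved result to confirm it was captured.
cat lengths.txt
```

The exact spacing of wc's output varies between systems (macOS pads the count with spaces, GNU/Linux does not), but the file always holds the count followed by the file name.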
Appending data to a file
In general, it is a very bad idea to try redirecting the output of a command that operates on a file to the same file.
For example:
Bash
sort -n lengths.txt > lengths.txt
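Doing something like this may give you incorrect results and/or delete the contents of lengths.txt. In this case, the shell sets up the > redirection before sort runs. That truncates lengths.txt to zero length, so sort reads an empty file and lengths.txt ends up empty.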
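One safe pattern is to sort into a separate file and then replace the original only if sort succeeded. The sketch below assumes a small, made-up lengths.txt. POSIX also documents sort's -o option as safe when the output file is one of the inputs, so sort -n -o lengths.txt lengths.txt is another choice.

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical lengths.txt: line counts for three made-up files, unsorted.
printf '%s\n' '30 c.pdb' '12 a.pdb' '21 b.pdb' > lengths.txt

# Sort into a separate file so lengths.txt is only read, never truncated;
# && runs mv only if sort succeeded.
sort -n lengths.txt > sorted-lengths.txt && mv sorted-lengths.txt lengths.txt

cat lengths.txt
```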
An alternative is another type of redirect operator (>>), which appends output to the end of a file instead of overwriting it (command >> [file]).
Let’s try this out.
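This minimal sketch compares > and >> and uses the testfile01.txt and testfile02.txt names that the cleanup step removes. The echo commands are an illustrative choice:

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# > truncates the file every time, so only the last line survives.
echo "hello" > testfile01.txt
echo "hello" > testfile01.txt

# >> appends to the end of the file, so both lines are kept.
echo "hello" >> testfile02.txt
echo "hello" >> testfile02.txt

cat testfile01.txt   # one line: hello
cat testfile02.txt   # two lines: hello, hello
```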
OK, let’s clean up our space before we move on:
rm testfile01.txt testfile02.txt
Passing output to another command
Another operator is the vertical bar (|), or pipe operator, which is used between two commands to pass the output of the first command as input to the second ([first] | [second]).
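A minimal sketch of chaining pipes to find the shortest file. It uses three hypothetical .pdb files with 3, 10, and 7 lines. wc also prints a "total" line, but that line holds the largest count, so it sorts last and head drops it:

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical files with 3, 10 and 7 lines.
seq 1 3 > small.pdb
seq 1 10 > large.pdb
seq 1 7 > medium.pdb

# wc -l counts lines per file -> sort -n orders by count -> head -n 1 keeps the smallest.
smallest=$(wc -l *.pdb | sort -n | head -n 1)
echo "$smallest"
```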
Loops
Loops are a programming construct which allow us to repeat a command or set of commands for each item in a list.
- As such they are key to productivity improvements through automation.
- Similar to wildcards and tab completion, using loops also reduces the amount of typing required (and hence reduces the number of typing mistakes).
Suppose we have several hundred genome data files ending in .dat, and our goal is to extract a piece of information from each file.
Here, we would like to print out the classification for each species (given on the second line of each file).
One way to do this is to run head -n 2 on each file and pipe the result to tail -n 1, which keeps only the second line.
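For a single file, the head/tail pipe looks like this. The sketch assumes a hypothetical, minimal basilisk.dat whose second line holds the classification:

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for basilisk.dat; only line 2 matters here.
printf '%s\n' 'COMMON NAME: basilisk' 'CLASSIFICATION: basiliscus vulgaris' 'ACGTACGTAC' > basilisk.dat

# head -n 2 passes on lines 1-2; tail -n 1 keeps the last of those, i.e. line 2.
classification=$(head -n 2 basilisk.dat | tail -n 1)
echo "$classification"
```

Repeating this by hand for hundreds of files is exactly the kind of work a loop removes.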
Loop basics
Another way to solve this problem is to use a loop. First, let’s look at the general form of a for loop, using the pseudo-code below:
Bash
for thing in list_of_things
do
    operation_using $thing # Indentation within the loop is not required, but aids legibility
done
and we can apply this to our example like this:
cd creatures
for filename in basilisk.dat minotaur.dat unicorn.dat
do
    head -n 2 "$filename" | tail -n 1
done
CLASSIFICATION: basiliscus vulgaris
CLASSIFICATION: bos hominus
CLASSIFICATION: equus monoceros
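With hundreds of files, typing every name into the loop is impractical. A glob such as *.dat expands to all matching names, and the whole loop's output can be redirected to one file. This sketch uses hypothetical two-line stand-ins for the creatures files:

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-ins for the creatures/*.dat files.
printf '%s\n' 'COMMON NAME: basilisk' 'CLASSIFICATION: basiliscus vulgaris' > basilisk.dat
printf '%s\n' 'COMMON NAME: minotaur' 'CLASSIFICATION: bos hominus' > minotaur.dat
printf '%s\n' 'COMMON NAME: unicorn' 'CLASSIFICATION: equus monoceros' > unicorn.dat

# *.dat expands to every matching file in alphabetical order,
# so new .dat files are picked up without editing the loop.
for filename in *.dat
do
    head -n 2 "$filename" | tail -n 1
done > classifications.txt

cat classifications.txt
```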
Naming files
You can also use the variables in for loops to name files or folders.
For example, let’s say we want to save a copy of each original file in the creatures folder, naming the copies original-basilisk.dat, original-unicorn.dat, etc.
cd creatures
for filename in *.dat
do
    cp "$filename" "original-$filename"
done
ls *.dat
basilisk.dat
minotaur.dat
original-basilisk.dat
original-minotaur.dat
original-unicorn.dat
unicorn.dat
This loop runs the cp command once for each filename. The first time, when $filename expands to basilisk.dat, the shell executes:
Bash
cp basilisk.dat original-basilisk.dat
and so on. Finally, let’s clean up our copies:
rm creatures/original-*
ls creatures/*
creatures/basilisk.dat
creatures/minotaur.dat
creatures/unicorn.dat
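Before running a loop that copies or deletes files, you can preview what it will do. A common trick is to put echo in front of the command so that each expanded command is printed instead of executed. This sketch uses empty placeholder files named like the creatures data and saves the preview to a file:

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Empty placeholder files named like the creatures data.
touch basilisk.dat minotaur.dat unicorn.dat

# Dry run: echo prints each cp command instead of executing it,
# so no original-*.dat files are created.
for filename in *.dat
do
    echo cp "$filename" "original-$filename"
done > preview.txt

cat preview.txt
```

If the preview looks right, removing echo turns the dry run into the real loop.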
Basics of bash scripting
We are finally ready to see what makes the shell such a powerful programming environment.
We are going to take the commands we repeat frequently and save them in files so that we can re-run all those operations again later by typing a single command.
For historical reasons, a bunch of commands saved in a file is usually called a shell script, but make no mistake: these are actually small programs.
Not only will writing shell scripts make your work faster — you won’t have to retype the same commands over and over again — it will also make it more accurate (fewer chances for typos) and more reproducible.
Create a .sh file
Let’s start by going back to proteins/ and creating a new file, middle.sh, which will become our shell script:
cd proteins
touch middle.sh
We can open the file in a text editor and insert the following line:
head -n 15 octane.pdb | tail -n 5
This is a variation on the pipe we constructed earlier:
- head -n 15 keeps the first 15 lines of octane.pdb, and tail -n 5 keeps the last 5 of those, so together they select lines 11-15.
We can see that the directory proteins/ now contains a file called middle.sh.
Once we have saved the file, we can ask the shell to execute the commands it contains.
cd proteins
bash middle.sh
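A self-contained sketch of the same steps. It writes middle.sh with a here-document instead of an editor and uses a hypothetical 30-line octane.pdb whose lines are just the numbers 1 to 30, so the selected range is easy to check:

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for octane.pdb: line N contains the number N.
seq 1 30 > octane.pdb

# Write the one-line script without opening an editor.
cat > middle.sh << 'EOF'
head -n 15 octane.pdb | tail -n 5
EOF

# Run the script; it prints lines 11-15 of octane.pdb.
output=$(bash middle.sh)
echo "$output"
```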
File arguments
What if we want to select lines from an arbitrary file?
We could edit middle.sh each time to change the filename, but that would probably take longer than typing the command out again in the shell and executing it with a new file name.
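Instead, let’s edit middle.sh and make it more versatile:
- Replace the text octane.pdb with the special variable $1, which holds the first command-line argument passed to the script (the double quotes keep file names that contain spaces intact):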
head -n 15 "$1" | tail -n 5
We can now run our script like this:
cd proteins
bash middle.sh octane.pdb
or on a different file like this:
cd proteins
bash middle.sh pentane.pdb
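A sketch of the $1 version, assuming hypothetical numbered stand-ins for octane.pdb and pentane.pdb. Quoting 'EOF' matters here. Without the quotes, the current shell would expand $1 while writing the file, and the script would not contain the literal "$1". The last call shows why the quotes around "$1" help: the file name contains a space.

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-ins for octane.pdb and pentane.pdb.
seq 1 30 > octane.pdb
seq 101 130 > pentane.pdb

# 'EOF' (quoted) keeps "$1" literal inside middle.sh.
cat > middle.sh << 'EOF'
head -n 15 "$1" | tail -n 5
EOF

octane_lines=$(bash middle.sh octane.pdb)     # lines 11-15 of octane.pdb
pentane_lines=$(bash middle.sh pentane.pdb)   # lines 11-15 of pentane.pdb

# "$1" is quoted in the script, so a name containing a space still works.
cp octane.pdb "octane copy.pdb"
spaced_lines=$(bash middle.sh "octane copy.pdb")

echo "$octane_lines"
echo "$pentane_lines"
echo "$spaced_lines"
```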
Currently, we need to edit middle.sh each time we want to adjust the range of lines that is returned.
Let’s fix that by configuring our script to instead use three command-line arguments.
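- After the first command-line argument ($1), each additional argument that we provide is accessible via the special variables $2, $3, and so on; $1, $2, and $3 refer to the first, second, and third command-line arguments, respectively.
Replacing the hard-coded 15 and 5 with these variables gives head -n "$2" "$1" | tail -n "$3", which we can run as, for example, bash middle.sh pentane.pdb 15 5.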
This works, but it may take the next person who reads middle.sh a moment to figure out what it does. We can improve our script by adding some comments at the top of the file:
# Select lines from the middle of a file.
# Usage: bash middle.sh filename end_line num_lines
head -n "$2" "$1" | tail -n "$3"
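A sketch of the three-argument script using a hypothetical numbered pentane.pdb. The argument-count check is an addition that is not part of the lecture's script. It makes the script print its usage line and exit with status 1 when it is called with the wrong number of arguments, instead of running head with missing values.

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 30 > pentane.pdb   # hypothetical stand-in: line N contains N

cat > middle.sh << 'EOF'
# Select lines from the middle of a file.
# Usage: bash middle.sh filename end_line num_lines

# Added check (not in the lecture's script): require exactly three arguments.
if [ "$#" -ne 3 ]; then
    echo "Usage: bash middle.sh filename end_line num_lines" >&2
    exit 1
fi

head -n "$2" "$1" | tail -n "$3"
EOF

first=$(bash middle.sh pentane.pdb 15 5)    # lines 11-15
second=$(bash middle.sh pentane.pdb 20 3)   # lines 18-20, no edit needed
echo "$first"
echo "$second"
```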
Finally, let’s clean up our space:
cd proteins
rm middle.sh
Secure shell protocol
The Secure Shell Protocol (SSH) is a tool you can use to connect and authenticate to remote servers and services (e.g. GitHub or JHPCE).
With SSH keys, you can connect to GitHub without supplying your username and personal access token at each visit. You can also use an SSH key to sign commits.
Overview
The SSH protocol uses encryption to secure the connection between a client and a server.
All user authentication, commands, output, and file transfers are encrypted to protect against attacks in the network.
For details of how the SSH protocol works, see the protocol page. To understand the SSH File Transfer Protocol, see the SFTP page.
You can read more about setting up your SSH keys to connect to JHPCE here:
https://jhpce.jhu.edu/knowledge-base/authentication/ssh-key-setup
Demo connecting to JHPCE via ssh
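A minimal sketch of the first step in key-based SSH, assuming OpenSSH's ssh-keygen is installed. It generates a key pair in a throwaway directory, and only the public half is ever shared. The user and host in the final comment are placeholders, not JHPCE's real login details; the JHPCE link above covers those.

```bash
# Throwaway directory so no existing key in ~/.ssh is touched.
keydir=$(mktemp -d)

# Generate an Ed25519 key pair with an empty passphrase (-N "") for the demo only;
# -q suppresses progress messages. For real use, set a passphrase and keep ~/.ssh.
ssh-keygen -t ed25519 -f "$keydir/id_ed25519" -N "" -C "demo key" -q

# The .pub file is the part you register with the remote server;
# the file without .pub is the private key and must stay secret.
cat "$keydir/id_ed25519.pub"

# Connecting would then look like this (placeholders, not real values):
# ssh -i "$keydir/id_ed25519" YOUR_USERNAME@REMOTE_HOST
```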
You can read more about setting up your SSH keys to connect to GitHub here: