= "HELLO"
word for character in word:
# Do something interesting with each character...
print(character)
H
E
L
L
O
An algorithm is a step-by-step procedure for solving a problem or accomplishing a task. It as a detailed set of instructions that takes some input, follows a clear sequence of steps, and produces a desired output.
Algorithms can vary greatly in their level of complexity, from simple operations like finding the larger of two numbers to complex tasks such as generating a phylogenetic tree from a sequence alignment. It’s worth noting that the same problem might have multiple algorithmic solutions, each with its own advantages and trade-offs in terms of simplicity and efficiency.
(Adapted from The Art of Computer Programming by Donald E. Knuth)
All algorithms share some important properties:
There are a few more important properties of algorithms. Generally an algorithm should produce the same output given the same input. For example, if your algorithm is supposed to triple a number, an input of 5
should always produce an output of 15
. Additionally, an algorithm should ideally be general enough to solve similar problems in a category. Your tripling algorithm would be more useful (and general) if a user could supply both the number to be multiplied (multiplicand) and the multiplier. This way, the same algorithm could be used for doubling, tripling, quadrupling, etc.
Being able to think algorithmically is essential for success in programming. Algorithmic thinking is the ability to break down problems into clear, logical steps that a computer can follow – like writing a very detailed recipe where nothing can be assumed or left to interpretation. This skill helps you break down complex problems and translate them into effective code solutions.
Let’s go through the various aspects of algorithms.
Every algorithm consists of three fundamental parts:
Before you can write an algorithm, you need to understand what problem you’re trying to solve:
You can think of these as “behind-the-scenes” components. They are critical to algorithmic thinking and construction, but not always explicitly part of the algorithm itself.
Once you have these components in mind, you can break large problems into smaller pieces, which are much more manageable:
Depending on the complexity of your problem, it can be helpful to sketch out your solution in plain language or pseudocode:
Planning your code’s structure and components before diving into actual programming makes the whole process much smoother. When you tackle problems this way, you can focus on one aspect at a time, first mapping out the logic and flow, then implementing the code itself. This approach prevents you from getting overwhelmed by trying to solve multiple challenges simultaneously.
While thorough planning is essential when you’re learning, you’ll likely develop a more streamlined approach as you gain experience. For simpler problems, you may find yourself able to start coding directly, having internalized the planning process. However, for complex projects, taking time to sketch out your approach first remains valuable regardless of your skill level.
Now that you have a solid plan, it’s time to translate it into working code.
This stage is a bit like assembling the pieces of a puzzle where you already know what the final picture should look like. Take your time with each component – rushing through implementation often leads to mistakes that can be challenging to fix later.
After your implementation is complete, be sure to test that your algorithm works correctly:
We will discuss more testing strategies in a later tutorial.
Let’s take an everyday activity, making coffee, and practice turning it into clear instructions that could work as an algorithm. We will start with some pretty bad instructions, identify problems with them, and then refine them.
(Apologies to all the tea lovers reading this!)
Here is a silly set of instructions for making coffee:
You’ll want to put some liquid in there first, then put the paper thing in. Get the coffee ready – not too much, not too little. Make sure everything is closed tight before you get things going. Now, just give it a tap and wait a while. If all goes well, you should end up with something drinkable!
If you have ever made coffee before, you could probably figure out how to follow these instructions. However, it doesn’t really work as an algorithm:
Let’s address some of these issues and try again.
Pour fresh water into the top part until it looks full enough. Insert a clean paper filter (any size should work) into the basket area. Measure some coffee grounds – about a few spoonfuls should do it, depending on how strong you like it. Make sure to close the top properly until you hear a click or something. Find the power button (it might be on the front or side) and push it down. After a few minutes when you stop hearing the machine make noise, your coffee should be done!
Though this version is definitely better than the last one, it still has a few issues:
Again, if you have ever used a coffee machine, you could probably understand the instructions and adapt them to your taste to make a good cup of coffee. But to be a good algorithm, it still needs more precision, less ambiguity, and it shouldn’t leave so much up to your own taste.
Let’s address some more of those ambiguities.
Pour 8 cups of fresh water into the top reservoir, filling to the marked water line. Insert a #4 cone paper filter into the filter basket. Measure 2 level tablespoons of ground coffee per cup of water (16 tablespoons total for a full pot). Press down firmly on the lid until you hear a distinct click indicating it’s fully closed. Locate the power switch on the front panel and press it to the “ON” position. The brewing process will take approximately 5-7 minutes, and is complete when the gurgling sound stops and the brewing indicator light turns off. Your coffee is now ready to serve.
Here are some specific improvements as compared to the last version:
Though we could keep refining these instructions, it’s not a bad description of making coffee now!
If this were a “real algorithm” that we needed to program in Python, there are some more things we should think about. When writing instructions for a computer (or a coffee maker!), it’s easy to focus on the happy path – the sequence of actions that work perfectly. However, robust algorithms must consider various other factors to effectively handle real-world situations where things can go wrong.
Here are some things to think about that are “beyond the basic steps”.
Before starting any process, we need to ensure everything is in place. This includes checking equipment, materials, and the system’s readiness. (You might see the term “preconditions” if you are reading about algorithms online.)
Things can go wrong. A good algorithm anticipates potential problems and provides solutions.
While you can’t generally anticipate everything that may go wrong, it’s a good idea to put some thought into it, and try to handle any likely errors.
Generally, certain steps will rely on others. We need to define the correct order of operations.
Algorithms often need to handle different scenarios based on input or conditions.
Validation and verification of any post-conditions is essential to ensure each step is completed successfully and the final result is correct.
While we don’t need this level of detail for every example, it’s valuable to understand how simple procedures evolve into robust algorithms through careful consideration of edge cases, error handling, and validation steps.
This methodology applies broadly to software development: start simple, then systematically address complexities and potential problems.
When you’re learning to program, it helps to recognize that many solutions are built from common, reusable patterns. These patterns are basic building blocks that you can combine and adapt to solve more complex problems.
While there are often many ways to solve a programming challenge, we’ll focus on straightforward approaches that are easy to understand and implement. These might not always be the most efficient solutions, but they’re good learning tools that will help you:
As you gain experience, you’ll learn more sophisticated methods, or ways that are built-in to the language itself, but mastering these fundamental patterns first will give you a solid foundation. Let’s look at some practical examples that demonstrate these basic patterns in action.
In this tutorial, we will mostly focus on strategies that involve looking at one element at a time from a sequence. Often, this sequential processing will also involve tracking or accumulating values.
Tutorial 2 showed many examples of using for loops to process characters of a string one-by-one. We will repeat some of them here so that you can get more practice seeing the common patterns.
Printing each letter of a string:
= "HELLO"
word for character in word:
# Do something interesting with each character...
print(character)
H
E
L
L
O
Accessing the index of the character during iteration:
# Print with position
= "hello"
word for index, letter in enumerate(word):
print(f"Position {index}: {letter}")
Position 0: h
Position 1: e
Position 2: l
Position 3: l
Position 4: o
Processing a string in reverse order:
# Process in reverse
for letter in reversed(word):
print(letter)
o
l
l
e
h
Counting the frequency of individual letters:
= {}
letter_counts
for letter in word:
= letter_counts.get(letter, 0)
current_count = current_count + 1
letter_counts[letter]
print(letter_counts)
{'h': 1, 'e': 1, 'l': 2, 'o': 1}
Note: This is a good example of what I mentioned above regarding these solutions not always being the best way to do something. In Tutorial 2, we discussed a better way to approach this particular counting problem. Can you remember it?
Tracking a running sum:
= [2, 5, 3, 1]
numbers = 0
total
for number in numbers:
+= number
total
print(f"Total: {total}")
Total: 11
While Python provides built-in functions like sum()
for this specific case, understanding the basic pattern helps with more complex variations.
Sum of positive numbers:
= [-1, 2, -5, 3, -8, 1]
numbers
= sum(num for num in numbers if num > 0)
positive_sum
print(f"Sum of positive numbers: {positive_sum}")
Sum of positive numbers: 6
Calculating the average of a list of numbers:
# Calculate average
= [2, 5, 3, 8, 1]
numbers = sum(numbers) / len(numbers)
average print(f"Average: {average}")
Average: 3.8
Finding the largest number in a list without using Python’s built-in max
function:
= [5, 3, 0, -1, 8]
numbers = numbers[0]
largest_number
for number in numbers[1:]:
if number > largest_number:
= number
largest_number
print(largest_number)
8
Finding the shortest string in a list:
= ["i", "like", "apple", "pie"]
words = words[0]
shortest_word
for word in words[1:]:
if len(word) < len(shortest_word):
= word
shortest_word
print(shortest_word)
i
Another common task is finding an item in a collection or validating some condition of a collection.
Finding a specific number in a list:
= 5
target = list(range(10))
numbers = False
is_found
for number in numbers:
if number == target:
= True
is_found print("we found the number!")
if not is_found:
print("we didn't find the number!")
we found the number!
Checking if a list is sorted. (Before reading the code, try and think of how the solution might look.)
= [1, 2, 4, 3, 5]
numbers = numbers[0]
previous_number
= True
is_sorted
for current_number in numbers[1:]:
if current_number < previous_number:
= False
is_sorted break
= current_number
previous_number
if is_sorted:
print("the list is sorted!")
else:
print("the list is not sorted!")
the list is not sorted!
This example has a couple of interesting things to focus on:
break
since that is enough to say the array as a whole is unsorted.Problems often require nested loops, such as cases where for every item in one list, you need to process every item in another list. Note that these nested loop problems can often be solved in clever ways that help you avoid a having to look at every combination. There’s a good chance you will see some of these clever solutions as you are exposed to more code in the future.
Calculating distances between points. Here we are using 1-dimensional points. The distance between two points in 1D (on a line) is the absolute value of their difference. E.g., if you have two points x₁
and x₂
, the distance between them is |x₁ - x₂|
.
= [8, 3, 4]
points = []
distances
for x in points:
for y in points:
= abs(x - y)
distance
distances.append((x, y, distance))
for x, y, distance in distances:
print(f"({x}, {y}) => {distance}")
(8, 8) => 0
(8, 3) => 5
(8, 4) => 4
(3, 8) => 5
(3, 3) => 0
(3, 4) => 1
(4, 8) => 4
(4, 3) => 1
(4, 4) => 0
You could imagine instead of distances of 1D points, this pattern could work for calculating all-vs-all homology scores from BLAST output, or comparing some aspect of each sample vs. every other sample.
Here’s a slightly different example. In this case, say we have ecological distances between all sampling locations stored in a dictionary. Here is one way that you might loop through them:
= {
sample_distances "S1": {"S1": 0, "S2": 3, "S3": 5},
"S2": {"S1": 2, "S2": 0, "S3": 1},
"S3": {"S1": 6, "S2": 2, "S3": 0},
}
for sample_a, other_samples in sample_distances.items():
for sample_b, distance in other_samples.items():
print(f"{sample_a} -> {sample_b} => {distance}")
S1 -> S1 => 0
S1 -> S2 => 3
S1 -> S3 => 5
S2 -> S1 => 2
S2 -> S2 => 0
S2 -> S3 => 1
S3 -> S1 => 6
S3 -> S2 => 2
S3 -> S3 => 0
While there are often clever ways to avoid these type of all-vs-all comparisons, they still come up pretty frequently, so it’s a good idea to get familiar with them!
When we write code to solve a problem, there’s usually more than one way to do it. It’s a bit like how you may have different routes to get to work – some are faster, some are more efficient, and some are just easier to remember. The same applies to our code solutions.
When evaluating different approaches to solving a problem, we typically look at three main things:
Let’s focus on efficiency in terms of time for a moment. Imagine you have a list of genes to search through. You could check each gene one by one (we call this linear time), or you might have a clever way to eliminate half the possibilities with each step (logarithmic time). As your gene list grows from hundreds to millions of entries, these different approaches can mean the difference between waiting seconds versus hours for your results.
Computer scientists use something called “Big O notation” to describe how an algorithm’s performance changes as the input gets larger. Here are some common patterns you’ll encounter.
O(1)
): The operation always takes the same amount of timeO(n)
): The time increases directly with the size of the inputO(n²)
): The time increases with the square of the input sizeThe key takeaway is that some solutions scale better than others when working with larger datasets. As you write code, keeping these basic patterns in mind will help you make better choices about how to approach problems.
Here are some simple examples to illustrate these three time complexities.
Constant time operations like dictionary lookups:
= {"nrdA": "ribonucleotide reductase"}
gene_info = gene_info["nrdA"]
result print(result)
ribonucleotide reductase
Linear time operations like checking each item in a list once:
# Counting mutations
= "ACTACTGTACTACTGTCACACTAGAGTAT"
dna_sequence = 0
t_count
for base in dna_sequence:
if base == "T":
+= 1
t_count
print(t_count)
9
Quadratic time operations like comparing every item with every other item:
# Finding equivalent sequences
= ["ACTG", "ATGAC", "ACTGGT", "ACTG"]
sequences = len(sequences)
sequence_count
for i in range(sequence_count):
for j in range(sequence_count):
if i != j and sequences[i] == sequences[j]:
print(f"Match found: {sequences[i]}")
Match found: ACTG
Match found: ACTG
In addition to thinking about how long our code takes to run, sometimes we also need to consider how much memory it uses. Space complexity describes how memory usage grows with input size. The two most common patterns you’ll encounter are:
Here are some examples.
In a constant space solution, the same few variables are used regardless of the input size:
= 0
g_count
for base in dna_sequence:
if base == "G":
+= 1
g_count
print(g_count)
4
In this example, we’re just counting, so we only need one variable no matter how long the DNA sequence is.
In a linear space solution, the space needed to calculate the result grows linearly with the size of the input.
= []
g_positions
for i in range(len(dna_sequence)):
if dna_sequence[i] == "G":
g_positions.append(i)
print(g_positions)
[6, 14, 23, 25]
In this example, we’re storing positions, so we need more space for longer sequences.
Let’s finish off this tutorial by looking at a common, beginner-level algorithmic puzzle: checking if a string is a palindrome.
A word is a palindrome if it reads the same forward and backward.
Note: This problem description is adapted from LeetCode problem 125. Valid Palindrome.
First, let’s understand what we’re trying to do in plain English:
(For this version of the problem, we can assume that the strings we need to check are all a single word with all lowercase letters.)
Let’s start with the most obvious solution that uses Python string slicing to check the definition of a palindrome.
= "racecar"
string = string == string[::-1]
is_palindrome print(string, is_palindrome)
= "apple"
string = string == string[::-1]
is_palindrome print(string, is_palindrome)
racecar True
apple False
This is probably how most people would first think about it: “Just reverse it and compare!”
Often, the “but” doesn’t really matter in whatever work you are doing. Many times “clear and maintainable” are much more important that optimal efficiency. However, in the world of algorithmic puzzles, the “but” is definitely something you want to address!
For this solution, we think, “Wait, do we really need to reverse the string? What a waste of time and space!”
Instead, we will look at “pairs” of letters. First, let’s try a word that is a palindrome:
= "racecar"
string print("Checking {string}")
= True
is_palindrome
for i in range(len(string)):
= len(string) - i - 1
j
print(i, j, string[i], string[j])
if string[i] != string[j]:
= False
is_palindrome break
print(string, is_palindrome)
Checking {string}
0 6 r r
1 5 a a
2 4 c c
3 3 e e
4 2 c c
5 1 a a
6 0 r r
racecar True
Next, try a word that is not a palindrome, just to see the difference:
= "racethecar"
string print("\n\nChecking {string}")
= True
is_palindrome
for i in range(len(string)):
= len(string) - i - 1
j
print(i, j, string[i], string[j])
if string[i] != string[j]:
= False
is_palindrome break
print(string, is_palindrome)
Checking {string}
0 9 r r
1 8 a a
2 7 c c
3 6 e e
4 5 t h
racethecar False
Do you see how the non-palindrome stops before checking all the values?
Here is a breakdown of the above solution:
Just so it is clear, let’s explain that j
index line. This diagram shows why we use that formula to calculate j
:
len("racecar") == 7
┌───┬───┬───┬───┬───┬───┬───┐
│ r │ a │ c │ e │ c │ a │ r │
└───┴───┴───┴───┴───┴───┴───┘
0 1 2 3 4 5 6
↑ ↑ ↑ ↑ ↑ ↑ ↑
i=0 │ │ │ │ │ j = 7 - 0 - 1 = 6
│ │ │ │ │
i=1 │ │ │ j = 7 - 1 - 1 = 5
│ │ │
i=2 │ j = 7 - 2 - 1 = 4
│
i=3, j = 7 - 3 - 1 = 3
and so on...
There was another “but” in Solution 2: Wouldn’t it be better if we didn’t have to do those duplicated checks? Let’s give it a shot.
= "racecar"
string = True
is_palindrome
= 0
i = len(string) - 1
j
while i < j:
print(i, j, string[i], string[j])
if string[i] != string[j]:
= False
is_palindrome break
+= 1
i -= 1
j
print(string, is_palindrome)
0 6 r r
1 5 a a
2 4 c c
racecar True
And here again with a string that is not a palindrome:
= "racethecar"
string = True
is_palindrome
= 0
i = len(string) - 1
j
while i < j:
print(i, j, string[i], string[j])
if string[i] != string[j]:
= False
is_palindrome break
+= 1
i -= 1
j
print(string, is_palindrome)
0 9 r r
1 8 a a
2 7 c c
3 6 e e
4 5 t h
racethecar False
And the breakdown:
This is both time and space efficient, and doesn’t do any more checks than we need to do. Cool, right?
The above process of refinement suggests a general approach to these kinds of algorithmic puzzles.
Focus on making a basic version work before aiming for perfection. Begin with a simple solution, ensure it functions correctly, and then gradually improve it. This mirrors the process of algorithmic thinking described above of breaking down complex problems into manageable steps and refining your solution as needs evolve. This is similar to how programming works in real-world scenarios!
In this tutorial, we introduced the concept of algorithmic thinking and simple algorithms. We went over some common patterns for simple problems that can form the building blocks of more complex solutions. Then we covered the basics of algorithmic complexity analysis, and finally, went through the process of solving a common algorithmic puzzle.
You might be wondering how these basic programming concepts connect to the bioinformatics tools you’ll use in your research. While we rely on sophisticated software for tasks like homology search and genome assembly, these powerful tools are built on the same fundamental programming principles we’re learning now.
Algorithmic thinking, or understanding how to break down problems and translate them into logical steps, is a foundational skill for all coding work. Though we won’t be building complex bioinformatics tools from scratch in this course, mastering the basics will give you a solid foundation for writing your own analysis scripts and understanding how the bioinformatics tools you use actually work.
Knuth, Donald E. 1997. The Art of Computer Programming: Volume 1: Fundamental Algorithms. 3rd ed. Boston, MA: Addison Wesley.