Welcome to your first Python tutorial! In this lesson, we’ll explore the fundamental building blocks of Python programming, including variables, data types, operators, control flow, and built-in functions.
This is a comprehensive tutorial that covers a lot of ground. Don’t feel pressured to master everything at once – we’ll be practicing these concepts throughout the course. Think of this as your first exposure to these ideas, and we’ll build on them step by step in the coming weeks.
Python is a high-level, interpreted programming language known for its simplicity and readability. Created by Guido van Rossum in 1991, it has become one of the most popular languages in scientific computing and bioinformatics.
Python is a high-level programming language, meaning it handles many complex computational details automatically. For example, rather than managing computer memory directly, Python does this for you. This allows biologists and researchers to focus on solving scientific problems rather than dealing with technical computing details.
Python is an interpreted language, which means you can write code and run it immediately without an extra compilation step. This makes it ideal for bioinformatics work where you often need to:
Python’s code is designed to be readable and clear, often reading almost like English. For example:
if dna_sequence.startswith(start_codon) and dna_sequence.endswith(stop_codon):
    potential_genes.append(dna_sequence)
Even if you’re new to programming, you can probably guess that this code is looking for potential genes by checking a DNA sequence for a start and a stop codon, and if found, adding the sequence to a list of potential genes.
This readability is particularly valuable in research settings where code needs to be shared and reviewed by collaborators.
Python is a versatile language that can be used for a wide range of applications, including:
And of course, bioinformatics and scientific computing:
Python has become a widely used tool in bioinformatics for several key reasons:
Whether you’re analyzing sequencing data, building analysis pipelines, or developing new computational methods, Python provides the tools and community support needed for modern biological research.
Think of variables as labeled containers for storing data in your program. Just as you might label test tubes in a lab to keep track of different samples, variables let you give meaningful names to your data – whether they’re numbers, text, true/false values, or more complex information.
For example, instead of working with raw values like this:
if 47 > 40:
    print("Temperature too high!")
Temperature too high!
You can use descriptive variables to make your code clearer:
temperature = 42.3
temperature_threshold = 40.0
if temperature > temperature_threshold:
    print("Temperature too high!")
Temperature too high!
In this section, we’ll cover:
By the end, you’ll be able to use variables effectively to write clear, maintainable research code.
In Python, you create a variable by giving a name to a value using the = operator. Here’s a basic example:
sequence_length = 1000
species_name = "Escherichia coli"
You can then use these variables anywhere in your code by referring to their names. Variables can be combined to create new variables:
# Combining text (string) variables
genus = "Escherichia"
species = "coli"
full_name = genus + " " + species
print(full_name)  # Prints: Escherichia coli
# Calculations with numeric variables
reads_forward = 1000000
reads_reverse = 950000
total_reads = reads_forward + reads_reverse
print(total_reads)  # Prints: 1950000
Escherichia coli
1950000
Notice how the + operator works differently depending on what type of data we’re using: with strings it joins the text together, and with numbers it adds the values.
You can also use variables in more complex calculations:
gc_count = 2200
total_bases = 5000
gc_content = gc_count / total_bases
print(gc_content)  # Prints: 0.44
0.44
The ability to give meaningful names to values makes your code easier to understand and modify. Instead of trying to remember what the number 5000 represents, you can use a clear variable name like total_bases.
Python allows you to change what’s stored in a variable after you create it. Let’s see how this works:
read_depth = 100
print(f"Initial read depth: {read_depth}")
read_depth = 47
print(f"Updated read depth: {read_depth}")
Initial read depth: 100
Updated read depth: 47
This flexibility extends even further – Python lets you change not just the value, but also the type of data a variable holds:
quality_score = 30
quality_score = "High quality"
print(quality_score)
High quality
While this flexibility can be useful, it can also lead to unexpected behavior if you’re not careful. Here’s an example that could cause problems in a sequence analysis pipeline:
# Correctly calculates and prints the total number of sequences.
sequences_per_sample = 1000
sample_count = 5
total_sequences = sequences_per_sample * sample_count
print(f"total sequences: {total_sequences}")
# This one produces an unexpected result!
sequences_per_sample = "1000 sequences "
sample_count = 5
total_sequences = sequences_per_sample * sample_count
print(f"total sequences: {total_sequences}")
total sequences: 5000
total sequences: 1000 sequences 1000 sequences 1000 sequences 1000 sequences 1000 sequences
In the second case, instead of performing multiplication, Python repeats the string "1000 sequences " 5 times! This is probably not what you wanted in your genomics pipeline!
This kind of type changing can be a common source of bugs, especially when:
Best practice is to be consistent with your variable types throughout your code, and explicitly convert between types when necessary.
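As a minimal sketch of explicit conversion, assume the per-sample count arrived as text (for example, read from a file). The variable names here are hypothetical; int() is Python's built-in conversion to an integer.

```python
# Hypothetical value that arrived as text, e.g. from a file or user input
sequences_per_sample_text = "1000"
sample_count = 5

# Convert the text to an integer before doing arithmetic
sequences_per_sample = int(sequences_per_sample_text)
total_sequences = sequences_per_sample * sample_count
print(f"total sequences: {total_sequences}")  # total sequences: 5000

# Without the conversion, * repeats the string instead of multiplying
repeated_text = sequences_per_sample_text * sample_count
print(repeated_text)  # 10001000100010001000
```

Note that int() only accepts text that looks like a whole number; a string such as "1000 sequences " would raise an error rather than silently produce a wrong result.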
Let’s look at a common pattern when working with variables. Here’s one way to increment a counter:
read_count = 100
read_count = read_count + 50
print(f"Total reads: {read_count}")
Total reads: 150
Python provides a shorter way to write this using augmented assignment operators:
read_count = 100
read_count += 50
print(f"Total reads: {read_count}")
Total reads: 150
These augmented operators combine arithmetic with assignment. Common ones include:
+=: augmented addition (increment)
-=: augmented subtraction (decrement)
*=: augmented multiplication
/=: augmented division
These operators are particularly handy when updating running totals or counters, like when tracking how many sequences pass quality filters. We’ll explore more uses in the next tutorial.
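The four operators listed above can be tried in sequence. This is only an illustrative sketch with made-up numbers; the comments show the value after each step.

```python
passing_reads = 100
passing_reads += 50   # 150
passing_reads -= 30   # 120
passing_reads *= 2    # 240
passing_reads /= 4    # 60.0 (division always produces a float)
print(passing_reads)  # 60.0
```

Notice that /= changes the value from an int to a float, which is another way a variable's type can change without you asking for it explicitly.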
Sometimes you’ll want to define values that shouldn’t change throughout your program.
GENETIC_CODE_SIZE = 64
print(f"There are {GENETIC_CODE_SIZE} codons in the standard genetic code")
DNA_BASES = ['A', 'T', 'C', 'G']
print(f"The DNA bases are: {DNA_BASES}")
There are 64 codons in the standard genetic code
The DNA bases are: ['A', 'T', 'C', 'G']
In Python, we use ALL_CAPS names as a convention to indicate these values shouldn’t change. However, it’s important to understand that Python doesn’t actually prevent these values from being changed. For example:
MIN_QUALITY_SCORE = 30
print(f"Filtering sequences with quality scores below {MIN_QUALITY_SCORE}")
MIN_QUALITY_SCORE = 20  # We can change it, even though we shouldn't!
print(f"Filtering sequences with quality scores below {MIN_QUALITY_SCORE}")
Filtering sequences with quality scores below 30
Filtering sequences with quality scores below 20
Think of Python variables like labels on laboratory samples: you can always move a label from one test tube to another. When you write:
DNA_BASES = ['A', 'T', 'C', 'G']
DNA_BASES = ['A', 'U', 'C', 'G']  # Oops, switched to RNA bases!
print(f"These are now RNA bases: {DNA_BASES}")
These are now RNA bases: ['A', 'U', 'C', 'G']
You’re not modifying the original list of DNA bases – instead, you’re creating a new list and moving the DNA_BASES label to point to it. The original list isn’t “protected” in any way. So, it’s more of a convention that ALL_CAPS variables be treated as constants in your code, even though Python won’t enforce this rule.
Here’s a common pitfall when naming variables in Python – accidentally overwriting built-in functions.
Python has several built-in functions that are always available, including one called str that converts values to strings. For example:
sequence = str() # Creates an empty string
sequence
Note: if you convert the static code block below to one that is runnable, and then actually run it, it will cause errors in the rest of the notebook anywhere the str function is used. If you do this, you will need to restart the notebook kernel.
However, Python will let you use these built-in names as variable names (though you shouldn’t!):
str = "ATCGGCTAA" # Don't do this!
Now if you try to use the str function later in your code:
quality_score = 35
sequence_info = str(quality_score) # This will fail!
You’ll get an error:
TypeError: 'str' object is not callable
This error occurs because we’ve “shadowed” the built-in str function with our own variable. Python now thinks we’re trying to use the string “ATCGGCTAA” as a function, which doesn’t work!
We’ll discuss errors in more detail in a future lesson. For now, remember to avoid using Python’s built-in names (like str, list, dict, set, len) as variable names. You can find a complete list of built-ins in the Python documentation.
Clear, descriptive variable names are crucial for writing maintainable code. When you revisit your analysis scripts months later, good variable names will help you remember what each part of your code does.
Python variable names can include:
While Python allows Unicode characters (like Greek letters), it’s usually better to stick with standard characters:
π = 3.14  # Possible, but not recommended
pi = 3.14  # Better!
Python treats uppercase and lowercase letters as different characters:
sequence = "ATCG"
Sequence = "GCTA"
print(f"{sequence} != {Sequence}")
ATCG != GCTA
To avoid confusion, stick with lowercase for most variable names.
For multi-word variable names, Python programmers typically use snake_case (lowercase words separated by underscores):
# Good -- snake case
read_length = 150
sequence_count = 1000
is_high_quality = True
# Avoid - camelCase or PascalCase
readLength = 150
SequenceCount = 1000
Here are some best practices for naming variables in your code:
Use descriptive names that explain the variable’s purpose:
# Clear and descriptive
sequence_length = 1000
quality_threshold = 30
# Too vague
x = 1000
threshold = 30
Use nouns for variables that hold values:
read_count = 500
dna_sequence = "ATCG"
Boolean variables often start with is_, has_, or similar:
is_paired_end = True
has_adapter = False
Collections (which we’ll cover later) often use plural names:
sequences = ["ATCG", "GCTA"]
quality_scores = [30, 35, 40]
Common exceptions where short names are okay:
i, j, k for loop indices
x, y, z for coordinates
msg for message, num for number
Keep names reasonably short while still being clear:
# Too long
number_of_sequences_passing_quality_filter = 100
# Better
passing_sequences = 100
Remember: your code will be read more often than it’s written, both by others and by your future self. Clear variable names make your code easier to understand and maintain.
For more detailed naming guidelines, check Python’s PEP 8 Style Guide.
Python has many different types of data it can work with. Each data type has its own special properties and uses.
In this section, we’ll cover the basic data types you’ll use frequently in your code:
We’ll learn how to:
Understanding these fundamental data types is crucial for handling data correctly in your programs.
Python is a dynamically typed language, meaning a variable’s type can change during your program. While this flexibility is useful, it’s important to keep track of your data types to avoid errors in your analysis.
You can check a variable’s type using Python’s built-in type() function. Here’s how:
sequence_length = 150
print(type(sequence_length))  # <class 'int'>
sequence = "ATCGGCTAA"
print(type(sequence))  # <class 'str'>
is_valid = True
print(type(is_valid))  # <class 'bool'>
<class 'int'>
<class 'str'>
<class 'bool'>
As shown above, type() tells us exactly what kind of data we’re working with. This can be particularly helpful when debugging calculations that aren’t working as expected, or verifying data is in the correct format.
Don’t worry too much about the class keyword in the output – we’ll cover classes in detail later. For now, focus on recognizing the basic types: int for integers, str for strings (text), and bool for True/False values.
Python has two main types for handling numbers:
int: Integers (whole numbers), used for counting things.
float: Floating-point numbers (decimals), used for measurements.
For readability with large numbers, you can use underscores: 1_000_000 reads is clearer than 1000000 reads.
The operators +, -, *, / are used to perform the basic arithmetic operations.
forward_reads = 1000
reverse_reads = 800
print(forward_reads + reverse_reads)
print(forward_reads - reverse_reads)
print(forward_reads * 2)
print((forward_reads + reverse_reads) / 100)
1800
200
2000
18.0
Float division (/) always returns a float, whereas floor division (//) rounds the result down and returns an int when both operands are ints.
total_bases = 17
reads = 5
print(total_bases / reads)
print(total_bases // reads)
3.4
3
The operator ** is used for exponentiation.
print(2 ** 8)
print(8 ** 2)
256
64
Parentheses () can be used to group expressions and control the order of operations.
# Order of operations
print(2 + 3 * 4) # multiplication before addition
print( (2 + 3) * 4 ) # parentheses first
14
20
Modulo (%) gives the remainder of a division:
position = 17
codon_position = position % 3  # Which position in codon (0, 1, or 2)
print(codon_position)
2
Be careful about combining negative numbers with floor division or modulo. Here are some interesting examples showing how negative numbers behave with floor division and modulo in Python:
# Floor division with negative numbers
print("Floor division with negative numbers:")
# Rounds down to nearest integer
print(17 // 5)
# Rounds down, not toward zero
print(-17 // 5)
print(17 // -5)
print(-17 // -5)
# Modulo with negative numbers
print("\nModulo with negative numbers:")
print(17 % 5)
# Result is positive with positive divisor
print(-17 % 5)
# Result has same sign as divisor
print(17 % -5)
print(-17 % -5)
Floor division with negative numbers:
3
-4
-4
3
Modulo with negative numbers:
2
3
-3
-2
Don’t worry too much about the details of how negative numbers work with division and modulo operations. Just be aware that they can behave unexpectedly, and look up the specific rules if you need them.
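If you do need the rule, one way to remember it is that floor division and modulo always fit together: the quotient times the divisor, plus the remainder, gives back the original number. The sketch below checks this for one of the cases above and uses the built-in divmod(), which returns both results at once.

```python
a = -17
b = 5

quotient = a // b    # -4 (rounded down, not toward zero)
remainder = a % b    # 3 (same sign as the divisor)

# The two results always recombine into the original number
print(quotient * b + remainder == a)  # True

# divmod() returns the quotient and remainder together
print(divmod(a, b))  # (-4, 3)
```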
Scientific notation is essential when working with very large or small numbers:
# 3.2 billion bases
genome_size = 3.2e9
# 0.00000001 mutations per base
mutation_rate = 1e-8
Python can handle arbitrarily large integers, limited only by memory:
big_number = 125670495610435017239401723907559279347192756
print(big_number)
125670495610435017239401723907559279347192756
Floating-point numbers have limited precision (about 15-17 decimal digits). This can affect calculations:
x = 0.1
y = 0.2
# Might not be exactly 0.3
print(x + y)
0.30000000000000004
While these precision errors are usually small, they can accumulate in large-scale calculations.
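Because of this, comparing floats with == can give surprising answers. A common approach, sketched here, is to compare with a tolerance using math.isclose() from the standard library (the default tolerance is used; whether that is appropriate depends on your data).

```python
import math

x = 0.1
y = 0.2
total = x + y

# Exact comparison fails because of the tiny rounding error
print(total == 0.3)              # False

# Tolerance-based comparison treats the values as equal
print(math.isclose(total, 0.3))  # True
```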
Strings are how Python handles text data, like sequences or gene names.
# Strings can use single or double quotes
sequence = 'ATCG'
gene_name = "nrdA"
print(sequence)
print(gene_name)
ATCG
nrdA
Strings are immutable – once created, they cannot be modified. For example, you can’t change individual bases in a sequence directly:
dna = "ATCG"
# This would raise an error:
# dna[0] = "G"
Try uncommenting that line and see what happens!
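Since a string can't be edited in place, the usual workaround is to build a new string. The sketch below uses slicing, which is covered later in this tutorial; the variable name mutated is just for illustration.

```python
dna = "ATCG"

# Build a new string: a "G" followed by everything after the first base
mutated = "G" + dna[1:]

print(dna)      # ATCG (the original is unchanged)
print(mutated)  # GTCG
```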
You can combine strings using the + operator:
# String concatenation
sequence_1 = "ATCG"
sequence_2 = "GCTA"
full_sequence = sequence_1 + sequence_2
print("the sequence is: " + full_sequence)
the sequence is: ATCGGCTA
Special characters can be included using escape sequences:
\n for new line
\t for tab
\\ for backslash
# Formatting sequence output
print("Sequence 1:\tATCG\nSequence 2:\tGCTA")
Sequence 1: ATCG
Sequence 2: GCTA
F-strings (format strings) are particularly useful for creating formatted output. They allow you to embed variables and expressions in strings using {expression}:
gene_id = "nrdJ"
position = 37_531
print(f"Gene {gene_id} is located at position {position}")
Gene nrdJ is located at position 37531
F-strings can also format numbers, which is useful for scientific notation and precision control:
# Two decimal places
gc_content = 0.42857142857
print(f"GC content: {gc_content:.2f}")
# Scientific notation
p_value = 0.000000342
print(f"P-value: {p_value:.2e}")
GC content: 0.43
P-value: 3.42e-07
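Two other format specifiers that may be handy for reporting are the thousands separator and the percent format. This is a small sketch with made-up values, not an exhaustive list of what f-strings can do.

```python
total_reads = 1950000
gc_content = 0.44

# A comma after the colon adds thousands separators
reads_text = f"Total reads: {total_reads:,}"
print(reads_text)  # Total reads: 1,950,000

# The % type multiplies by 100 and appends a percent sign
gc_text = f"GC content: {gc_content:.1%}"
print(gc_text)  # GC content: 44.0%
```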
Strings can contain Unicode characters:
# Unicode characters
print("你好")
print("こんにちは")
你好
こんにちは
While Python supports Unicode characters in variable names, it’s better to use standard ASCII characters for code:
# Possible, but not recommended
α = 0.05
β = 0.20
# Better
alpha = 0.05
beta = 0.20
String operations are fundamental for processing and manipulating textual data, formatting output, and cleaning up input in your applications and analysis pipelines.
+
The + operator joins strings together:
# Joining DNA sequences
sequence1 = "ATCG"
sequence2 = "GCTA"
combined_sequence = sequence1 + sequence2
print(combined_sequence)
# Adding labels to sequences
gene_id = "nrdA"
labeled_sequence = gene_id + ": " + combined_sequence
print(labeled_sequence)
ATCGGCTA
nrdA: ATCGGCTA
*
The * operator repeats a string a specified number of times:
# Repeating DNA motifs
motif = "AT"
repeat = motif * 3
print(repeat)
# Creating alignment gap markers
gap = "-" * 6
print(gap)
ATATAT
------
Python uses zero-based indexing to access individual characters in a string. You can also use negative indices to count from the end:
# Indexing
s = "Hello, world!"
print(s[0])
print(s[7])
print(s[-1])
print(s[-8])
H
w
!
,
Slicing lets you extract parts of a string using the format [start:end]. The end index is exclusive:
# Slicing
s = "Hello, World!"
print(s[0:5])
print(s[7:])
print(s[:5])
print(s[-6:])
print(s[-12:-8])
Hello
World!
Hello
World!
ello
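The same slicing rules apply to biological sequences. As a small sketch with a made-up 9-base sequence, here is how you might pull out individual codons:

```python
sequence = "ATGGCCTAA"

first_codon = sequence[0:3]   # bases at index 0, 1, 2
second_codon = sequence[3:6]  # bases at index 3, 4, 5
last_codon = sequence[-3:]    # the final three bases

print(first_codon)   # ATG
print(second_codon)  # GCC
print(last_codon)    # TAA
```

Because the end index is exclusive, consecutive slices like [0:3] and [3:6] never overlap or skip a base.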
Python strings have built-in methods for common operations. Here are a few common ones:
# Clean up sequence data with leading/trailing white space
raw_sequence = " ATCG GCTA "
clean_sequence = raw_sequence.strip()
print("|" + raw_sequence + "|")
print("|" + clean_sequence + "|")
# Convert between upper and lower case
mixed_sequence = "AtCg"
print(mixed_sequence.upper())
print(mixed_sequence.lower())
# Chaining methods
messy_sequence = " AtCg "
clean_upper = messy_sequence.strip().upper()
print("|" + clean_upper + "|")
| ATCG GCTA |
|ATCG GCTA|
ATCG
atcg
|ATCG|
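A few more string methods that tend to come up with sequence data are count(), replace(), and startswith(). The example below is a sketch using a made-up sequence; all three are standard str methods.

```python
sequence = "ATGCGCTA"

# count() tallies how many times a substring appears
gc_count = sequence.count("G") + sequence.count("C")
gc_content = gc_count / len(sequence)
print(gc_content)  # 0.5

# replace() returns a new string with substitutions made
rna = sequence.replace("T", "U")
print(rna)  # AUGCGCUA

# startswith() returns a boolean
has_start_codon = sequence.startswith("ATG")
print(has_start_codon)  # True
```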
Boolean values represent binary states (True/False) and are used to make decisions in code:
True represents a condition being met
False represents a condition not being met
(Note: These are capitalized keywords in Python!)
Boolean variables often use prefixes like is_, has_, or contains_ to clearly indicate their purpose:
is_paired_end = True
has_adapter = False
contains_start_codon = True
Boolean values are used in control flow – they drive decision-making in your code:
is_high_quality = True
if is_high_quality:
    print("Sequence passes quality check!")
has_ambiguous_bases = False
if has_ambiguous_bases:
    # This won't execute because condition is False
    print("Warning: Sequence contains N's")
Sequence passes quality check!
Boolean values are created through comparisons, for example:
# Quality score checking
quality_score = 35
print(quality_score > 30)
print(quality_score < 20)
print(quality_score == 40)
print(quality_score != 35)
True
False
False
False
Logical operators (and, or, not) combine boolean values:
# Logical operations
print(True and False)
print(True or False)
print(not True)
print(not False)
False
True
False
True
For example, you could use logical operators to combine multiple logical statements:
is_long_enough and is_high_quality
is_exempt or exceeds_threshold
Comparison operators are used to compare values. They return a boolean value (True or False) and are often used in conditional statements and loops to control program flow.
The basic comparison operators are:
==: equal to
!=: not equal to
<: strictly less than
<=: less than or equal to
>: strictly greater than
>=: greater than or equal to
Additional operators we’ll cover later:
is, is not: object identity
in, not in: sequence membership
Here are a couple of examples:
# Basic boolean values
is_sunny = True
is_raining = False
print(f"Is it sunny? {is_sunny}")
print(f"Is it raining? {is_raining}")
# Comparison operations produce boolean results
temperature = 25
is_hot = temperature > 30
print(f"Is it hot? {is_hot}")
# Logical operations
is_good_weather = is_sunny and not is_raining
print(f"Is it good weather? {is_good_weather}")
Is it sunny? True
Is it raining? False
Is it hot? False
Is it good weather? True
# Comparison operations
print(5 == 5)
print(5 != 5)
print(5 < 3)
print(5 <= 3)
print(5 <= 5)
print(5 > 3)
print(5 >= 3)
print(5 >= 5)
True
False
False
False
True
True
True
True
Comparisons can be chained together, e.g. 1 < 2 < 3 is equivalent to 1 < 2 and 2 < 3.
# Chained comparisons
print(1 < 2 < 3)
print(1 < 2 < 2)
print(1 < 2 <= 2)
# This one is a bit weird, but it's valid Python!
print(1 < 2 > 2)
True
False
True
False
The comparison operators can also be used to compare the values of variables.
# Check if value is in valid range
coverage = 30
print(10 < coverage < 50)
quality_score = 35
print(20 < quality_score <= 40)
# Multiple range checks
temperature = 37.2
print(37.0 <= temperature <= 37.5)
True
True
True
Python’s comparison operators work beyond just numbers, allowing comparisons between various types of data. Be careful though – while some comparisons make intuitive sense, others might require careful consideration or custom implementation.
# Comparison of different types
print("Hello" == "Hello")
print("Hello" == "World")
print("Hello" == 5)
print("Hello" == True)
# Some non-numeric types also have a natural ordering.
print("a" < "b")
print("a" < "A")
# This is a bit weird, but it's valid Python!
print([1, 2, 3] <= [10, 20, 30])
True
False
False
False
True
False
True
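Two of the results above deserve a closer look. The sketch below shows why "a" < "A" is False (strings are compared by character code, and lowercase letters have higher codes than uppercase ones), and that ordering comparisons between unrelated types raise an error even though == simply returns False. It uses try/except, which hasn't been covered yet, only to keep the example from stopping at the error.

```python
# Strings are ordered by the numeric code of each character
print(ord("a"), ord("A"))  # 97 65
print("a" < "A")           # False, because 97 is not less than 65

# Equality between different types is allowed and is simply False
print("Hello" == 5)        # False

# Ordering between a string and a number is not defined
ordering_failed = False
try:
    print("Hello" < 5)
except TypeError:
    ordering_failed = True
print(ordering_failed)     # True
```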
Think of logical operators as ways to combine or modify simple yes/no conditions in your code, much like how you might combine criteria when filtering data in Excel or selecting samples for an experiment.
For example, you can use logical operators to express conditions like:
These operators (and, or, not) work similarly to the way we combine conditions in everyday language. Just as you might say “I’ll go for a run if it’s not raining AND the temperature is above 60°F,” you can write code that makes decisions based on multiple criteria.
Here are a couple of examples:
# In a sequence quality filtering pipeline
#
# Both conditions must be true
if sequence_length >= 250 and quality_score >= 30:
    keep(sequence)
# In a variant calling pipeline
#
# Either condition being true is sufficient
if mutation_frequency > 0.01 or supporting_reads >= 100:
    report(variant)
# In a data validation step
#
# Triggers if the condition is false
if not sample_id.startswith('PROJ_'):
    warn_user(sample_id)
Think of these operators as the digital equivalent of the decision-making process you use in the lab: checking multiple criteria before proceeding with an experiment, or having alternative procedures based on different conditions.
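The snippets above rely on functions such as keep() and report() that aren't defined in this tutorial. Here is a runnable sketch of the same three conditions that stores each result in a variable instead; all the values are made up for illustration.

```python
# Hypothetical values for one read, one variant, and one sample
sequence_length = 300
quality_score = 28
mutation_frequency = 0.002
supporting_reads = 150
sample_id = "PROJ_0042"

# and: both conditions must be true (quality_score is too low here)
passes_filter = sequence_length >= 250 and quality_score >= 30
print(passes_filter)  # False

# or: either condition is sufficient (supporting_reads qualifies)
should_report = mutation_frequency > 0.01 or supporting_reads >= 100
print(should_report)  # True

# not: flips the result (the ID does start with PROJ_)
needs_warning = not sample_id.startswith("PROJ_")
print(needs_warning)  # False
```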
Let’s explore how Python’s logical operators (and, or, not) work, using examples relevant to biological data analysis.
Think of these operators as ways to check multiple conditions, similar to how you might design experimental criteria:
and: Like requiring ALL criteria to be met (e.g., both proper staining AND correct cell count)
or: Like accepting ANY of several criteria (e.g., either elevated temperature OR positive test result)
not: Like reversing a condition (e.g., NOT contaminated)
Here’s a truth table showing all possible combinations.
A | B | A and B | A or B | not A |
---|---|---|---|---|
True | True | True | True | False |
True | False | False | True | False |
False | True | False | True | True |
False | False | False | False | True |
Here are the rules:
and only gives True if both conditions are True (like requiring all quality checks to pass)
or gives True if at least one condition is True (like having multiple acceptable criteria)
not flips True to False and vice versa (like converting “passed QC” to “failed QC”)
Interestingly, Python can also evaluate non-boolean values (values that aren’t strictly True or False) using these operators. We call values that Python treats as True “truthy” and values it treats as False “falsy”. This becomes important when working with different types of data in your programs and analysis pipelines.
In Python, every value can be interpreted as either “true-like” (truthy) or “false-like” (falsy) when used in logical operations. This is similar to how in biology, we might categorize results as “positive” or “negative” even when the underlying data is more complex than a simple yes/no.
Think of “falsy” values as representing empty, zero, or null states – essentially, the absence of meaningful data. Python considers the following values as “falsy”:
False: The boolean False value
None: Python’s way of representing “nothing” or “no value” (like a blank entry in a spreadsheet)
Zero in any numeric form (0, 0.0)
Empty strings ("")
Empty lists ([])
Empty sets (set())
Empty dictionaries ({})
Everything else is considered “truthy”, meaning it represents the presence of some meaningful value or data.
Let’s look at some practical examples. We can use Python’s bool() function to explicitly check whether Python considers a value truthy or falsy:
# Examples from sample processing:
sample_count = 0
# False (no samples)
print(bool(sample_count))
sample_ids = []
# False (empty list of IDs)
print(bool(sample_ids))
patient_data = {}
# False (empty data table)
print(bool(patient_data))
# Compare with:
sample_count = 5
# True (we have samples)
print(bool(sample_count))
sample_ids = ["A1", "B2"]
# True (we have some IDs)
print(bool(sample_ids))
patient_data = {"age": 45}
# True (we have some data)
print(bool(patient_data))
False
False
False
True
True
True
Understanding truthy and falsy values becomes particularly useful when writing conditions in your code, like checking whether you have data before proceeding with analysis:
# Sort of like saying: if there are some sample IDs,
# then do something with them.
if sample_ids:
    process_samples(sample_ids)
else:
    print("No samples to process")
We’ll see more examples of how this concept is useful in practice as we work through more advanced topics.
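In the meantime, here is a runnable sketch of the same idea that doesn't depend on an undefined process_samples() function. It also shows one thing to watch for: zero is falsy, so a truthiness check can't tell "the count is zero" apart from "there is nothing here". The variable names and messages are made up for illustration.

```python
sample_ids = ["A1", "B2"]

# For a list, these two checks give the same answer
print(bool(sample_ids))     # True
print(len(sample_ids) > 0)  # True

if sample_ids:
    status = f"Processing {len(sample_ids)} samples"
else:
    status = "No samples to process"
print(status)  # Processing 2 samples

# Careful: 0 is falsy even when zero is a perfectly valid measurement
mutation_count = 0
if mutation_count:
    report = "Mutations found"
else:
    report = "No mutations found"
print(report)  # No mutations found
```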
and
and or
Note: This section is a bit low-level, so don’t worry too much about it. It’s just here for your reference.
One kind of neat thing about the logical operators is that you can directly use them as a type of control flow.
and
Given an expression a and b, the following steps are taken:
1. Evaluate a.
2. If a is “falsy”, then return the value of a.
3. Otherwise, evaluate b and return its value.
Check it out:
a = "apple"
b = "banana"
result = a and b
print(result)
name = "Maya"
age = 45
result = age >= 18 and f"{name} is an adult"
print(result)
name = "Amira"
age = 15
result = age >= 18 and f"{name} is an adult"
print(result)
banana
Maya is an adult
False
Were the values assigned to result what you expected?
or
Given an expression a or b, the following steps are taken:
1. Evaluate a.
2. If a is “truthy”, then return the value of a.
3. Otherwise, evaluate b and return its value.
Let’s return to the previous example, but this time we will use or instead of and.
a = "apple"
b = "banana"
result = a or b
print(result)
name = "Maya"
age = 45
# Observe that this code isn't really doing what we want it to do.
# `result` will be True, rather than "Maya is an adult".
# That's because it should be using `and`
# ...again, it's just for illustration.
result = age >= 18 or f"{name} is an adult"
print(result)
name = "Amira"
age = 15
# This code is a bit obscure, and you probably wouldn't
# write it like this in practice. But it illustrates the
# point.
result = age >= 18 or f"{name} is not an adult"
print(result)
apple
True
Amira is not an adult
Were the values assigned to result what you expected?
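One place you may see this behavior of or used in practice is supplying a fallback when a value is empty. The sketch below assumes an empty gene name should be shown as "unnamed gene"; that placeholder text is made up for illustration.

```python
# An empty string is falsy, so `or` falls through to the fallback
gene_name = ""
display_name = gene_name or "unnamed gene"
print(display_name)  # unnamed gene

# A non-empty string is truthy, so `or` returns it directly
gene_name = "nrdA"
display_name_2 = gene_name or "unnamed gene"
print(display_name_2)  # nrdA
```

Keep in mind that this treats every falsy value (including 0) as "missing", which may or may not be what you want.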
Think of control flow as the decision-making logic in your code - like following a lab protocol, but for data analysis. Just as you make decisions in the lab (“if the pH is too high, add buffer”), your code needs to make decisions about how to handle different situations.
Control flow statements are the programming equivalent of those decision points in your protocols. They let your program take different paths depending on the conditions it encounters, much like how you might follow different steps in an experiment based on your observations.
In this section, we’ll cover several ways to build these decision points into your code:
if statements (like “if the sequence quality is low, skip it”)
if-else statements (like “if the gene is expressed, mark it as active; otherwise, mark it as inactive”)
if-elif-else chains (for handling multiple possibilities, like different ranges of p-values)
Control flow is essential for writing programs that can:
Just as following the right branch points in a protocol is crucial for experimental success, proper control flow is key to writing programs that correctly handle your data.
Let’s explore the main types of control flow in Python:
if Statements
Think of these as your basic yes/no checkpoints, like checking if a sample meets quality control:
quality_score = 35
if quality_score > 30:
    print("Sample passes QC")
Sample passes QC
if-else Statements
These handle two alternative outcomes, like categorizing genes as expressed or not expressed:
expression_level = 1.5
if expression_level > 1.0:
    print("Gene is upregulated")
else:
    print("Gene is not upregulated")
Gene is upregulated
if-elif-else Chains
Perfect for handling multiple possibilities, like categorizing p-values or expression levels:
p_value = 0.03
if p_value < 0.01:
    print("Highly significant")
elif p_value < 0.05:
    print("Significant")
else:
    print("Not significant")
Significant
Sometimes you need to check multiple criteria, like filtering sequencing data:
read_length = 100
gc_content = 0.45
quality_score = 35
if read_length >= 100 and quality_score > 30 and 0.4 <= gc_content <= 0.6:
    print("Read passes all quality filters")
else:
    print("Read filtered out")
Read passes all quality filters
Think of control flow as building decision points into your data analysis pipeline. Just as you wouldn’t proceed with a PCR if your DNA quality was poor, your code can automatically make similar decisions about data processing.
Conditional statements can also be nested. Here is some code that is checking if someone can go to the beach. If they are not at work, and the weather is sunny, then they can go to the beach.
at_work = False
weather = "sunny"
if weather == "sunny" and not at_work:
    print("It's sunny and you are not at work, let's go to the beach!")
else:
    print("We can't go to the beach today for some reason.")
# Let's move the check for at_work nested inside the if statement that checks
# the weather.
#
# Note that this code isn't equivalent to the previous code, just an example
# of nesting.
if weather == "sunny":
    if at_work:
        print("You are at work and can't go to the beach.")
    else:
        print("It's sunny and you are not at work, let's go to the beach!")
else:
    print("It's not sunny, so we can't go to the beach regardless.")
# Just to be clear, let's "unnest" that conditional.
if weather == "sunny" and at_work:
    print("You are at work and can't go to the beach.")
elif weather == "sunny":
    print("It's sunny and you are not at work, let's go to the beach!")
else:
    print("It's not sunny, so we can't go to the beach regardless.")
It's sunny and you are not at work, let's go to the beach!
It's sunny and you are not at work, let's go to the beach!
It's sunny and you are not at work, let's go to the beach!
Just as you want to keep your experimental protocols clear and straightforward, the same principle applies to writing conditional statements in your code. Think of deeply nested if-statements like trying to follow a complicated diagnostic flowchart - the more branches and decision points you add, the easier it is to lose track of where you are.
For example, imagine designing a PCR troubleshooting guide where each problem leads to three more questions, each with their own set of follow-up questions. While technically complete, it would be challenging for anyone to follow correctly. The same goes for code – when we stack too many decisions inside other decisions, we’re setting ourselves up for confusion.
Here’s why keeping conditions simple matters:
When you find yourself writing deeply nested conditions, it’s often a sign to step back and consider whether there’s a clearer way to structure your code.
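As one illustration of that kind of restructuring, the sketch below writes the same read-filtering decision twice: first with three levels of nesting, then as a single combined condition. The thresholds and variable names are made up; the point is only that both versions reach the same answer.

```python
read_length = 120
quality_score = 35
has_adapter = False

# Deeply nested version: three levels to reach one decision
if read_length >= 100:
    if quality_score > 30:
        if not has_adapter:
            nested_result = "keep"
        else:
            nested_result = "discard"
    else:
        nested_result = "discard"
else:
    nested_result = "discard"

# Flattened version: the same logic as one combined condition
if read_length >= 100 and quality_score > 30 and not has_adapter:
    flat_result = "keep"
else:
    flat_result = "discard"

print(nested_result, flat_result)  # keep keep
```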
Think of Python’s built-in functions as your basic laboratory toolkit - they’re always there when you need them, no special setup required. These functions will become your go-to tools for handling biological data, from DNA sequences to experimental measurements.
Here are some of the most useful built-in functions you’ll use regularly:
print(): Displays your data or results
len(): Counts the length of something
abs(): Gives you the absolute value
round(): Tidies up decimal numbers
min() and max(): Find the lowest and highest values
sum(): Adds up a collection of numbers
type(): Tells you what kind of data you’re working with (helpful for debugging)
Let’s look at some examples:
# Printing experimental results
print("Gene expression analysis complete!")
# Checking sequence length
dna_sequence = "ATCGATCGTAGCTAGCTAG"
length = len(dna_sequence)
print(f"This DNA sequence is {length} base pairs long.")
# Working with expression fold changes
fold_change = -2.5
absolute_change = abs(fold_change)
print(f"The absolute fold change is {absolute_change}x.")
# Cleaning up p-values
p_value = 0.0000234567
rounded_p = round(p_value, 6)
print(f"p-value = {rounded_p}")
# Analyzing multiple expression values
expression_levels = [10.2, 5.7, 8.9, 12.3, 6.8]
lowest = min(expression_levels)
highest = max(expression_levels)
print(f"Expression range: {lowest} to {highest}")
# Calculating average coverage
coverage_values = [15, 22, 18, 20, 17]
average_coverage = sum(coverage_values) / len(coverage_values)
print(f"Average sequencing coverage: {average_coverage}x")
# Checking data types
gene_name = "nrdA"
data_type = type(gene_name)
print(f"The variable gene_name is of type: {data_type}")
Gene expression analysis complete!
This DNA sequence is 19 base pairs long.
The absolute fold change is 2.5x.
p-value = 2.3e-05
Expression range: 5.7 to 12.3
Average sequencing coverage: 18.4x
The variable gene_name is of type: <class 'str'>
To use these functions, just type the function name followed by parentheses containing your data (the “arguments”). Some functions, like min() and max(), can handle multiple inputs, which is handy when comparing several values at once.
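For instance, here is a short sketch of min() and max() called with several separate values rather than a list, along with round() used with and without a number of decimal places. The numbers are made up.

```python
# min() and max() accept several values directly
lowest_score = min(30, 35, 28)
highest_score = max(30, 35, 28)
print(lowest_score)   # 28
print(highest_score)  # 35

# round() takes an optional second argument: the number of decimal places
two_places = round(3.14159, 2)
nearest_whole = round(18.4)
print(two_places)     # 3.14
print(nearest_whole)  # 18
```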
In this tutorial, we covered the fundamental building blocks of Python programming that you’ll use throughout your bioinformatics work:
Remember:
These basics form the foundation for more advanced programming concepts we’ll explore in future tutorials. Practice working with these fundamentals – they’re the tools you’ll use to build more complex bioinformatics applications.
Don’t worry if everything hasn’t clicked yet. Programming is a skill that develops with practice. Focus on understanding one concept at a time, and remember that you can always refer back to this tutorial as a reference.
Next up, we’ll build on these basics to work with more complex data structures and write functions of our own!