Appendix A — Basics: Practice Problems

Author

Ryan M. Moore, PhD

Published

May 23, 2025

Modified

May 28, 2025

These are the practice problems for Chapter 1. For solutions, see Appendix B.

A.1 Assigning Variables and Printing

Task Description

Create a variable to store the name of a bacterial species (e.g., "Escherichia coli"). Assign another variable for the number of base pairs in its genome (e.g., 4_600_000). Print a statement describing the species and its genome size.

Learning Objectives

Assign values to variables
Print formatted output
Use underscores in large numbers for readability

Solution

# Write your code here!

Test Cases

# Should print:
# The species Escherichia coli has a genome size of 4600000 base pairs.

Common Issues

Forgetting to enclose text in quotes
Forgetting the f prefix on f-strings
Not matching variable names in the f-string
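
As a hint for the issues above, here is a minimal sketch that uses a different, made-up example, so it does not give away the solution. The names `gene_name` and `gene_length` and their values are illustrative assumptions, not part of the exercise. It shows what the `f` prefix changes and that underscores in a numeric literal do not affect the stored value.

```python
# Illustrative values only; these names are not part of the exercise.
gene_name = "lacZ"
gene_length = 3_075  # underscores are ignored by Python; this is the integer 3075

with_prefix = f"{gene_name} is {gene_length} bp long"
without_prefix = "{gene_name} is {gene_length} bp long"  # missing f: braces stay literal

print(with_prefix)
print(without_prefix)
```

Note that the underscores never appear in the printed number; they only make the literal easier to read in the source code.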

Optional Extensions

Try with a different species and genome size.
Try without using f-strings

A.2 Calculating and Formatting GC Content

Task Description

Given variables gc_count (number of G or C bases) and total_bases (total number of bases in the genome), calculate the GC content as a decimal value and print it to two decimal places.

Learning Objectives

Perform basic arithmetic operations
Calculate percentages or ratios
Use f-strings with formatting

Solution

# Write your code here!

Test Cases

# Should print:
# GC content: 0.28

Common Issues

Using integer division rather than float division (e.g., should use /, not //)
Forgetting to format decimal places
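
The difference between the two division operators can be sketched as follows. The counts here are illustrative assumptions and are not the values behind the test case above; the names `hits` and `total` are hypothetical.

```python
# Illustrative counts, not the values used in the test case.
hits = 1
total = 3

true_ratio = hits / total    # float division keeps the fractional part
floor_ratio = hits // total  # floor division discards it, giving 0 here

# The .2f format spec rounds for display only; true_ratio itself is unchanged.
formatted = f"{true_ratio:.2f}"

print(true_ratio, floor_ratio, formatted)
```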

Optional Extensions

Multiply by 100 to display as a percentage.

A.3 String Slicing and Concatenation

Task Description

Create a variable sequence to hold the DNA sequence string "ACTGGTCAA".
Then use string slicing (e.g., s[start:end]) to create two more variables, one to hold the first four bases, and the other to hold the remaining five bases.
Concatenate the two parts back together.
Print everything to check your work.

Learning Objectives

Create string variables and concatenate strings with +
Use string slicing [start:end] notation
Understand zero-based indexing; extract parts of a string

Solution

# Write your code here!

Test Cases

# Should output something like:
# sequence='ACTGGTCAA'
# first_four='ACTG'; last_five='GTCAA'
# combined='ACTGGTCAA'

Common Issues

Off-by-one errors in slicing
Incorrect use of negative indices
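
To see how slice boundaries behave, here is a small sketch on an unrelated, made-up string (the names `word`, `head`, `tail`, and `last` are illustrative). The key point is that the end index of a slice is excluded, so two slices that share a boundary index fit together without overlap or gaps.

```python
# A made-up string, so the exercise answer is not given away.
word = "PYTHON"

head = word[0:2]  # indices 0 and 1; the end index 2 is excluded
tail = word[2:6]  # indices 2 through 5
last = word[-1]   # negative indices count from the end of the string

# Adjacent slices that share a boundary index rebuild the original string.
assert head + tail == word

print(head, tail, last)
```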

Optional Extensions

Use f-string formatting to combine the first and last parts of the sequence
Try slice notation that omits the start, the end, or both
Extract the last 3 bases using negative indices

A.4 Working with Booleans and Conditional Execution

Task Description

You have a variable quality_score. Write code that prints "Pass" if quality_score is greater than or equal to 30, and "Fail" otherwise.

Then, given two boolean variables, is_long_enough and is_high_quality, print "Accepted" if both are True, otherwise "Rejected".

Learning Objectives

Use boolean comparisons and conditional statements
Work with boolean variables and the logical operator and

Solution

# Write your code here!

Test Cases

# quality_score = 28 -> Fail
# quality_score = 32 -> Pass
# is_long_enough = True, is_high_quality = False -> Rejected
# is_long_enough = True, is_high_quality = True -> Accepted

Common Issues

Using > instead of >= for the test
Using or instead of and in the boolean expression
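
Both issues can be illustrated with a short sketch. The threshold and flag names below are hypothetical and deliberately different from the exercise variables.

```python
# Hypothetical threshold and flags for illustration.
threshold = 10
value = 10

strict = value > threshold      # False: 10 is not greater than 10
inclusive = value >= threshold  # True: the boundary value itself is included

flag_a = True
flag_b = False
both = flag_a and flag_b   # True only when both operands are True
either = flag_a or flag_b  # True when at least one operand is True

print(strict, inclusive, both, either)
```

Testing your code with a value that sits exactly on the boundary is a quick way to catch the first mistake.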

Optional Extensions

Add a message showing the quality score if it is under 30
Show the user the reason for rejection

A.5 Using Built-in Functions and List Operations

Task Description

You are given the following gene expression values: 2.1, 3.4, 1.8, 6.2, and 4.0.

Create a list of gene expression values and print the minimum, maximum, and average.
Then, get the first item in the list and print its type.

Learning Objectives

Use built-in functions: min(), max(), len(), sum(), type()
Work with lists

Solution

# Write your code here!

Test Cases

# Output:
# min: 1.8
# max: 6.2
# mean: 3.5
# type of first item: <class 'float'>

Common Issues

Calling the functions incorrectly
Using the name of the function as your variable name
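
A minimal sketch of calling these built-ins on a list, using made-up integer values rather than the expression data from the task. The variable names are illustrative; the point is that descriptive names such as `smallest` leave `min`, `max`, `sum`, and `len` available for later use.

```python
values = [5, 1, 3]  # illustrative values, not the exercise data

# Each built-in takes the list as its argument.
smallest = min(values)
largest = max(values)
average = sum(values) / len(values)  # there is no built-in mean; divide sum by count

print(smallest, largest, average)
```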

Optional Extensions

Show the average to two decimal places

A.6 Truthy/Falsy and Checking Emptiness

Task Description

Create an empty list called sequences. Write code to check if the list is empty. If it is, print "No sequences found!", otherwise print "Sequences loaded!"

Then, print the boolean value (True/False) of several sample values: empty string "", "AGTC", 0, 3.14, empty list, and a non-empty list.

Learning Objectives

Understand truthy/falsy values in Python
Check for emptiness using if statements
Use the bool() function

Solution

# Write your code here!

Test Cases

# For sequences = [], should print: No sequences found!
# For bool() values: False, True, False, True, False, True

Common Issues

Not knowing which values are truthy and which are falsy in Python
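
A few values whose truthiness often surprises people are sketched below. These are illustrative and intentionally different from the sample values in the task; the variable names are hypothetical.

```python
zero_text = bool("0")   # True: a non-empty string, even though it looks like zero
space_text = bool(" ")  # True: a single space still counts as a character
nothing = bool(None)    # False
empty_dict = bool({})   # False: empty containers are falsy

# An if statement applies the same rule, so a container can be tested directly.
reads = {}
if reads:
    status = "has entries"
else:
    status = "empty"

print(zero_text, space_text, nothing, empty_dict, status)
```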

Optional Extensions

Print the number of loaded sequences if not empty

A.7 Avoiding Built-in Name Shadowing

Task Description

Assign the DNA sequence "ACTG" to a variable called str, and print its length. Then use the str() function to convert the float 3.14 to a string. What happens?

Learning Objectives

Use appropriate variable names
Avoid overwriting built-in functions

Solution

# Write your code here!

Test Cases

# Output:
# 4
# TypeError: 'str' object is not callable

Common Issues

Accidentally shadowing built-ins resulting in confusing errors
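
The same kind of error can be reproduced with a different built-in. The sketch below shadows `max` on purpose, captures the resulting error message, and then restores the original function from the standard `builtins` module. The `try`/`except` block is used here only so the example keeps running after the error; it may go beyond what Chapter 1 covers.

```python
import builtins

max = 12  # shadows the built-in max() with an integer (avoid this in real code)

try:
    max(3, 7)  # max now refers to the integer 12, which cannot be called
    error_message = None
except TypeError as err:
    error_message = str(err)

print(error_message)

# The original function still lives in the builtins module.
max = builtins.max
restored = max(3, 7)
print(restored)
```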

Optional Extensions

Try the same thing with other built-ins, such as list or len

A.8 Filtering Based on Multiple Criteria

Task Description

Given a sequencing read described by the variables read_length, gc_content, and quality_score, print "Read passes all quality filters" if all of the following are true:

The read length is at least 100
The GC Content is no less than 0.4 and no more than 0.6
The quality score is greater than 30

Otherwise, print "Read filtered out".

Learning Objectives

Use multiple conditionals with and
Check ranges and comparison

Solution

# Write your code here!

Test Cases

# All conditions met => "Read passes all quality filters"
# At least one condition not met (e.g., quality score = 25) => "Read filtered out"

Common Issues

Using or instead of and
Messing up the boundary values
Trouble translating plain-language rules into boolean conditions
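
As an illustration of turning a plain-language range into code, consider the made-up rule "the temperature is no less than 20 and no more than 25". The variable names below are hypothetical and unrelated to the exercise.

```python
# Hypothetical rule: "temperature is no less than 20 and no more than 25".
temperature = 25

in_range = 20 <= temperature <= 25                    # chained comparison, both ends inclusive
same_check = temperature >= 20 and temperature <= 25  # the same test spelled out with and
exclusive = 20 < temperature < 25                     # strict comparisons exclude both boundaries

print(in_range, same_check, exclusive)
```

"No less than" and "at least" both map to `>=`, while "greater than" maps to `>`; checking a value that sits exactly on a boundary shows which one you wrote.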

Optional Extensions

Explain which filter was not passed

A.9 Formatting Scientific Output

Task Description

Given a gene_id ("nrdA") and a p_value (0.000012345), print "Gene [gene_id] => [p_value]" formatting the p-value in scientific notation with two decimals.

Then print a report about its significance:

If the p-value is less than 0.01, print: Highly significant
If the p-value is less than 0.05, print: Significant
If the p-value is less than 0.10, print: Almost significant
If the p-value is greater than or equal to 0.10, print: Not significant

Learning Objectives

Format numbers as scientific notation
if/elif/else chains

Solution

# Write your code here!

Test Cases

# Output:
# Gene nrdA => 1.23e-05
# Highly significant

Common Issues

Forgetting formatting codes in f-strings
Incorrect boolean logic
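
Both issues can be sketched with an unrelated, made-up measurement. The thresholds and labels below are assumptions for illustration, not the ones from the task.

```python
measurement = 0.00042  # illustrative value

# .1e requests scientific notation with one digit after the decimal point.
sci = f"{measurement:.1e}"

# Only the first matching branch runs, so test the smallest threshold first.
if measurement < 0.001:
    label = "tiny"
elif measurement < 0.1:
    label = "small"
else:
    label = "large"

print(sci, label)
```

If the branches were listed in the opposite order, every value below 0.001 would also satisfy the `< 0.1` test and would never reach the "tiny" branch.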

Optional Extensions

Only print a single message that includes the gene ID, the p-value, and the significance message.

A.10 Sequence Analysis

Task Description

Given an ssDNA sequence string ("TGacTGatcGT"), first convert the sequence to uppercase, then print the following information about it:

Sequence length
Count of nucleotides, A, C, G, and T
Count of ambiguous bases (N)
GC Content as a percentage
Molecular weight of the sequence

To calculate the molecular weight, use this formula from the Thermo Fisher website:

\(\text{M.W.} = (A_n \times 313.2) + (T_n \times 304.2) + (C_n \times 289.2) + (G_n \times 329.2) + 79.0\)

In the formula, \(A_n\) means the number of A nucleotides, \(T_n\) is the number of T nucleotides, and so on.

Note: Save each calculation to its own variable.

Learning Objectives

Perform multiple calculations to solve a single problem
Translate mathematical formulas into code
Perform basic sequence analysis

Solution

# Write your code here!

Test Cases

# Should print:
# DNA Sequence: TGACTGATCGT
# Length: 11
# Nucleotide counts
# A: 2, C: 2, G: 3, T: 4
# Ambiguous count: 0
# GC Content (%): 45.45454545454545
# Molecular weight: 3488.2

Common Issues

Forgetting to convert to uppercase first
Operator precedence in GC content calculation
Not implementing the molecular weight formula correctly
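
The operator-precedence issue can be sketched with made-up counts (these are not the counts from the exercise sequence, and the variable names are illustrative). Division and multiplication are evaluated before addition unless parentheses say otherwise.

```python
# Illustrative counts, not taken from the exercise sequence.
g_count = 3
c_count = 1
length = 8

wrong = g_count + c_count / length * 100    # evaluates as 3 + ((1 / 8) * 100)
right = (g_count + c_count) / length * 100  # parentheses force the addition first

print(wrong, right)
```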

Optional Extensions

Make up some rules about what properties a “high quality” sequence should have, e.g., no more than 3 ambiguous bases, having a certain length or GC percentage, etc. Then check the sequence against those rules.

A.11 Hard: Nested Conditions for Sequence Filtering

Task Description

Given the variables sequence_length, quality_score, and ambiguous_bases describing a DNA sequence, write nested conditional statements that print a filtering message according to these rules:

If sequence length is greater than or equal to 200:
- If quality score is greater than or equal to 30:
  - If there is no more than one ambiguous base, print "Sequence accepted"
  - Else print "Sequence rejected: ambiguous bases present"
- Else print "Sequence rejected: low quality"
Else print "Sequence rejected: too short"

Learning Objectives

Practice nested if statements
Use logical reasoning to handle multiple conditions

Solution

# Write your code here!

Test Cases

# sequence_length = 250, quality_score = 32, ambiguous_bases = 0 → Sequence accepted
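# ambiguous_bases = 2 (other values as in the first case) → Sequence rejected: too many ambiguous bases
# quality_score = 28 (other values as in the first case) → Sequence rejected: low quality
# sequence_length = 150 (other values as in the first case) → Sequence rejected: too short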

Common Issues

Missing or misplacing indentation
Not matching the correct else with the corresponding if
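
How indentation pairs each else with its if can be seen in a two-level sketch on an unrelated, hypothetical scenario (the names `has_ticket`, `age`, and `message` are made up for illustration).

```python
# Hypothetical two-level check; indentation decides which if each else belongs to.
has_ticket = True
age = 15

if has_ticket:
    if age >= 18:
        message = "Enter"
    else:  # aligned with the inner if: has a ticket but is under 18
        message = "Too young"
else:  # aligned with the outer if: no ticket at all
    message = "No ticket"

print(message)
```

Moving the inner else one level to the left would attach it to the outer if instead, which changes the meaning without causing any error message.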

Optional Extensions

Can you re-write the solution so that it doesn’t use nested conditional statements?