1. Variables

First, we’ll go over features of the basic Python language before getting into tools for scientific computing and data science. We’ll start with how data can be stored in different kinds of Python variables.

1.1. Numbers and Math

Python can be used like a calculator, performing standard mathematical operations.

3 + 5 * 4
23

We can store a value in a variable by writing a name, an equals sign, and the value.

weight_kg = 80.3
height_m = 1.9

Named variables can then be used to perform calculations.

weight_kg / height_m ** 2
22.24376731301939

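Variable names can contain letters, digits, and underscores, but they cannot start with a digit (for example, 4vars is invalid as a variable name). Names are case-sensitive, and reserved words like class or for cannot be used as names.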
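As a quick way to check whether a name is valid, the sketch below uses the built-in string method isidentifier and the standard-library keyword module. The candidate names are only examples chosen for illustration.

```python
import keyword

# str.isidentifier() reports whether a string follows Python's naming rules
print("weight_kg".isidentifier())  # True
print("4vars".isidentifier())      # False: starts with a digit
print("weight-kg".isidentifier())  # False: hyphens are not allowed

# Reserved words follow the naming rules but still cannot be used as names
print(keyword.iskeyword("class"))  # True
```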

There’s an art to naming variables to make them informative but not too hard to type. In Python, the usual style is to separate words with underscores (_).

wkg = 80.3  # hard to understand unless you already know what it stands for
weight_kg = 80.3  # clearer, though it will take longer to type
WeightKG = 80.3  # not bad, but goes against standard Python style guidelines

In Jupyter, we can display the value of a variable just by writing its name at the end of a cell and running the cell.

weight_kg
80.3

The print function is a more flexible way to display variables. The output sometimes looks a little different between these methods, but they both show the value of the variable.

print(weight_kg)
80.3

If we use print, we can display multiple variables within the same cell.

a = 1
b = 2
c = 3
print(a)
print(b)
print(c)
1
2
3

Exercise: numbers and math

Create a variable called memory_score1 and assign it the value 9.2. Create another variable called memory_score2 and assign it the value 8.4. Use these variables to calculate a mean_score variable by adding the two variables and dividing by two.

Type the name of the mean_score variable by itself at the end of your code cell to display the result. Then try using the print function to display mean_score.

# answer here

1.2. Using functions

In Python (and other programming languages), functions are used to run common operations.

Functions take some sort of input and produce some sort of output. We can run them like so:

output1, output2, ... = function_name(input1, input2, ...)

For example, the print function does not produce a useful output (it returns the special value None); it just displays the value of a variable.

print(a)
1
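To see what print gives back as an output, we can assign the result of calling it to a variable. This is a minimal sketch; result is just a name chosen for illustration.

```python
a = 1
result = print(a)      # displays 1 as a side effect
print(result is None)  # True: print displays text but returns None
```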

print can take multiple inputs, which will be displayed with a space separating them by default.

print(a, b, c)
1 2 3

To call a function, we type the name of the function, an open paren ((), a list of inputs separated by commas (,), and then a close paren ()).

There are many built-in functions in Python. For example, we can round numbers using the round function.

height_precise = 5.912
round(height_precise)
6

The round function takes an optional second input that indicates how many decimal places we want to round to.

height_rounded = round(height_precise, 2)
height_rounded
5.91
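Two details of round are worth knowing, sketched below with made-up numbers. Values exactly halfway between two integers are rounded to the nearest even integer, and the result is an int when the second input is omitted but a float when it is given.

```python
# Halfway cases go to the nearest even integer
print(round(2.5))  # 2
print(round(3.5))  # 4

# With a second input, round keeps that many decimal places
print(round(2.567, 1))  # 2.6

# The type of the result depends on whether the second input is given
print(type(round(5.912)))     # <class 'int'>
print(type(round(5.912, 2)))  # <class 'float'>
```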

Exercise: using functions

Say that a participant completed nine trials of a memory test and answered correctly on three of those trials. Create a variable called number_correct with a value of 3 and a variable called number_trials with a value of 9. Calculate the overall accuracy by dividing the number correct by the number of trials, and assign the result to a variable called accuracy.

Use the round function to round accuracy to three decimal places, and use the print function to display the result.

# answer here

1.3. Data types

There are different types of data that we can store in variables, including:

  • Strings (text)

  • Integer numbers

  • Floating-point numbers (decimals)

  • Booleans (True or False)

name = "John Doe"
age = 23
height = 5.9
study_complete = True

To store text data, we can use a string. Strings must always be surrounded by either double quotes ("") or single quotes ('').

participant_id = "sub-001"  # assigns the text sub-001 to the variable
participant_id = 'sub-001'  # does the same thing
#participant_id = sub-001   # does not work; need quotes around the text

Depending on how a number is written, we will get either a float (used for representing decimals) or an int (used for representing integers). After making a variable, we can check the data type using the type function.

weight_kg = 60.3
weight_kg_rounded = 60

# get type of variables
type_kg = type(weight_kg)
type_kg_rounded = type(weight_kg_rounded)
print(type_kg)
print(type_kg_rounded)
<class 'float'>
<class 'int'>
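As a further illustration of how the way a number is written or computed determines its type, here is a small sketch with made-up values. Note that division with / produces a float even when the result is a whole number.

```python
print(type(60))       # <class 'int'>
print(type(60.0))     # <class 'float'>
print(60 == 60.0)     # True: the values are equal even though the types differ

print(type(6 / 3))    # <class 'float'>: / always produces a float
print(type(2 + 0.5))  # <class 'float'>: mixing int and float gives a float
```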

Exercise: data types

Make variables to represent a participant’s identifier, sub-001, their accuracy on a test where they got 16 out of 24 items correct, their age in years (23), and whether they completed the full study (True).

Display your variables using print. Round the accuracy score to two decimal places. Also check the types of your variables, which should be str, float, int, and bool.

# answer here

1.4. F-strings

Sometimes, to display information in a convenient format, it’s helpful to construct a string from individual variables. We can use a feature called f-strings to do this.

greeting = "Hello"
user = "Mary"
message = f"{greeting} {user}"
print(message)
Hello Mary

You make an f-string by adding an “f” before the quotes that make up a string. Variable names can be placed in curly braces ({}) to place their contents at that point in the string.

There are a lot of options for formatting variables as strings, making this method very flexible.

For example, if we write curly braces with the name of a variable immediately followed by =, the f-string will show the name of the variable along with its value.

name = "John Doe"
height = 5.9
age = 23
print(f"Patient {name}: {height=}, {age=}")
Patient John Doe: height=5.9, age=23

F-strings can optionally include formatting specifiers to indicate how variables should be formatted as text.

For example, we can control the number of decimal places to show or add zero-padding to the start of formatted text.

score = 1 / 3
number = 4
print(f"{score}")       # default text from a decimal number
print(f"{score:.2f}")   # round to two decimals
print(f"{number:03d}")  # add zeros to pad to three characters
0.3333333333333333
0.33
004

See the Python tutorial for more information about formatting specifiers.
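A few other formatting specifiers that can be handy are sketched below. The variable names and values are made up for illustration.

```python
accuracy = 0.4567
count = 1234567
label = "ab"

print(f"{accuracy:.1%}")  # 45.7%      (percentage with one decimal place)
print(f"{count:,}")       # 1,234,567  (comma as a thousands separator)
print(f"[{label:>5}]")    # [   ab]    (right-align in a field of five characters)
```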

Exercise: f-strings

Say participant 3 in a study, who is 28 years old, recalled four out of ten words. Create variables to store the participant identifier, their age, and their recall accuracy. Use an f-string with your variables to create this string to summarize the participant’s information: 3 (age=28): 0.4 accuracy.

Advanced

Use a formatting specifier to instead display the participant identifier as a zero-padded string with four characters.

# answer here

1.5. Lists

A common way to organize data is to place elements into a list. We can put any kind of data into a list.

participant_ids = ["001", "002", "003"]
mixed_data = ["John Doe", 5.9, 23]

To get the length of a list, use the len function.

len(participant_ids)
3

After creating a list, we can add elements to the end of it using append.

participant_ids.append("004")
print(participant_ids)
['001', '002', '003', '004']

We just used a special type of function called a method. Methods work like functions, but are attached to objects. Everything in Python is an object, and objects have attributes that are accessed using a period (.). Some of these attributes, like append, are methods that we can call to work with an object.

Use the help function with any Python object to see what methods are associated with that object.

help(participant_ids)
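Lists have several other methods besides append. The sketch below tries a few of them on a made-up list called ids.

```python
ids = ["001", "002", "003", "002"]

print(ids.count("002"))  # 2: how many times "002" appears
print(ids.index("003"))  # 2: position of the first "003"

last = ids.pop()  # removes the last element and returns it
print(last)       # 002
print(ids)        # ['001', '002', '003']
```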

Lists can be joined together using +.

l1 = [1, 2, 3]
l2 = [4, 5, 6]
print(l1 + l2)
[1, 2, 3, 4, 5, 6]

Note that + works differently with lists than it does with numbers: it joins lists but adds numbers. Giving an operator different behavior for different types is called operator overloading. If you try to add a number to a list, you’ll get an error (a TypeError).

# invalid (try it!): 4 + [1, 2]
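The multiplication operator is overloaded in a similar way. The sketch below compares numbers and lists, and uses a try/except block (a feature not covered in this section) so that the invalid addition reports its error instead of stopping the program.

```python
print(2 * 3)       # 6: multiplies numbers
print([1, 2] * 3)  # [1, 2, 1, 2, 1, 2]: repeats the list

caught = False
try:
    4 + [1, 2]  # mixing a number and a list with + is not allowed
except TypeError as error:
    caught = True
    print("TypeError:", error)
```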

After we create a list, we can access elements of that list using indexing. Indexing lets us access data based on its position in the list.

A slightly confusing thing about Python (and many programming languages) is that the first index is 0, and it counts up from there. So, the first element is at 0, the second element is at 1, etc.

print(participant_ids)
['001', '002', '003', '004']
print(participant_ids[0])
001
print(participant_ids[1])
002

Indexing can be used to change elements of the list. We do this by assigning part of the list to something new.

conditions = ["1a", "1b", "2a", "2b"]
print(conditions)
['1a', '1b', '2a', '2b']
conditions[0] = "1c"
print(conditions)
['1c', '1b', '2a', '2b']

Exercise: making lists

Make a list of memory scores for different participants: 8, 3, 14, 10. Use indexing to get the score for the third participant in the list.

Add one more participant’s score (15) to the list using the append method.

Change the second participant’s score to 9.

# answer here

1.6. Slicing

Besides accessing lists one element at a time, we can also access a range of elements (called a slice) using the colon operator (:).

Slices usually just have a start index and a finish index, like this:

my_list[start:finish]

The finish is non-inclusive. For example, [0:2] will get the first two elements.

print(participant_ids[0:2])
['001', '002']

This picture summarizes how things work:

list:  [  1,  2,  3,  4,  5  ]
index: [  0   1   2   3   4  ]
slice: [0   1   2   3   4   5]

The index row shows the index to get individual elements of the list. For example, if we access index 2, we’ll get 3.

l = [1, 2, 3, 4, 5]
print(l[2])
3

The slice row shows how slicing works. You can think of the slice indices as being sort of between the elements of the list. When you slice a list, you’ll get back everything between the slice indices.

print(l[0:3])
print(l[3:5])
[1, 2, 3]
[4, 5]

If either the start or finish of a slice is omitted, the corresponding end of the list will be used.

print(participant_ids[:2])  # from the start until 2
['001', '002']
print(participant_ids[1:])  # from 1 until the end
['002', '003', '004']

The way indexing and slicing work might seem weird, but it has some nice properties. For example, if we have a list of 4 elements:

new_list = [1, 2, 4, 8]

We can split it in half in an intuitive way, where the index 2 marks the center of the list:

print(new_list[:2])  # first half
print(new_list[2:])  # second half
[1, 2]
[4, 8]
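Two of these properties can be checked directly, as in the sketch below. The list items and the split point k are made up for illustration.

```python
items = ["a", "b", "c", "d", "e"]
k = 2

# Splitting at any index and joining the pieces gives back the original list
print(items[:k] + items[k:] == items)  # True

# The length of a slice is the finish index minus the start index
print(items[1:4])       # ['b', 'c', 'd']
print(len(items[1:4]))  # 3, which is 4 - 1
```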

Negative indices

In addition to the usual indices, which are non-negative integers (0, 1, 2, etc.), we can use negative indices, which count from the end of the list instead of the start.

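list:  [   1,  2,  3,  4,  5   ]
index: [  -5  -4  -3  -2  -1   ]
slice: [-5  -4  -3  -2  -1     ]

The position after the last element has no negative slice index (writing 0 there would mean the start of the list). To slice through the end of the list, omit the finish, as in l[-2:].

l = [1, 2, 3, 4, 5]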
print(l[-1])
print(l[-5:-3])
5
[1, 2]
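The sketch below shows how negative indices relate to the usual ones, along with a common pitfall when slicing up to the end of a list.

```python
l = [1, 2, 3, 4, 5]

# Index -1 refers to the same element as index len(l) - 1
print(l[-1] == l[len(l) - 1])  # True

print(l[-2:])   # [4, 5]: the last two elements
print(l[:-1])   # [1, 2, 3, 4]: everything except the last element

# A finish of 0 means the start of the list, so this slice is empty
print(l[-2:0])  # []
```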

Lists of lists

Lists can hold any type of data. This means we can even make lists of lists.

participant_groups = [["001", "003", "005"], ["002", "004", "006"]]

We can then put together multiple indices. For example, to get the third entry in the second group:

print(participant_groups[1][2])
006

Slicing and indexing work the same way no matter what type of data we have in the list, so we can use it for lists of lists too.

print(participant_groups[0])
print(participant_groups[0][1:])
['001', '003', '005']
['003', '005']
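The len function and assignment also work at each level of a list of lists. Here is a small sketch; the list name groups and the replacement identifier "008" are made up for illustration.

```python
groups = [["001", "003", "005"], ["002", "004", "006"]]

print(len(groups))     # 2: the number of groups
print(len(groups[0]))  # 3: the number of participants in the first group

# Combine two indices to change one entry in an inner list
groups[1][0] = "008"
print(groups)  # [['001', '003', '005'], ['008', '004', '006']]
```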

Exercise: lists, indexing, and slicing

Create a list of words (for a hypothetical memory experiment) in this order: hollow, pupil, fountain, piano.

Then use indexing and slicing to:

  • Get the second word in the list

  • Get the last item in the list using negative indices

  • Get the first three items in the list using slicing

  • Get the last two items using slicing with negative indices

Advanced

Slicing can take another form with three numbers, to indicate the spacing to use. This form is written like start:finish:step. Use this method to get every other item in the list.

# answer here

1.7. Tuples and strings

Tuples are similar to lists, in that they store data by position, but once they are created they cannot be changed. Data structures that cannot be changed are called immutable.

Tuples are often used as inputs and outputs to functions. More on that later.

The syntax for creating tuples is similar to lists, but using parentheses instead of square brackets.

my_tuple = (1, 2, 3)
other_tuple = ('a', 'b', 'c')
print(my_tuple)
print(other_tuple)
(1, 2, 3)
('a', 'b', 'c')

Tuples don’t actually require the parentheses; just the commas are sufficient to indicate a tuple.

new_tuple = 4, 5, 6

You can have a tuple with only one element, if you use a trailing comma.

one_item_tuple = 7,

Without the trailing comma, this would just be the number 7.
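We can confirm the effect of the trailing comma with the type function. The variable names in this sketch are made up for illustration.

```python
with_comma = (7,)
without_comma = (7)  # parentheses alone do not make a tuple

print(type(with_comma))     # <class 'tuple'>
print(type(without_comma))  # <class 'int'>
print(len(with_comma))      # 1
```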

Tuples are indexed and sliced in the same way as lists.

numbers = (1, 2, 3)
print(numbers[1])
print(numbers[1:])
2
(2, 3)

Unlike lists, tuples cannot be changed once they are created.
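For example, assigning to an index of a tuple raises an error. The sketch below uses a try/except block (a feature not covered in this section) so that the error is reported instead of stopping the program.

```python
numbers = (1, 2, 3)

caught = False
try:
    numbers[0] = 10  # tuples do not support assignment to an index
except TypeError as error:
    caught = True
    print("TypeError:", error)

print(numbers)  # (1, 2, 3): the tuple is unchanged
```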

We saw before that we can store text data in strings. Strings are a lot like lists and tuples, because they are all sequences (a kind of iterable): ordered collections that support len, indexing, and slicing.

message = "hello world"
print(len(message))  # len works the same as for lists
print(message[:5])  # slicing can be used to get substrings
11
hello

Like lists, strings can be concatenated.

word1 = "hello"
word2 = "world"
print(word1 + " " + word2)
hello world
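Like tuples, strings are immutable, so changing a string means building a new one. The sketch below shows a few more sequence operations on strings; the name capitalized is made up for illustration.

```python
message = "hello world"

print(message[0])          # h
print(message[-5:])        # world
print("world" in message)  # True: checks whether a substring is present

# Strings cannot be changed in place, but slicing and + can build a new string
capitalized = "H" + message[1:]
print(capitalized)  # Hello world
print(message)      # hello world (the original is unchanged)
```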

Exercise: tuples and strings

Make a string by concatenating two strings together. The first part of the string will be "sub-", and the second part will be "001". After creating the larger string, place it into a tuple.

Advanced

After making your tuple, try adding another element to it. What happens?

# answer here

1.8. Dictionaries

Dictionaries (or dicts, for short) contain mappings. Lists store data by position in the list, while dicts store data in terms of keys. This is often useful to label data with strings.

Dicts are specified using curly braces ({}), with key-value pairs separated by colons, like

{key1: value1, key2: value2}

For example, we could have a dict that stores data for each of a set of participants.

age = {"sub-001": 23, "sub-002": 29}
print(age)
{'sub-001': 23, 'sub-002': 29}

We can index data using the key to get the corresponding value.

print(age["sub-002"])
29

Dictionaries can contain any mapping between keys and values, as long as the keys are hashable, which in practice usually means immutable. This means you can’t use things like lists as keys, because lists can be changed.

print({1: "A", 2: "B"})
print({1.1: "A.A", 2.2: "B.B"})
# not valid (try it!):  {[1, 2, 3]: ["A", "B", "C"]}
{1: 'A', 2: 'B'}
{1.1: 'A.A', 2.2: 'B.B'}

Values can be anything, including dicts, so you can make dicts of dicts.

holidays = {
    "new_years": {"month": "Jan", "day": 1},
    "mlk_day": {"month": "Jan", "day": 15},
}

Note that we can split dictionary statements over multiple lines for readability.
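To get data out of a dict of dicts, we can chain keys together, much like chaining indices for lists of lists. A minimal sketch using the holidays dict:

```python
holidays = {
    "new_years": {"month": "Jan", "day": 1},
    "mlk_day": {"month": "Jan", "day": 15},
}

# The first key selects the inner dict; the second selects a value within it
print(holidays["mlk_day"])         # {'month': 'Jan', 'day': 15}
print(holidays["mlk_day"]["day"])  # 15
```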

After creating a dict, you can add more keys using an assignment statement.

age = {"sub-001": 23, "sub-002": 29}
age["sub-003"] = 25
print(age)
{'sub-001': 23, 'sub-002': 29, 'sub-003': 25}
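Assigning to a key that already exists replaces its value instead of adding a new entry. The sketch below also shows two ways to check for a key that may be missing: the in operator and the get method, which returns None when the key is absent.

```python
age = {"sub-001": 23, "sub-002": 29}

age["sub-001"] = 24  # the key exists, so its value is replaced
print(age)           # {'sub-001': 24, 'sub-002': 29}
print(len(age))      # 2

print("sub-003" in age)    # False
print(age.get("sub-003"))  # None
```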

Exercise: dictionaries

Create a dict of participant memory scores. The participant identifiers were 01, 02, and 03, and they remembered 9, 13, and 10 words, respectively.

After making this dict, get the memory score for participant 03.

Advanced

Use a dictionary of dictionaries to store multiple pieces of information for multiple participants. Participant 01 was age 23 and had a memory score of 9. Participant 02 was age 29 and had a memory score of 13.

# answer here

1.9. Summary

We can assign simple variables to store numeric and text data.

height_m = 1.9
participant_id = "sub-002"

Lists and tuples can be used to store sequences of data points.

data = [1.9, "O+", 29]
sequence = (1, 2, 4, 8, 16)

Dicts can be used to store data in key-value pairs.

scores = {"sub-001": 9, "sub-002": 11, "sub-003": 8}