Python variables#

We’ll start with features of the basic Python language before getting into tools for scientific computing and data science. The first topic is how data can be stored in different kinds of Python variables.

Variables and Math#

Python can be used like a calculator, with the ability to store multiple named variables.

3 + 5 * 4
23

We can store a value in a variable using a name and an equals sign (=).

weight_kg = 80.3
height_m = 1.9
weight_kg / height_m ** 2
22.24376731301939

Variable names can be almost anything, but they cannot start with a number (for example, 4vars is invalid as a variable name).
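Beyond the rule about leading digits, names may only contain letters, digits, and underscores, and reserved words such as class cannot be used. As a rough check, the sketch below uses the built-in string method isidentifier and the standard keyword module; the candidate names are made up for illustration.

```python
import keyword

# Letters, digits, and underscores are allowed
ok_name = "weight_kg".isidentifier()
# A name cannot start with a digit
starts_with_digit = "4vars".isidentifier()
# Hyphens are not allowed in names
has_hyphen = "my-var".isidentifier()
# Reserved words look like identifiers but cannot be used as variable names
is_reserved = keyword.iskeyword("class")

print(ok_name, starts_with_digit, has_hyphen, is_reserved)
```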

There’s an art to naming variables to make them informative but not too hard to type. In Python, the usual style is to separate words with underscores (_).

pid = '001'  # hard to understand unless you already know what it stands for
participant_id = '001'  # clearer, though it will take longer to type
ParticipantID = '001'  # not bad, but goes against standard Python style guidelines

There are different types of data that we can store in variables, including:

  • Strings (text)

  • Integer numbers

  • Floating-point numbers (decimals)

  • Booleans (True or False)

name = "John Doe"
age = 23
height = 5.9
study_complete = True

In Jupyter, we can display the value of a variable just by writing its name at the end of a cell. Outside of Jupyter, we need to use print. The output is a little different between these methods, but they both show the value of the variable.

name
'John Doe'
print(name)
John Doe
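The difference in output comes from two built-in functions. For simple values like strings and numbers, what Jupyter displays corresponds to repr, while print shows the result of str. A minimal sketch:

```python
name = "John Doe"

# What Jupyter shows for a bare variable name corresponds to repr()
as_displayed = repr(name)
# What print() shows corresponds to str()
as_printed = str(name)

print(as_displayed)  # includes the quotes
print(as_printed)    # no quotes
```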

print is an example of a function. In Python (and other programming languages), functions take some sort of input and produce some sort of output. We can run them like so:

output1, output2, ... = function_name(input1, input2, ...)

print is a simple function that takes some variable and displays a text version of it. It’s often helpful when you need to see the value of a variable, and we’ll use it in examples.

There are many built-in functions in Python. For example, if we want to round a number, we can use the round function.

height_precise = 5.912
height_rounded = round(height_precise, 2)
height_rounded
5.91
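Some functions return more than one output, matching the general pattern shown earlier. One built-in example is divmod, which returns both a quotient and a remainder. The variable names here are only illustrative.

```python
# divmod takes two inputs and returns two outputs: the quotient and the remainder
total_minutes = 135
hours, minutes = divmod(total_minutes, 60)
print(hours, minutes)
```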

F-strings#

Sometimes, it’s helpful to construct a more complicated string from individual variables. We can use a feature called f-strings to do this.

greeting = "Hello"
user = "Mary"
message = f"{greeting} {user}"
print(message)
Hello Mary

You make an f-string by adding an “f” before the quotes that make up a string. Variable names can be placed in curly braces ({}) to insert their values at that point in the string.

There are a lot of options for formatting variables as strings, making this method very flexible. We won’t get into this much now, but it’s helpful to know about. See the Python tutorial for more information.

name = "John Doe"
height = 5.9
age = 23
print(f"Patient {name}: {height=}, {age=}")
Patient John Doe: height=5.9, age=23
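As a small taste of the formatting options, a format specification can follow the variable name after a colon. The sketch below shows two common ones; the variables are hypothetical examples.

```python
accuracy = 0.8137
participant_number = 7

# :.2f rounds to two decimal places when converting to text
accuracy_text = f"accuracy: {accuracy:.2f}"
# :03d pads an integer with leading zeros to a width of three
label = f"sub-{participant_number:03d}"

print(accuracy_text)
print(label)
```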

Data types#

Unlike some other languages, we don’t have to declare what type of data we’re using. After making a variable, we can check the data type using the type function.

weight_kg = 60.3
weight_kg_rounded = 60

# get type of variables
type_kg = type(weight_kg)
type_kg_rounded = type(weight_kg_rounded)
print(f"With a decimal ({weight_kg}), we get: {type_kg}")
With a decimal (60.3), we get: <class 'float'>
print(f"With no decimal ({weight_kg_rounded}), we get: {type_kg_rounded}")
With no decimal (60), we get: <class 'int'>
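The type function works the same way for the other basic types. The sketch below also shows one detail that can be surprising: dividing with / gives a float even when the result is a whole number.

```python
name = "John Doe"
study_complete = True

print(type(name))            # <class 'str'>
print(type(study_complete))  # <class 'bool'>

# Division with / always gives a float, even when the result is a whole number
ratio = 6 / 3
print(ratio, type(ratio))
```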

Exercise: variables and math#

Make variables to represent a participant’s identifier, sub-001, their accuracy on a test where they got 16 out of 24 items correct, and their age in years (23). Check the types of your variables, which should be str, float, and int. Round the accuracy score to two decimal places.

Advanced#

Use an f-string to print the participant’s data in this format: sub-001 (age=23): 67% correct

# answer here

Lists#

A common way to organize data is to place elements into a list. We can put any kind of data into a list.

participant_ids = ["001", "002", "003"]
mixed_data = ["John Doe", 5.9, 23]

To get the length of a list, use the len function.

len(participant_ids)
3

After creating a list, we can add elements to the end of it using append.

participant_ids.append("004")
print(participant_ids)
['001', '002', '003', '004']

We just used a special type of function called a method. Methods work like functions, but are attached to objects. Everything in Python is an object, and objects have attributes that are accessed using a dot (.). Some of these attributes, like append, are methods that we can call to work with an object.

Use the help function with any Python object to see what methods are associated with that object.

help(participant_ids)
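To make the difference in syntax concrete, here is a short sketch contrasting a function call with method calls. The upper method on strings is used only as a second example of a method.

```python
participant_ids = ["001", "002", "003"]

# len is a function: the object goes inside the parentheses
n_before = len(participant_ids)

# append is a method: it is reached through the object with a dot
participant_ids.append("004")
n_after = len(participant_ids)

# Strings have their own methods, such as upper
shout = "hello".upper()
print(n_before, n_after, shout)
```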

Lists can be joined together using +.

l1 = [1, 2, 3]
l2 = [4, 5, 6]
print(l1 + l2)
[1, 2, 3, 4, 5, 6]

Note that + behaves differently with lists than it does with numbers: it joins lists, but adds numbers. This is called operator overloading. If you try to mix numbers and lists, you’ll get an error (a TypeError).

# invalid (try it!): 4 + [1, 2]
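The sketch below shows + doing different jobs for different types, then catches the error from mixing a number and a list. The try and except statements are not covered in this section; they are used here only so the example runs to the end without stopping.

```python
number_sum = 2 + 3          # numbers: addition
list_sum = [1, 2] + [3]     # lists: concatenation
string_sum = "ab" + "cd"    # strings: concatenation
print(number_sum, list_sum, string_sum)

caught = False
try:
    4 + [1, 2]
except TypeError as error:
    # Python does not define + between an int and a list
    caught = True
    print("TypeError:", error)
```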

Indexing#

After we create a list, we can access elements of that list using indexing. Indexing lets us access data based on its position in the list.

A slightly confusing thing about Python (and many programming languages) is that the first index is 0, and it counts up from there. So, the first element is at 0, the second element is at 1, etc.

print(participant_ids)
['001', '002', '003', '004']
print(participant_ids[0])
001
print(participant_ids[1])
002

Indexing can be used to change elements of the list. We do this by assigning part of the list to something new.

conditions = ["1a", "1b", "2a", "2b"]
print(conditions)
['1a', '1b', '2a', '2b']
conditions[0] = "1c"
print(conditions)
['1c', '1b', '2a', '2b']
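Assignment by index only works for positions that already exist. As a sketch, the last valid index is len(conditions) - 1, and anything beyond it raises an IndexError (caught here with try and except, which are not covered in this section, so the example keeps running).

```python
conditions = ["1a", "1b", "2a", "2b"]

# Valid indices run from 0 to len(conditions) - 1
last_index = len(conditions) - 1
conditions[last_index] = "2c"
print(conditions)

out_of_range = False
try:
    conditions[4] = "3a"  # there is no fifth element to replace
except IndexError:
    out_of_range = True
    print("index 4 is out of range")
```

To add a new element rather than replace one, use append.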

Exercise: making lists#

Make a list of memory scores for different participants: 8, 3, 14, 10. Use indexing to get the score for the third participant in the list.

Add one more participant’s score (15) to the list using the append method.

Change the second participant’s score to 9.

# answer here

Slicing#

Besides accessing one element at a time, we can also access a range of elements (called a slice) using the colon operator (:).

Slices usually just have a start index and a finish index, like this:

my_list[start:finish]

The finish is non-inclusive. For example, [0:2] will get the first two elements.

print(participant_ids[0:2])
['001', '002']

This picture summarizes how things work:

list:  [  1,  2,  3,  4,  5  ]
index: [  0   1   2   3   4  ]
slice: [0   1   2   3   4   5]

The index row shows the index to get individual elements of the list. For example, if we access index 2, we’ll get 3.

l = [1, 2, 3, 4, 5]
print(l[2])
3

The slice row shows how slicing works. You can think of the slice indices as being sort of between the elements of the list. When you slice a list, you’ll get back everything between the slice indices.

print(l[0:3])
print(l[3:5])
[1, 2, 3]
[4, 5]

If either the start or finish of a slice is omitted, the corresponding end of the list will be used.

print(participant_ids[:2])  # from the start until 2
['001', '002']
print(participant_ids[1:])  # from 1 until the end
['002', '003', '004']

The way indexing and slicing work might seem weird, but it has some nice properties. For example, if we have a list of 4 elements:

new_list = [1, 2, 4, 8]

We can split it in half in an intuitive way, where the index 2 marks the center of the list:

print(new_list[:2])  # first half
print(new_list[2:])  # second half
[1, 2]
[4, 8]
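Two other properties follow from the non-inclusive finish, sketched below: the length of a slice is finish minus start, and splitting a list at any index and joining the pieces gives back the original list. The values of start, finish, and k are arbitrary choices for illustration.

```python
new_list = [1, 2, 4, 8]
start, finish = 1, 3

# The length of a slice is finish - start
middle = new_list[start:finish]
print(middle, len(middle) == finish - start)

# Splitting at any index k and rejoining gives back the original list
k = 3
rejoined = new_list[:k] + new_list[k:]
print(rejoined == new_list)
```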

Negative indices#

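In addition to the usual non-negative indices (0, 1, 2, etc.), we can use negative indices, which count from the end of the list instead of the start. The end of the list has no negative slice index, because -0 is the same as 0, which marks the start; to slice through the last element, leave out the finish (for example, l[-3:]).

list:  [   1,  2,  3,  4,  5   ]
index: [  -5  -4  -3  -2  -1   ]
slice: [-5  -4  -3  -2  -1     ]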
l = [1, 2, 3, 4, 5]
print(l[-1])
print(l[-5:-3])
5
[1, 2]
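One way to think about negative indices is that -k refers to the same element as len(l) - k. The sketch below checks that relationship and shows what happens at the end of the list, where a finish of 0 does not do what one might expect.

```python
l = [1, 2, 3, 4, 5]

# A negative index -k refers to the same element as len(l) - k
k = 2
same_element = l[-k] == l[len(l) - k]
print(l[-k], same_element)

# Leaving out the finish slices through the end of the list
tail = l[-3:]
print(tail)

# A finish of 0 means the start of the list, so this slice is empty
empty = l[-3:0]
print(empty)
```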

Lists of lists#

Lists can hold any type of data. This means we can even make lists of lists.

participant_groups = [["001", "003", "005"], ["002", "004", "006"]]

We can then put together multiple indices. For example, to get the third entry in the second group:

print(participant_groups[1][2])
006

Slicing and indexing work the same way no matter what type of data we have in the list, so we can use them for lists of lists too.

print(participant_groups[0])
print(participant_groups[0][1:])
['001', '003', '005']
['003', '005']
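The same pairing of indices works for len and for assignment. In this sketch, the replacement identifier "008" is made up for illustration.

```python
participant_groups = [["001", "003", "005"], ["002", "004", "006"]]

# len of the outer list counts groups; len of an inner list counts participants
n_groups = len(participant_groups)
n_in_first = len(participant_groups[0])
print(n_groups, n_in_first)

# The first index picks the group, the second picks the position within it
participant_groups[1][0] = "008"
print(participant_groups)
```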

Exercise: lists, indexing, and slicing#

Create a list of words (for a hypothetical memory experiment) in this order: hollow, pupil, fountain, piano.

Then use indexing and slicing to:

  • Get the second word in the list

  • Get the last item in the list using negative indices

  • Get the first three items in the list using slicing

  • Get the last two items using slicing with negative indices

Advanced#

Slicing can take another form with three numbers, to indicate the spacing to use. This form is written like start:finish:step. Use this method to get every other item in the list.

# answer here

Tuples#

Tuples are similar to lists, in that they store data by position, but once they are created they cannot be changed. Data structures that cannot be changed are called immutable.

Tuples are often used as inputs and outputs to functions. More on that later.

The syntax for creating tuples is similar to lists, but using parentheses instead of square brackets.

my_tuple = (1, 2, 3)
other_tuple = ('a', 'b', 'c')
print(my_tuple)
print(other_tuple)
(1, 2, 3)
('a', 'b', 'c')

Tuples don’t actually require the parentheses; just the commas are sufficient to indicate a tuple.

new_tuple = 4, 5, 6

You can have a tuple with only one element, if you use a trailing comma.

one_item_tuple = 7,

Without the trailing comma, this would just be the integer 7.
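A quick check with type makes the role of the comma visible. Note that parentheses alone do not create a tuple.

```python
one_item_tuple = 7,
just_a_number = (7)  # parentheses alone do not make a tuple

print(type(one_item_tuple), len(one_item_tuple))
print(type(just_a_number))
```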

Tuples are indexed and sliced in the same way as lists.

numbers = (1, 2, 3)
print(numbers[1])
print(numbers[1:])
2
(2, 3)

Unlike lists, tuples cannot be changed once they are created.
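As a sketch of what immutability means in practice, assigning to an index of a tuple raises a TypeError, while reading values out of it is fine. The try and except statements are used only so the example runs to the end; unpacking, shown last, is a common way to read the elements of a tuple into separate variables.

```python
numbers = (1, 2, 3)

changed = True
try:
    numbers[0] = 10  # item assignment works for lists but not for tuples
except TypeError:
    changed = False
    print("tuples do not support item assignment")

# Unpacking assigns each element of the tuple to its own name
first, second, third = numbers
print(first, second, third)
```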

Strings#

We saw before that we can store text data in strings. Strings are a lot like lists and tuples, because they are all sequences: ordered collections that support len, indexing, and slicing.

message = "hello world"
print(len(message))  # len works the same as for lists
print(message[:5])  # slicing can be used to get parts of strings
11
hello

Like lists, strings can be concatenated.

word1 = "hello"
word2 = "world"
print(word1 + " " + word2)
hello world
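To illustrate how much lists, tuples, and strings have in common, the sketch below applies the same operations to each of the three. The variable names are made up for this example.

```python
as_list = ["h", "e", "l", "l", "o"]
as_tuple = ("h", "e", "l", "l", "o")
as_string = "hello"

# len, indexing, and slicing work the same way for all three
print(len(as_list), len(as_tuple), len(as_string))
print(as_list[0], as_tuple[0], as_string[0])
print(as_list[-2:], as_tuple[-2:], as_string[-2:])

# The in operator checks whether an element is present
has_e = "e" in as_string
print(has_e)
```

Note that slicing returns the same type it was given: a list, a tuple, or a string.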

Exercise: tuples and strings#

Make a string by concatenating two strings together. The first part of the string will be "sub-", and the second part will be "001". Place this string into a tuple.

Advanced#

After making your tuple, try adding another element to it. What happens?

# answer here

Dictionaries#

Dictionaries (or dicts, for short) contain mappings. Lists store data by position in the list, while dicts store data in terms of keys. This is often useful to label data with strings.

Dicts are specified using curly braces ({}), with key-value pairs separated by colons, like

{key1: value1, key2: value2}

For example, we could have a dict that stores data for each of a set of participants.

age = {"sub-001": 23, "sub-002": 29}
print(age)
{'sub-001': 23, 'sub-002': 29}

We can index data using the key to get the corresponding value.

print(age["sub-002"])
29

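Dictionaries can contain any mapping between keys and values, as long as the keys are hashable; among the built-in types, that means immutable values such as strings, numbers, and tuples of those. This means you can’t use things like lists as keys, because lists can be changed.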

print({1: "A", 2: "B"})
print({1.1: "A.A", 2.2: "B.B"})
# not valid (try it!):  {[1, 2, 3]: ["A", "B", "C"]}
{1: 'A', 2: 'B'}
{1.1: 'A.A', 2.2: 'B.B'}
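Tuples can be useful as keys when a single value is not enough to identify an entry. In this sketch, the participant and session labels are hypothetical, and try and except are used only so the invalid case does not stop the example.

```python
# A tuple of immutable values can be used as a key
session_scores = {("sub-001", 1): 9, ("sub-001", 2): 11}
second_session = session_scores[("sub-001", 2)]
print(second_session)

list_key_failed = False
try:
    bad = {["sub-001", 1]: 9}  # a list cannot be used as a key
except TypeError:
    list_key_failed = True
    print("lists cannot be used as keys")
```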

Values can be anything, including dicts, so you can make dicts of dicts.

holidays = {
    "new_years": {"month": "Jan", "day": 1},
    "mlk_day": {"month": "Jan", "day": 15},
}

Note that we can split dictionary statements over multiple lines for readability.
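Values in a dict of dicts are reached by chaining keys, much like chaining indices in a list of lists. A minimal sketch using the holidays example:

```python
holidays = {
    "new_years": {"month": "Jan", "day": 1},
    "mlk_day": {"month": "Jan", "day": 15},
}

# The first key selects the inner dict
mlk = holidays["mlk_day"]
print(mlk)

# A second key selects a value within the inner dict
mlk_day_number = holidays["mlk_day"]["day"]
print(mlk_day_number)

# len counts the keys in the outer dict
print(len(holidays))
```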

After creating a dict, you can add more keys using an assignment statement.

age = {"sub-001": 23, "sub-002": 29}
age["sub-003"] = 25
print(age)
{'sub-001': 23, 'sub-002': 29, 'sub-003': 25}
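The same assignment syntax replaces a value when the key already exists. The sketch below also shows the in operator for checking keys and the KeyError raised when indexing with a missing key (caught with try and except so the example keeps running). The key "sub-004" is made up for illustration.

```python
age = {"sub-001": 23, "sub-002": 29}

# Assigning to an existing key replaces its value rather than adding an entry
age["sub-002"] = 30
print(age, len(age))

# The in operator checks whether a key is present
has_sub_004 = "sub-004" in age
print(has_sub_004)

# Indexing with a missing key raises a KeyError
missing_key = False
try:
    age["sub-004"]
except KeyError:
    missing_key = True
    print("no entry for sub-004")
```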

Exercise: dictionaries#

Create a dict of participant memory scores. The participant identifiers were 01, 02, and 03, and they remembered 9, 13, and 10 words, respectively.

After making this dict, get the memory score for participant 03.

Advanced#

Use a dictionary of dictionaries to store multiple pieces of information for multiple participants. Participant 01 was age 23 and had a memory score of 9. Participant 02 was age 29 and had a memory score of 13.

# answer here

Review#

We can assign simple variables to store numeric and text data.

height_m = 1.9
participant_id = "sub-002"

Lists and tuples can be used to store sequences of data points.

data = [1.9, "O+", 29]
sequence = (1, 2, 4, 8, 16)

Dicts can be used to store data in key-value pairs.

scores = {"sub-001": 9, "sub-002": 11, "sub-003": 8}