Python Data Analysis Cheat Sheet

Saving and loading data

Use np.savetxt and np.save to save data from a NumPy array to a file. Use np.savez to save multiple arrays to one file. Use np.loadtxt and np.load to load arrays from files.

import numpy as np
trial_type = np.array(["target", "lure", "lure", "target", "lure", "target", "target", "lure"])
response = np.array(["old", "old", "new", "old", "new", "new", "old", "new"])
response_time = np.array([5.4, 3.4, 8.4, 3.2, 3.9, 5.2, 6.1, 7.1])

Use np.savetxt to save an array to a human-readable text file.

np.savetxt("data/response_time.txt", response_time)

Use np.save to save an array to a binary NumPy-format (.npy) file, which is typically smaller and faster to read and write than a text file.

np.save("data/response_time.npy", response_time)
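One practical difference between the two formats is that an .npy file records the array's data type, while a text file does not. The sketch below illustrates this with a made-up integer array saved to a temporary directory; the file names are hypothetical and chosen only for this example.

```python
import os
import tempfile

import numpy as np

counts = np.array([3, 1, 4], dtype=np.int64)  # made-up integer data

with tempfile.TemporaryDirectory() as tmp_dir:
    txt_path = os.path.join(tmp_dir, "counts.txt")  # hypothetical file names
    npy_path = os.path.join(tmp_dir, "counts.npy")
    np.savetxt(txt_path, counts, fmt="%d")  # text file stores only the digits
    np.save(npy_path, counts)               # npy file also stores dtype and shape
    from_txt = np.loadtxt(txt_path)         # loadtxt returns floats by default
    from_npy = np.load(npy_path)

print(from_txt.dtype, from_npy.dtype)
```

With default settings, the array read from the text file comes back as floating point, while the array read from the .npy file keeps its integer type.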

Use np.savez to save multiple arrays to one NumPy-format file.

np.savez(
    "data/trials.npz", 
    trial_type=trial_type, 
    response=response,
    response_time=response_time,
)

Use np.loadtxt to load an array from a text file (run help(np.loadtxt) to see options for reading from files with different formatting).

rt1 = np.loadtxt("data/response_time.txt")
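As an illustration of the formatting options mentioned above, the sketch below writes a small comma-separated file with a header row, then reads it back using the delimiter and skiprows parameters of np.loadtxt. The file name, column names, and values are made up for this example, and the file is written to a temporary directory.

```python
import os
import tempfile

import numpy as np

# Hypothetical example data: two columns (trial number, response time)
table = np.array([[1, 5.4], [2, 3.4], [3, 8.4]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "response_time.csv")  # made-up file name
    # Write comma-separated values with a header line; comments="" keeps
    # np.savetxt from prefixing the header with "# "
    np.savetxt(path, table, delimiter=",", header="trial,rt", comments="", fmt="%.1f")
    # Skip the header row and split columns on commas when reading back
    loaded = np.loadtxt(path, delimiter=",", skiprows=1)

print(loaded.shape)
```

Without skiprows=1, np.loadtxt would try to parse the header text as numbers and raise an error.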

Use np.load to load data from a NumPy-formatted file (.npy or .npz).

rt2 = np.load("data/response_time.npy")  # load one array from an npy file
trial_data = np.load("data/trials.npz")  # load multiple arrays from an npz file
print(list(trial_data.keys()))           # display arrays in a loaded npz file
trial_type = trial_data["trial_type"]    # access a variable from a loaded npz file
['trial_type', 'response', 'response_time']
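The object returned by np.load for an .npz file reads each array from disk when it is accessed, so it can be useful to close it when finished. A minimal round-trip sketch, assuming made-up arrays and a temporary directory rather than the data folder used above:

```python
import os
import tempfile

import numpy as np

trial_type = np.array(["target", "lure", "lure", "target"])
response_time = np.array([5.4, 3.4, 8.4, 3.2])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "trials.npz")
    np.savez(path, trial_type=trial_type, response_time=response_time)
    # Using np.load as a context manager closes the file when the block ends
    with np.load(path) as trial_data:
        names = sorted(trial_data.files)        # names of the stored arrays
        loaded_type = trial_data["trial_type"]  # arrays are read on access
        loaded_rt = trial_data["response_time"]

print(names)
print(np.array_equal(loaded_rt, response_time))
```

Arrays pulled out inside the with block remain usable after the file is closed.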

Analyzing data in different files

When data are stored in separate files, we can use a for loop to iterate over them and run analysis code for each file.

subjects = ["01", "02", "03", "04", "05", "06", "07", "08"]  # subjects to analyze
hr = []  # list that will hold results
for subject in subjects:
    file = f"data/sub-{subject}_beh.npz"     # data file for this subject
    data = np.load(file, allow_pickle=True)  # allow_pickle=True is needed if strings were saved as object arrays
    trial_type = data["trial_type"]          # access variable from the loaded data
    response = data["response"]
    subject_hr = np.mean(response[trial_type == "target"] == "old")  # fraction of targets called "old"
    hr.append(subject_hr)                    # add result to the list

The for loop produces a list of results, with one hit rate for each subject. We can then analyze the subject hit rates further.

mean_hr = np.mean(hr)
std_hr = np.std(hr)
print(f"Hit rate: mean={mean_hr:.2f}, sd={std_hr:.2f}")
Hit rate: mean=0.59, sd=0.06
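The same pattern can be tried without any existing data files. The sketch below is a self-contained version that uses made-up data for two hypothetical subjects, written to a temporary directory with the same file naming pattern. It assumes that responses are coded as "old" or "new" strings and that the hit rate is the fraction of target trials with an "old" response.

```python
import os
import tempfile

import numpy as np

# Made-up data for two hypothetical subjects: (trial types, responses)
example_data = {
    "01": (["target", "lure", "target", "target"], ["old", "new", "old", "new"]),
    "02": (["target", "target", "lure", "lure"], ["old", "old", "old", "new"]),
}

hr = []  # list that will hold one hit rate per subject
with tempfile.TemporaryDirectory() as tmp_dir:
    # Write one npz file per subject
    for subject, (trial_type, response) in example_data.items():
        path = os.path.join(tmp_dir, f"sub-{subject}_beh.npz")
        np.savez(path, trial_type=np.array(trial_type), response=np.array(response))

    # Load each file and compute the fraction of targets called "old"
    for subject in example_data:
        path = os.path.join(tmp_dir, f"sub-{subject}_beh.npz")
        with np.load(path) as data:
            is_target = data["trial_type"] == "target"
            said_old = data["response"] == "old"
        hr.append(np.mean(said_old[is_target]))

print(hr)
```

Comparing the responses to "old" first gives a boolean array, and the mean of a boolean array is the proportion of True values.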