6.12. Python Data Analysis Cheat Sheet
Saving and loading data
Use np.savetxt and np.save to save data from a NumPy array to a file. Use np.savez to save multiple arrays to one file. Use np.loadtxt and np.load to load arrays from files.
import numpy as np
trial_type = np.array(["target", "lure", "lure", "target", "lure", "target", "target", "lure"])
response = np.array(["old", "old", "new", "old", "new", "new", "old", "new"])
response_time = np.array([5.4, 3.4, 8.4, 3.2, 3.9, 5.2, 6.1, 7.1])
Use np.savetxt to save an array to a human-readable text file.
np.savetxt("data/response_time.txt", response_time)
Use np.save to save an array to a binary NumPy-format (.npy) file, which is typically smaller and faster to read and write than a text file.
np.save("data/response_time.npy", response_time)
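To make the size difference concrete, here is a minimal sketch that writes the same array in both formats and compares the file sizes. The array contents, the temporary directory, and the file names are made up for illustration; the exact sizes depend on the data and on the text format used.

```python
import os
import tempfile

import numpy as np

# Made-up data: 1000 random values (hypothetical, for illustration only).
values = np.random.default_rng(0).random(1000)

# Write to a temporary directory so the sketch does not need a data/ folder.
tmp_dir = tempfile.mkdtemp()
txt_path = os.path.join(tmp_dir, "values.txt")
npy_path = os.path.join(tmp_dir, "values.npy")
np.savetxt(txt_path, values)  # text: one formatted number per line
np.save(npy_path, values)     # binary: raw 8-byte floats plus a short header

txt_size = os.path.getsize(txt_path)
npy_size = os.path.getsize(npy_path)
print(f"text: {txt_size} bytes, npy: {npy_size} bytes")
```

For very small arrays the difference can be negligible, because the .npy header takes up a fixed amount of space.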
Use np.savez to save multiple arrays to one NumPy-format file.
np.savez(
    "data/trials.npz",
    trial_type=trial_type,
    response=response,
    response_time=response_time,
)
Use np.loadtxt to load an array from a text file (run help(np.loadtxt) to see options for reading from files with different formatting).
rt1 = np.loadtxt("data/response_time.txt")
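As an example of the formatting options mentioned above, the following sketch writes a small comma-separated file with a header line and reads it back. The file name, column names, and accuracy values are hypothetical; only the np.savetxt and np.loadtxt parameters shown (delimiter, header, usecols, unpack) are part of NumPy.

```python
import os
import tempfile

import numpy as np

response_time = np.array([5.4, 3.4, 8.4, 3.2])
accuracy = np.array([1.0, 0.0, 1.0, 1.0])  # made-up values for illustration

# Write a hypothetical two-column CSV file with a header line.
# np.savetxt prefixes the header with "# " by default.
path = os.path.join(tempfile.mkdtemp(), "trials.csv")
table = np.column_stack([response_time, accuracy])
np.savetxt(path, table, delimiter=",", header="response_time,accuracy")

# delimiter="," splits columns on commas. The header line is skipped
# because np.loadtxt treats "#" as a comment marker by default.
loaded = np.loadtxt(path, delimiter=",")

# usecols selects columns; unpack=True returns one array per column.
rt, acc = np.loadtxt(path, delimiter=",", usecols=(0, 1), unpack=True)
print(loaded.shape)
```

For files whose header does not start with "#", the skiprows parameter can be used to skip a fixed number of lines instead.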
Use np.load to load data from a NumPy-format file (.npy or .npz).
rt2 = np.load("data/response_time.npy") # load one array from an npy file
trial_data = np.load("data/trials.npz") # load multiple arrays from an npz file
print(list(trial_data.keys())) # display arrays in a loaded npz file
trial_type = trial_data["trial_type"] # access a variable from a loaded npz file
['trial_type', 'response', 'response_time']
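The save and load steps can be checked with a short round trip. This is a minimal sketch that assumes a temporary directory and shortened example arrays. It also opens the .npz file in a with block, which closes the file once the arrays have been read.

```python
import os
import tempfile

import numpy as np

trial_type = np.array(["target", "lure", "lure", "target"])
response_time = np.array([5.4, 3.4, 8.4, 3.2])

# Use a temporary directory so the sketch does not depend on a data/ folder.
tmp_dir = tempfile.mkdtemp()
npz_path = os.path.join(tmp_dir, "trials.npz")
np.savez(npz_path, trial_type=trial_type, response_time=response_time)

# np.load on an .npz file returns a dict-like object; using it as a
# context manager closes the underlying file when the block ends.
with np.load(npz_path) as trial_data:
    names = list(trial_data.keys())
    loaded_type = trial_data["trial_type"]
    loaded_rt = trial_data["response_time"]

print(names)
print(np.array_equal(loaded_type, trial_type))
print(np.array_equal(loaded_rt, response_time))
```

String arrays created this way have a fixed-width Unicode dtype, so they load without allow_pickle=True.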
Analyzing data in different files
When data are stored in separate files, we can use a for loop to iterate over them and run analysis code for each file.
subjects = ["01", "02", "03", "04", "05", "06", "07", "08"]  # subjects to analyze
hr = []  # list that will hold results
for subject in subjects:
    file = f"data/sub-{subject}_beh.npz"  # data file for this subject
    # allow_pickle=True is needed only for arrays saved with object dtype (e.g., some string data)
    data = np.load(file, allow_pickle=True)
    trial_type = data["trial_type"]  # access variable from the loaded data
    response = data["response"]
    # hit rate: mean response on target trials (np.mean needs numeric or boolean responses)
    subject_hr = np.mean(response[trial_type == "target"])
    hr.append(subject_hr)  # add result to the list
The for loop produces a list of results, with one hit rate for each subject. We can then analyze the subject hit rates further.
mean_hr = np.mean(hr)
std_hr = np.std(hr)
print(f"Hit rate: mean={mean_hr:.2f}, sd={std_hr:.2f}")
Hit rate: mean=0.59, sd=0.06
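The loop above depends on data files that are not shown here. The following self-contained sketch runs the same pattern on made-up data: it first writes hypothetical per-subject files to a temporary directory, then loads each one and computes a hit rate. It assumes responses are stored as "old"/"new" strings, so they are compared to "old" before averaging. The subject list, trial counts, and random responses are illustrative only.

```python
import os
import tempfile

import numpy as np

data_dir = tempfile.mkdtemp()
subjects = ["01", "02", "03"]

# Hypothetical data: each subject gets made-up trial types and responses.
rng = np.random.default_rng(0)
for subject in subjects:
    trial_type = rng.choice(["target", "lure"], size=20)
    response = rng.choice(["old", "new"], size=20)
    trial_type[0] = "target"  # guarantee at least one target trial
    path = os.path.join(data_dir, f"sub-{subject}_beh.npz")
    np.savez(path, trial_type=trial_type, response=response)

hr = []
for subject in subjects:
    path = os.path.join(data_dir, f"sub-{subject}_beh.npz")
    with np.load(path) as data:
        trial_type = data["trial_type"]
        response = data["response"]
    # Comparing strings to "old" gives a boolean array; its mean is the
    # proportion of target trials with an "old" response.
    is_old = response[trial_type == "target"] == "old"
    hr.append(np.mean(is_old))

mean_hr = np.mean(hr)
sd_population = np.std(hr)      # default ddof=0 divides by n
sd_sample = np.std(hr, ddof=1)  # ddof=1 divides by n - 1
print(f"Hit rate: mean={mean_hr:.2f}, sd={sd_sample:.2f}")
```

Note that np.std computes the population standard deviation by default. Passing ddof=1 gives the sample standard deviation, which is often what is reported when summarizing a sample of subjects.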