Python Pandas Data Frames Cheat Sheet
Note: the course focuses on using Polars, a package that was recently developed to take advantage of progress in technology and interface design, and which is more similar to other data science approaches such as SQL and R’s Tidyverse. This guide demonstrates how to do the same operations discussed in the chapter using the popular Pandas package instead of Polars. Note that it is generally simple to convert between Pandas and Polars data frames, so both packages can be used in a given workflow.
Importing data
Use Pandas functions to read data into a Pandas DataFrame. Pandas DataFrames use NumPy arrays to store data within each column. Unlike Polars, Pandas assigns an index to each row. Here, the index is similar to the indices of a Python list, including numbers from 0 to 3 for rows 1-4.
|
name |
birthdate |
weight |
height |
| 0 |
Alice Archer |
1997-01-10 |
57.9 |
1.56 |
| 1 |
Ben Brown |
1985-02-15 |
72.5 |
1.77 |
| 2 |
Chloe Cooper |
1983-03-22 |
53.6 |
1.65 |
| 3 |
Daniel Donovan |
1981-04-30 |
83.1 |
1.75 |
Data frame attributes
Use DataFrame attributes to get information about how the data are organized. This is similar to Polars, but with different ways to get the numbers of rows and columns.
Index(['name', 'birthdate', 'weight', 'height'], dtype='object')
name object
birthdate object
weight float64
height float64
dtype: object
(4, 4)
4
4
Creating a data frame directly
To make a DataFrame in your code, rather than inputting it from a file, use pd.DataFrame. This works the same as in Polars, but note that Pandas data frames are different from Polars data frames, though many functions can use them interchangeably.
To use pd.DataFrame, make a dictionary (use curly braces, {}) with a key for each column in the DataFrame. Each column will have a list of values, which will correspond to rows in the DataFrame.
|
participant_id |
age |
condition |
score1 |
score2 |
| 0 |
001 |
25 |
restudy |
3 |
8 |
| 1 |
002 |
32 |
test |
6 |
2 |
| 2 |
003 |
65 |
restudy |
2 |
4 |
| 3 |
004 |
42 |
test |
6 |
5 |
Accessing data in a data frame
As in Polars, you can use indexing ([]) to access individual columns.
<class 'pandas.core.series.Series'>
0 3
1 6
2 2
3 6
Name: score1, dtype: int64
As in Polars, columns may be exported to NumPy arrays, allowing data to be analyzed using NumPy functions. But usually it’s more efficient to use DataFrame functions for analysis.
[3 6 2 6]
[8 2 4 5]
[-5 4 -2 1]
Expressions
Pandas currently does not support Polars-like expressions. However, starting in 3.0, Pandas will have similar syntax that was inspired by newer packages like Polars.
Using select and with_columns
Pandas does not have an equivalent of the select function used in Polars. Instead, a subset of columns can be selected and reordered using indexing.
|
participant_id |
age |
condition |
score1 |
score2 |
| 0 |
001 |
25 |
restudy |
3 |
8 |
| 1 |
002 |
32 |
test |
6 |
2 |
| 2 |
003 |
65 |
restudy |
2 |
4 |
| 3 |
004 |
42 |
test |
6 |
5 |
Use square brackets ([]) with a list of columns to reorder them and/or get a subset of columns.
|
score1 |
score2 |
participant_id |
| 0 |
3 |
8 |
001 |
| 1 |
6 |
2 |
002 |
| 2 |
2 |
4 |
003 |
| 3 |
6 |
5 |
004 |
New columns can be added either using assignment or using the assign function.
|
score1 |
score2 |
score_total |
| 0 |
3 |
8 |
11 |
| 1 |
6 |
2 |
8 |
| 2 |
2 |
4 |
6 |
| 3 |
6 |
5 |
11 |
|
score1 |
score2 |
score_total |
| 0 |
3 |
8 |
11 |
| 1 |
6 |
2 |
8 |
| 2 |
2 |
4 |
6 |
| 3 |
6 |
5 |
11 |
Using filter
Pandas does not have a function that works the same as Polar’s filter. Rows may be filtered using logical indexing (similar to filtering NumPy arrays) or using a query string with query.
|
participant_id |
age |
condition |
score1 |
score2 |
| 0 |
001 |
25 |
restudy |
3 |
8 |
| 1 |
002 |
32 |
test |
6 |
2 |
| 2 |
003 |
65 |
restudy |
2 |
4 |
| 3 |
004 |
42 |
test |
6 |
5 |
|
participant_id |
age |
condition |
score1 |
score2 |
| 0 |
001 |
25 |
restudy |
3 |
8 |
| 1 |
002 |
32 |
test |
6 |
2 |
| 3 |
004 |
42 |
test |
6 |
5 |
|
participant_id |
age |
condition |
score1 |
score2 |
| 0 |
001 |
25 |
restudy |
3 |
8 |
| 1 |
002 |
32 |
test |
6 |
2 |
| 3 |
004 |
42 |
test |
6 |
5 |
|
participant_id |
age |
condition |
score1 |
score2 |
| 0 |
001 |
25 |
restudy |
3 |
8 |
| 3 |
004 |
42 |
test |
6 |
5 |
|
participant_id |
age |
condition |
score1 |
score2 |
| 0 |
001 |
25 |
restudy |
3 |
8 |
| 3 |
004 |
42 |
test |
6 |
5 |
|
participant_id |
age |
condition |
score1 |
score2 |
| 0 |
001 |
25 |
restudy |
3 |
8 |
|
participant_id |
age |
condition |
score1 |
score2 |
| 0 |
001 |
25 |
restudy |
3 |
8 |
|
participant_id |
age |
condition |
score1 |
score2 |
| 0 |
001 |
25 |
restudy |
3 |
8 |
| 1 |
002 |
32 |
test |
6 |
2 |
| 3 |
004 |
42 |
test |
6 |
5 |
|
participant_id |
age |
condition |
score1 |
score2 |
| 0 |
001 |
25 |
restudy |
3 |
8 |
| 1 |
002 |
32 |
test |
6 |
2 |
| 3 |
004 |
42 |
test |
6 |
5 |
Comparison expressions
Pandas currently does not support Polars-like expressions, but query can be used to filter rows in a similar manner using query strings such as the ones shown below.
Sorting data
Use sort_values to rearrange the order of the rows in a data frame. It works similarly to Polar’s sort, but changing the order of sorting involves setting the ascending input instead of Polar’s descending input.
|
participant_id |
age |
condition |
score1 |
score2 |
| 0 |
001 |
25 |
restudy |
3 |
8 |
| 1 |
002 |
32 |
test |
6 |
2 |
| 2 |
003 |
65 |
restudy |
2 |
4 |
| 3 |
004 |
42 |
test |
6 |
5 |
|
participant_id |
age |
condition |
score1 |
score2 |
| 2 |
003 |
65 |
restudy |
2 |
4 |
| 0 |
001 |
25 |
restudy |
3 |
8 |
| 1 |
002 |
32 |
test |
6 |
2 |
| 3 |
004 |
42 |
test |
6 |
5 |
|
participant_id |
age |
condition |
score1 |
score2 |
| 1 |
002 |
32 |
test |
6 |
2 |
| 3 |
004 |
42 |
test |
6 |
5 |
| 0 |
001 |
25 |
restudy |
3 |
8 |
| 2 |
003 |
65 |
restudy |
2 |
4 |
|
participant_id |
age |
condition |
score1 |
score2 |
| 0 |
001 |
25 |
restudy |
3 |
8 |
| 2 |
003 |
65 |
restudy |
2 |
4 |
| 1 |
002 |
32 |
test |
6 |
2 |
| 3 |
004 |
42 |
test |
6 |
5 |
|
participant_id |
age |
condition |
score1 |
score2 |
| 0 |
001 |
25 |
restudy |
3 |
8 |
| 2 |
003 |
65 |
restudy |
2 |
4 |
| 3 |
004 |
42 |
test |
6 |
5 |
| 1 |
002 |
32 |
test |
6 |
2 |