12.10. Python Software Packages Cheat Sheet#
Python modules#
Python modules are files that contain Python code, usually functions designed to carry out common operations.
import analysis # import a Python file on your path
help(analysis) # view information about the module
Help on module analysis:

NAME
    analysis - Sample module with some sample functions for data analysis.

FUNCTIONS
    dprime(trial_type, response)
        Calculate d-prime for recognition memory task responses.

        Args:
            trial_type:
                An iterable with strings, indicating whether each trial is a "target"
                or "lure".
            response:
                An iterable with strings, indicating whether the response on each trial
                was "old" or "new".

        Returns:
            The d-prime measure of recognition accuracy.

    exclude_fast_responses(response_times, threshold)
        Exclude response times that are too fast.

        Args:
            response_times:
                An iterable with response times.
            threshold:
                Threshold for marking response times. Response times less than or equal
                to the threshold will be marked False.

        Returns:
            filtered:
                An array with only the included response times.
            is_included:
                A boolean array that is False for responses less than the threshold,
                and True otherwise.

FILE
    /home/runner/work/datascipsych/datascipsych/book/chapters/chapter12/analysis.py
Functions in Python modules can be called by accessing them through the module.
trial_type = ["target", "lure", "lure", "target", "target", "target"]
response = ["old", "old", "new", "new", "old", "old"]
analysis.dprime(trial_type, response)
np.float64(0.967421566101701)
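The text shown by help comes from docstrings written in the module file. As a rough sketch of what such a file could look like, the code below defines a module docstring and one documented function. It is not the actual analysis.py: it uses plain lists instead of arrays, and it follows the Args description in excluding response times less than or equal to the threshold.

```python
"""Sample module with some sample functions for data analysis."""


def exclude_fast_responses(response_times, threshold):
    """Exclude response times that are too fast.

    Args:
        response_times: An iterable with response times.
        threshold: Response times less than or equal to this value
            are excluded.

    Returns:
        filtered: A list with only the included response times.
        is_included: A list of booleans, one per response time.
    """
    # Assumption: plain lists are used here to keep the sketch
    # free of third-party dependencies.
    times = list(response_times)
    is_included = [rt > threshold for rt in times]
    filtered = [rt for rt, keep in zip(times, is_included) if keep]
    return filtered, is_included


# Example call with made-up response times (in seconds)
filtered, is_included = exclude_fast_responses([0.15, 0.62, 0.2, 0.48], 0.2)
print(filtered)
print(is_included)
```

If this code were saved in a file on the search path, calling help on the imported module would display the docstrings in a layout similar to the output above.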
After a module is imported, if there is a change to the module file, it must be reloaded before the new code can be executed.
import importlib
importlib.reload(analysis)
<module 'analysis' from '/home/runner/work/datascipsych/datascipsych/book/chapters/chapter12/analysis.py'>
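To see why reloading is needed, the sketch below writes a small throwaway module to a temporary directory, imports it, edits the file, and then reloads it. The module name scratch_module and its greet function are hypothetical and exist only for this demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip writing cached bytecode so this quick edit-and-reload demo
# always reads the source file.
sys.dont_write_bytecode = True

with tempfile.TemporaryDirectory() as tmpdir:
    module_file = Path(tmpdir) / "scratch_module.py"
    module_file.write_text("def greet():\n    return 'version 1'\n")

    sys.path.insert(0, tmpdir)  # make the temporary directory importable
    importlib.invalidate_caches()
    import scratch_module

    before = scratch_module.greet()

    # Edit the module file after it has been imported
    module_file.write_text("def greet():\n    return 'version 2'\n")
    stale = scratch_module.greet()  # still runs the old code

    importlib.reload(scratch_module)
    after = scratch_module.greet()  # now runs the edited code

    sys.path.remove(tmpdir)

print(before, stale, after, sep=" | ")
```

The call made after editing the file but before reloading still returns the old result, because Python keeps the imported version of the module in memory.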
The search path#
For a module to be imported successfully, it must be on your search path, which is a list of directories that Python uses to look for code.
import sys
sys.path # get a list of directories on your system that will be searched
['/opt/hostedtoolcache/Python/3.13.6/x64/lib/python313.zip',
'/opt/hostedtoolcache/Python/3.13.6/x64/lib/python3.13',
'/opt/hostedtoolcache/Python/3.13.6/x64/lib/python3.13/lib-dynload',
'',
'/home/runner/work/datascipsych/datascipsych/.venv/lib/python3.13/site-packages',
'/home/runner/work/datascipsych/datascipsych/src']
Packages that have been installed using a command like uv sync should be in one of the listed directories.
An empty string indicates the current directory. It allows modules in your current directory to be imported.
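One way to check whether a module can be found on the search path, without importing it, is importlib.util.find_spec from the standard library. The sketch below looks up the standard library module json and a made-up module name that is assumed not to exist on your system.

```python
import importlib.util
import sys

# Look up a module that ships with Python
spec = importlib.util.find_spec("json")
found = spec is not None
if found:
    print("json found at:", spec.origin)

# Hypothetical name, assumed not to be installed anywhere on the path
missing_spec = importlib.util.find_spec("module_that_does_not_exist_xyz")
missing = missing_spec is None
print("missing module found?", not missing)

# The directories that were searched
for directory in sys.path:
    print(repr(directory))
```

If find_spec returns None for a module you expect to be available, the directory containing it is probably not on the search path.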
Python packages#
Python packages include code, documentation, and files that specify how the code should be installed.
Packages can be initialized using uv init.
uv init # initialize a project in the current directory
uv init myprojectdir # initialize in a different directory
uv init --lib # set up a Python library
Using the --lib flag sets up a package that can be installed so that any modules will be importable in Python code.
A standard Python package that has been initialized as a library using uv init --lib and has two modules, mymodule1.py and mymodule2.py, looks like this:
myproject                  # main project directory
├── .gitignore             # lists files that should not be tracked by Git
├── .python-version        # Python version that should be used to run code
├── pyproject.toml         # standardized information about the project
├── README.md              # documentation for the project
├── uv.lock                # the set of dependency versions to use
└── src                    # main directory for source code
    └── myproject          # Python package with one or more modules
        ├── __init__.py    # code to initialize the package
        ├── mymodule1.py   # module with Python code
        ├── mymodule2.py   # another module
        └── py.typed       # marker file indicating that the package includes type hints
Packages that have been initialized using uv can be installed on any computer using uv sync. That will install the specified version of Python, then create a virtual environment and install any necessary dependencies. It will also install the code in src/myproject and make it available to be imported. For example, after installation, you would be able to run from myproject import mymodule1 to access the functions defined in mymodule1.py.
Package configuration#
Packages are configured using information in the pyproject.toml file, which follows a standard format.
Metadata listed at the top of the file gives basic information about the project.
[project]
name = "datascipsych"
version = "0.1.0"
description = "Code for the Data Science for Psychology course."
authors = [
{name = "Neal W Morton", email = "mortonne@gmail.com"}
]
The version number should generally follow Semantic Versioning. In a version string “X.Y.Z”, “X” indicates the major version, “Y” is the minor version, and “Z” is the patch version.
Packages that are not yet stable have versions that start with 0. For a new package, start with 0.1.0. If you fix a bug or set of bugs, increment the patch version, like 0.1.1. If you add major new features or make a backward-incompatible change, increment the minor version, like 0.2.0.
If you have a well-tested package, go to version 1.0.0. At that point, the major version should not change unless you have to make a backward-incompatible change. When adding features, change the minor version. When fixing bugs, change the patch version.
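The versioning rules above can be written as a small function. This is only an illustrative sketch: the function name bump_version and the change labels "fix", "feature", and "breaking" are made up for this example and are not part of uv or any standard tool.

```python
def bump_version(version, change):
    """Return the next version string for a given kind of change.

    change must be "fix", "feature", or "breaking" (hypothetical labels).
    """
    if change not in ("fix", "feature", "breaking"):
        raise ValueError(f"Unknown change type: {change}")

    major, minor, patch = (int(part) for part in version.split("."))
    if change == "fix":
        # Bug fixes always increment the patch version
        return f"{major}.{minor}.{patch + 1}"
    if change == "feature" or major == 0:
        # New features, and any larger change before 1.0.0,
        # increment the minor version
        return f"{major}.{minor + 1}.0"
    # Backward-incompatible change in a stable package
    return f"{major + 1}.0.0"


print(bump_version("0.1.0", "fix"))       # 0.1.1
print(bump_version("0.1.1", "breaking"))  # 0.2.0
print(bump_version("1.4.2", "feature"))   # 1.5.0
print(bump_version("1.4.2", "breaking"))  # 2.0.0
```

Note that the lower version components are reset to zero whenever a higher component is incremented.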
The [project] table lists dependencies, which are Python packages that the project uses.
dependencies = [
"numpy",
"polars",
"seaborn"
]
To add dependencies to a project, use uv add [package1] [package2]. For example, to add Polars as a dependency, run uv add polars. This will change the dependencies list in pyproject.toml and add information to uv.lock.
Version specifiers can be used to indicate constraints on the package versions that should be installed. Multiple specifiers can be used if they are separated by commas.
uv add 'polars == 1.35.2' # ==: use this exact version only
uv add 'polars ~= 1.35.2' # ~=: use a compatible version
uv add 'polars ~= 1.35.2, != 1.35.4' # !=: do not use this version
uv add 'polars >= 1.35.2' # >=: use this version or later
Version constraints can be important to ensure that you have necessary functionality (which may not exist in older versions of a package) or avoid critical bugs (which may exist only in specific versions). If a dependency is critical for your package, consider using the ~= specifier to avoid installing changes that might break compatibility with your package.
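For a version with three components, the ~= specifier means "this version or later, but with the same major and minor version". So ~= 1.35.2 is equivalent to >= 1.35.2, == 1.35.*. The sketch below approximates that rule for simple numeric versions. The function names are made up for illustration, and the code ignores details such as pre-release versions that real installers handle.

```python
def parse_version(version):
    """Convert a string like "1.35.2" into a tuple like (1, 35, 2)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_compatible_release(candidate, base):
    """Approximate the check for 'candidate ~= base'.

    Assumes both versions are purely numeric and that base has
    at least two components.
    """
    cand = parse_version(candidate)
    ref = parse_version(base)
    # Must be at least the base version, and must match every
    # component of the base version except the last one
    return cand >= ref and cand[: len(ref) - 1] == ref[:-1]


for version in ["1.35.0", "1.35.2", "1.35.7", "1.36.0"]:
    print(version, satisfies_compatible_release(version, "1.35.2"))
```

In this sketch, 1.35.2 and 1.35.7 satisfy ~= 1.35.2, while 1.35.0 (too old) and 1.36.0 (different minor version) do not.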
Version tracking#
Version tracking programs such as Git are used to keep track of changes to code.
Repository components#
Git repositories have multiple parts.
workspace: files in the directory being tracked.
index or staging area: changes that will be incorporated into the next commit.
local repository: the history of all commits that is stored on your computer.
remote repository: a history of commits stored on a website such as GitHub.
Actions for working with repositories#
Working with Git repositories involves various actions, which can be completed using an IDE like Visual Studio Code or using commands in the terminal.
clone: get a complete copy of project history from a remote repository.
add: add a new file or changes to an existing file to the list of staged modifications that will be included in the next commit.
commit: take the current list of staged modifications and store a snapshot with those changes, along with a message describing the changes, in the local repository.
push: send the history of changes in the local repository to a remote repository hosted on a service such as GitHub.
pull: get the history of changes from a remote repository and apply them to your local repository.
Handling conflicts between versions#
Sometimes, local changes can get in the way of pulling changes from a remote repository. These actions provide options for how to handle local changes that conflict with remote changes.
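revert: take some modification that has not yet been committed, such as a new file or edits to an existing file, and discard it. IDEs may call this "discard changes". Note that the terminal command git revert does something different: it creates a new commit that undoes an earlier commit.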
stash: take some modifications that have not yet been committed and stash them away so they will not interfere with getting changes from a remote repository.
merge: combine local and remote changes, using tools (often built into IDEs) to select which version of conflicting file content should be kept.
Creating a Git repository#
To track code, first create a Git repository.
If you initialized your project using uv init, you should already have a Git repository initialized. This will include a .gitignore file, which indicates files that should not be tracked by Git. If you have files that are in the directory but that you do not want to be in the repository (for example, configuration files generated by your IDE or temporary files created by your operating system), add a line to the .gitignore file.
For example, these lines could be used to ignore various types of files that are not directly related to your code and should not be included in the repository:
# virtual environment directory
.venv
# macOS file with folder information
.DS_Store
# configuration directory used by JetBrains IDE programs
.idea
The .gitignore file generated by uv automatically excludes some common temporary Python files.
Making a commit#
After making changes to code or documentation, make a commit to store a snapshot of the new version of your package in Git.
Before making commits with Git for the first time on a computer, define the name and email address to associate with your commits. In the terminal, run these commands:
git config --global user.name "[name]"
git config --global user.email "[email address]"
Replace [name] and [email address] with the name and email address you want to associate with commits.
To make a commit, first use your IDE to add files and edits to files to the staging area of Git. Try to group together changes that are related and can be easily described in a short commit message.
Then, select a tag to describe your changes, following the Conventional Commits standard.
feat: New feature
fix: A fix of some bug
refactor: A change to how code is organized that does not change functionality
style: A change to coding style
build: Something related to the project or installation of the project (e.g., changes to pyproject.toml)
docs: A change to project documentation
Next, write a message in the format [tag]: [message], where message is a declarative statement about what the change does. For example:
feat: make plot of recall accuracy
fix: bug when loading data
refactor: use a function to analyze subject data
style: automatically format code
build: add polars as a dependency
docs: add instructions on running analyses
Try to make your message short but give a rough idea of what changes you made, to make it easy to read through your changes later on. This will make it easier to do things like find a change that introduced a bug and fix it.
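As a rough illustration of the [tag]: [message] format, the sketch below checks whether the first line of a commit message starts with one of the tags listed above, followed by a colon, a space, and some text. The function name is made up for this example, and the check is simplified: the full Conventional Commits standard allows additional tags and optional extras that are not handled here.

```python
import re

# Tags from the list above
TAGS = ("feat", "fix", "refactor", "style", "build", "docs")

# A tag, then a colon and a space, then a non-empty message
PATTERN = re.compile(rf"^({'|'.join(TAGS)}): \S.*$")


def is_valid_commit_message(message):
    """Check the first line of a commit message against the simple format."""
    first_line = message.split("\n", 1)[0]
    return PATTERN.match(first_line) is not None


examples = [
    "feat: make plot of recall accuracy",
    "build: add polars as a dependency",
    "added some stuff",
    "fix:bug when loading data",
]
for message in examples:
    print(is_valid_commit_message(message), "-", message)
```

The first two examples pass the check. The third has no tag, and the fourth is missing the space after the colon.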