Getting started
Requirements
- Read: Cookiecutter Data Science Docs for a general overview
- Python 3.12+
- uv
- Cookiecutter Python package >= 2.1.1
Creating new project
Run:
cookiecutter https://github.com/xultaeculcis/ml-project-cookiecutter
You will be prompted to provide project info one argument at a time:
project_name (project_name): My ML project
repo_name (my-ml-project):
package_name (my_ml_project):
author_name (Your name (or your organization/company/team)): xultaeculcsis
repo_url (https://github.com/xultaeculcsis/my-ml-project):
project_description (A short description of the project): Just an ML project :)
Select python_version
1 - 3.11
2 - 3.12
3 - 3.13
Choose from [1/2/3] (1): 1
sanitized_python_version (311):
Select license:
1 - MIT
2 - Apache 2.0
3 - BSD-3-Clause
4 - Beerware
5 - GLWTS
6 - Proprietary
7 - Empty license file
Choose from [1/2/3/4/5/6/7] (1): 1
The repo_name
, package_name
, repo_url
and sanitized_python_version
will be automatically standardized
and provided for you. You can change them to your liking though.
Working with the project
Project directory structure
The resulting project structure will look like this:
my-ml-project/
├── data
│ ├── analysis <- EDA artifacts.
│ ├── auxiliary <- The auxiliary, third party data.
│ ├── inference <- Inference results from your models.
│ ├── interim <- Intermediate data that has been transformed.
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
├── Dockerfile <- Dockerfile definition.
├── docs <- The mkdocs documentation sources.
│ ├── api_ref <- Source package docs.
│ │ ├── consts.md
│ │ ├── core
│ │ │ ├── configs.md
│ │ │ └── settings.md
│ │ └── utils.md
│ ├── guides <- How-to guides.
│ │ ├── contributing.md
│ │ ├── makefile-usage.md
│ │ ├── setup-dev-env.md
│ │ └── tests.md
│ ├── index.md <- Docs homepage.
│ └── __init__.py
├── env-dev.yaml <- Conda environment definition with development dependencies.
├── env.yaml <- Main Conda environment definition with only the necessary packages.
├── LICENSE <- The license file.
├── Makefile <- Makefile with commands like `make docs` or
│ `make pc`.
├── mkdocs.yml
├── src
│ └── my_ml_project <- Project source code.
│ │ This will be different depending on your input during project creation.
│ ├── consts <- Constants to be used across the project.
│ │ ├── __init__.py
│ │ ├── directories.py
│ │ ├── logging.py
│ │ └── reproducibility.py
│ ├── core <- Core project stuff. E.g., the base classes
│ │ │ for step entrypoint configs.
│ │ ├── configs
│ │ │ └── __init__.py
│ │ │ ├── argument_parsing.py
│ │ │ ├── base.py
│ │ ├── __init__.py
│ │ └── settings.py
│ ├── __init__.py
│ ├── py.typed
│ └── utils <- Utility functions and classes.
│ ├── __init__.py
│ ├── gpu.py
│ ├── logging.py
│ ├── mlflow.py
│ └── serialization.py
├── notebooks <- Jupyter notebooks. Naming convention is a
│ number (for ordering), the creator's initials,
│ and a short `-` delimited description, e.g.
│ `1.0-jqp-initial-data-exploration`.
├── pyproject.toml <- Contains build system requirements
│ and information, which are used by pip to build
│ the package and project tooling configs.
├── README.md
├── setup.py
└── tests <- The tests directory.
├── conftest.py <- Contains test fixtures and utility functions.
├── e2e <- Contains end-to-end tests.
├── __init__.py
├── integration <- Contains integration tests.
└── unit <- Contains unit tests.
Most of those folders were described in detail in the Cookiecutter Data Science Docs.
Environment setup
The post-generation hook will initialize git
repo for you. It will also set up main
to track the remote branch.
Initial commit will be created with all repo files added to it. You can modify them and amend the commit at your
convenience.
Via Makefile
Right after creating new project from the cookiecutter template you'll need to freeze the dependencies. Initial
pyproject.toml
has a minimal set of dependencies needed for the helper functions, test execution and docs creation.
Note that most of the dependencies are not pinned in the pyproject.toml
. This is done on purpose in order to ensure
that new projects can be created with the most up-to-date packages. Once you create the lock file, you can pin specific
versions.
To lock the environment run:
make env
This command will set up the environment for you. It will also install pre-commit
hooks and the project in an editable
mode.
Manually
If you are not on Linux the setup via Makefile
might not work. In that case run the following commands manually.
To set up your local env from scratch run:
-
Create
uv.lock
file:uv sync --all-groups
-
Install
pre-commit
hooks:uv run pre-commit install
Note
Once you've initialized git repo, created the lock file(s) and pinned the package versions, you should commit the
changes and push them to a remote repository as an Initial commit
.
Pre-commit hooks
This project uses pre-commit
package for managing and maintaining pre-commit
hooks.
To ensure code quality - please make sure that you have it configured.
-
Install
pre-commit
and following packages:ruff
,mypy
,pytest
. -
Install
pre-commit
hooks by running:uv run pre-commit install
-
The command above will automatically run formatters, code checks and other steps defined in the
.pre-commit-config.yaml
-
All of those checks will also be run whenever a new commit is being created i.e. when you run
git commit -m "blah"
-
You can also run it manually with this command:
uv run pre-commit run --all-files
or withmake pc
You can manually disable pre-commit
hooks by running: pre-commit uninstall Use this only in exceptional cases.
Environment variables
Ask your colleagues for .env
files which aren't included in this repository and put them inside the repo's
root directory. Please, never put secrets in the source control. Always align with your IT department security
practices.
To see what variables you need see the .env-sample
file.
CI pipelines
Currently, the project supports only Azure DevOps Pipelines.
By default, the project comes with a single CI pipeline that runs a set of simplified pre-commit hooks on each PR
commit that targets the main
branch.
Documentation
We use MkDocs with Material theme.
To build the docs run:
make docs
If you want to verify the docs locally use:
uv run mkdocs serve
A page like the one below should be available to you under: http://127.0.0.1:8000/
Note
Please note that google style docstrings are used throughout the repo.