NUM Framework

Project structure

Here is the structure of a project generated with the generate_num_project command:

.
├─ template_num                       # your application package
│    ├─ models_training               # global config and utilities
│    │    ├─ classifiers
│    │    │    ├─ models_sklearn      # package containing some predefined scikit-learn classifiers
│    │    │    └─ models_tensorflow   # package containing some predefined tensorflow classifiers
│    │    ├─ regressors      
│    │    │    ├─ models_sklearn      # package containing some predefined scikit-learn regressors
│    │    │    └─ models_tensorflow   # package containing some predefined tensorflow regressors
│    │    ├─ ...
│    │    ├─ model_class.py           # module containing base Model class
│    │    └─ utils_models.py          # module containing utility functions
│    │
│    ├─ monitoring                    # package containing monitoring utilities (mlflow, model explicability)
│    │
│    ├─ preprocessing                 # package containing preprocessing logic
│    │
│    ├─ __init__.py
│    └─ utils.py
│
├─ template_num-data                  # Folder where to store your data
├─ template_num-exploration           # Folder where to store your exploratory notebooks
├─ template_num-models                # Folder containing trained models
├─ template_num-pipelines             # Folder where fitted pipelines are stored
├─ template_num-scripts               # Folder containing scripts for preprocessing, training, etc.
├─ template_num-tutorials             # Folder containing a tutorial notebook
.
.
.
├─ makefile
├─ setup.py
└─ README.md

Numeric framework specifics

Preprocessing must be done in two steps to avoid bias (information from the validation/test sets leaking into the fitted transformations):
Fit your transformations on the training data only (1_preprocess_data.py).
Apply the already fitted transformations to your validation/test sets (2_apply_existing_pipeline.py).
Fitted preprocessing pipelines are stored in the project_name-pipelines folder.
They are then also saved as a .pkl object in the model folders, so that they can be reused during inference.
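
The two steps above can be sketched with plain scikit-learn as follows. This is only an illustration of the fit-then-apply principle, not the code of 1_preprocess_data.py or 2_apply_existing_pipeline.py: the toy data, the StandardScaler step, the file name preprocess_pipeline.pkl and the temporary directory are all assumptions made for the example.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (assumption): one numeric feature, train and test drawn separately.
X_train = np.array([[0.0], [2.0], [4.0]])
X_test = np.array([[2.0], [6.0]])

# Step 1: fit the transformations on the training data only.
pipeline = Pipeline([("scaler", StandardScaler())])
X_train_prep = pipeline.fit_transform(X_train)

# Persist the fitted pipeline as a .pkl file (hypothetical name and location).
pipeline_path = Path(tempfile.mkdtemp()) / "preprocess_pipeline.pkl"
with open(pipeline_path, "wb") as f:
    pickle.dump(pipeline, f)

# Step 2: reload the fitted pipeline and only transform the test set.
# Calling fit or fit_transform here would reintroduce the bias.
with open(pipeline_path, "rb") as f:
    fitted_pipeline = pickle.load(f)
X_test_prep = fitted_pipeline.transform(X_test)

# The test set is scaled with the training mean (2.0) and standard deviation.
print(X_test_prep.ravel())
```

The test value 2.0 maps to 0 because it equals the training mean, which shows that the statistics come from the training set rather than from the test set itself.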

Warning

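If you used a custom preprocessing function funcA with FunctionTransformer, be aware that the pickled pipeline may not return the expected results if you later modify the definition of funcA. pickle stores a function by reference (its module and name), not its code, so a reloaded pipeline calls whichever definition of funcA is importable at load time.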

For details, see gabarit/issues/63
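
The behaviour behind this warning can be reproduced with the standard library alone. The sketch below pickles a bare function rather than a full pipeline with FunctionTransformer, on the assumption that the mechanism is the same: the transformer keeps a reference to the function, and pickle serializes functions by name. The module name hypothetical_preprocessing and both bodies of funcA are made up for the example.

```python
import pickle
import sys
import types

# Hypothetical module standing in for the file where funcA is defined.
module = types.ModuleType("hypothetical_preprocessing")
sys.modules[module.__name__] = module

# Definition of funcA at the time the pipeline is saved.
exec("def funcA(x):\n    return x * 2\n", module.__dict__)

# Only the reference "hypothetical_preprocessing.funcA" is stored, not the code.
payload = pickle.dumps(module.funcA)

# Later, funcA is edited in the source code.
exec("def funcA(x):\n    return x + 100\n", module.__dict__)

# Unpickling looks the name up again and returns the new definition.
restored = pickle.loads(payload)
result = restored(1)
print(result)  # prints 101, not 2
```

One possible way to limit the risk is to leave functions used by saved pipelines untouched and to add a new function under a new name when the logic has to change.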