Skip to content

NUM Framework

Project structure

Here is the structure of a project generated with generate_num_project command :

.
├─ template_num                       # your application package
    ├─ models_training               # global config and utilities
        ├─ classifiers
            ├─ models_sklearn      # package containing some predefined scikit-learn classifiers
            └─ models_tensorflow   # package containing some predefined tensorflow classifiers
        ├─ regressors      
            ├─ models_sklearn      # package containing some predefined scikit-learn regressors
            └─ models_tensorflow   # package containing some predefined tensorflow regressors
        ├─ ...
        ├─ model_class.py           # module containing base Model class
        └─ utils_models.py          # module containing utility functions
        ├─ monitoring                    # package containing monitoring utilities (mlflow, model explicability)
        ├─ preprocessing                 # package containing preprocessing logic
        ├─ __init__.py
    └─ utils.py
├─ template_num-data                  # Folder where to store your data
├─ template_num-exploration           # Folder where to store your exploratory notebooks
├─ template_num-models                # Folder containing trained models
├─ template_num-pipelines             # Folder containing fitted pipelines are stored
├─ template_num-scripts               # Folder containing script for preprocessing, training, etc.
├─ template_num-tutorials             # Folder containing a tutorial notebook
.
.
.
├─ makefile
├─ setup.py
└─ README.md

Numeric framewrok specificities

  • Preprocessing has to be computed in a two step fashion to avoid bias:

  • Fit your transformations on the training data (1_preprocess_data.py)

  • Transform your validation/test sets (2_apply_existing_pipeline.py)

  • Preprocessing pipelines are stored in the project_name-pipelines folder

  • They are then stored as a .pkl object in the model folders (so that these can be used during inference)

Warning

If you used a custom preprocessing function funcA with FunctionTransformer, be aware that the pickled pipeline may not return wanted results if you later modify funcA definition.

Please check gabarit/issues/63