Preprocess
get_ct_feature_names(ct)
Gets the names of the output columns of a fitted ColumnTransformer. From: https://stackoverflow.com/questions/57528350/can-you-consistently-keep-track-of-column-labels-using-sklearns-transformer-api
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ct | ColumnTransformer | Column transformer to be processed | required |
Returns: list: List of new feature names
Source code in template_num/preprocessing/preprocess.py
get_feature_out(estimator, features_in)
Gets the names of the output columns produced by a fitted estimator
Parameters:
Name | Type | Description | Default |
---|---|---|---|
estimator | (?) | Estimator to be processed | required |
features_in | list | Input columns | required |
Returns: list: List of new feature names
get_pipeline(pipeline_str)
Gets a pipeline from its name
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pipeline_str | str | Name of the pipeline | required |
Raises: ValueError: If the name of the pipeline is not known
Returns: ColumnTransformer: Pipeline to be used for the preprocessing
get_pipelines_dict()
Gets a dictionary of available preprocessing pipelines
Returns:
Name | Type | Description |
---|---|---|
dict | dict | Dictionary of preprocessing pipelines |
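The name-based lookup described for `get_pipeline` and `get_pipelines_dict` can be pictured as a small registry. The sketch below is an assumption about the general pattern only: the functions `pipelines_dict_sketch` and `pipeline_from_name_sketch` and the key `"demo_scaling"` are invented for illustration and are not names from the package.

```python
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.preprocessing import StandardScaler


def pipelines_dict_sketch():
    """Hypothetical registry: maps a pipeline name to an unfitted ColumnTransformer."""
    return {
        "demo_scaling": ColumnTransformer([
            ("num", StandardScaler(), make_column_selector(dtype_include="number")),
        ]),
    }


def pipeline_from_name_sketch(pipeline_str):
    """Hypothetical lookup that rejects unknown names with a ValueError."""
    pipelines = pipelines_dict_sketch()
    if pipeline_str not in pipelines:
        raise ValueError(
            f"Unknown pipeline {pipeline_str!r}; known names: {sorted(pipelines)}"
        )
    return pipelines[pipeline_str]


pipeline = pipeline_from_name_sketch("demo_scaling")
print(type(pipeline).__name__)
```

Building the dictionary on every call means each caller receives a fresh, unfitted transformer, so fitted state is never shared between callers.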
preprocess_P1()
Gets the "default" preprocessing pipeline
Returns:
Name | Type | Description |
---|---|---|
ColumnTransformer | ColumnTransformer | The pipeline |
preprocess_auto()
Gets an "automatic" pipeline. Different transformations are applied depending on statistics computed on the data.
Returns:
Name | Type | Description |
---|---|---|
ColumnTransformer | ColumnTransformer | The automatic pipeline |
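The rules that `preprocess_auto` actually applies are not documented on this page. As a rough illustration of the general idea, scikit-learn's `ColumnTransformer` accepts callables as column selectors, which are evaluated on the data at fit time. The sketch below uses two invented selection rules (numeric dtype, and a cardinality threshold of 5) that are assumptions for this example only.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def numeric_columns(X):
    # Hypothetical rule: every numeric column gets scaled.
    return [c for c in X.columns if is_numeric_dtype(X[c])]


def low_cardinality_columns(X):
    # Hypothetical rule: non-numeric columns with at most 5 distinct values
    # get one-hot encoded. The threshold is an assumption.
    return [c for c in X.columns
            if not is_numeric_dtype(X[c]) and X[c].nunique() <= 5]


auto_ct = ColumnTransformer([
    ("num", StandardScaler(), numeric_columns),
    ("cat", OneHotEncoder(handle_unknown="ignore"), low_cardinality_columns),
])

df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "color": ["r", "g", "r"]})
out = auto_ct.fit_transform(df)  # selectors run here, on the data being fitted
print(out.shape)
```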
retrieve_columns_from_pipeline(df, pipeline)
Retrieves column names after preprocessing
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Dataframe after preprocessing (without target) | required |
pipeline | ColumnTransformer | Used pipeline | required |
Raises: AttributeError: If the pipeline is not fitted
Raises: ValueError: If the number of columns differs between the pipeline and the preprocessed DataFrame
Returns: pd.DataFrame: DataFrame with column names