Utils
NpEncoder
Bases: JSONEncoder
JSON encoder to manage numpy objects
Source code in template_nlp/utils.py
data_agnostic_str_to_list(function)
Decorator that wraps a string input into a one-element list and returns the first element of the function's result.
The idea is to be able to call predict(my_string).
Otherwise, we would have to write prediction = predict([my_string])[0]
Parameters:
Name | Type | Description | Default |
---|---|---|---|
function | func | Function to decorate | required |
Returns:
function: The decorated function
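The wrap-then-unwrap behaviour described above can be sketched as follows. This is a minimal stand-alone illustration, not the code from template_nlp/utils.py: it assumes the decorated callable receives the text input as its first positional argument, and `predict` is a toy stand-in for a real model method.

```python
import functools


def data_agnostic_str_to_list(function):
    """Sketch: let a list-based function also accept a single string."""

    @functools.wraps(function)
    def wrapper(x, *args, **kwargs):
        if isinstance(x, str):
            # Wrap the string in a one-element list, then unwrap the result
            return function([x], *args, **kwargs)[0]
        # Anything else (e.g. a list) is passed through unchanged
        return function(x, *args, **kwargs)

    return wrapper


@data_agnostic_str_to_list
def predict(texts):
    # Toy stand-in for a model: returns the length of each document
    return [len(text) for text in texts]


single = predict("hello")         # equivalent to predict(["hello"])[0]
batch = predict(["hello", "hi"])  # lists keep their list-shaped result
```

If the decorator is applied to a method, the wrapper would also need to forward `self` before the text argument.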
display_shape(df)
Displays the number of rows and columns of a table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Table to parse | required |
find_folder_path(folder_name, base_folder=None)
Find a folder in a base folder and its subfolders. If base_folder is None, folder_name is treated as a path and checked for existence.
For example, with the following structure:
- C:/
  - base_folder/
    - folderA/
      - folderB/
    - folderC/
find_folder_path(folderA, C:/base_folder) == C:/base_folder/folderA
find_folder_path(folderB, C:/base_folder) == C:/base_folder/folderA/folderB
find_folder_path(C:/base_folder/folderC, None) == C:/base_folder/folderC
find_folder_path(folderB, None) raises an error
Parameters:
Name | Type | Description | Default |
---|---|---|---|
folder_name | str | name of the folder to find. If base_folder is None, consider a path instead. | required |
Kwargs:
base_folder (str): path of the base folder. If None, consider folder_name as a path.
Raises:
FileNotFoundError: If we can't find folder_name in base_folder
FileNotFoundError: If folder_name is not a valid path (case where base_folder is None)
Returns:
str: path to the wanted folder
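The lookup rules above can be sketched with the standard library. This is an illustrative re-implementation under stated assumptions, not the code from template_nlp/utils.py: it assumes the first match found by `os.walk` is returned and that results are absolute paths.

```python
import os
import tempfile


def find_folder_path(folder_name, base_folder=None):
    """Sketch: locate folder_name under base_folder, or validate it as a path."""
    if base_folder is None:
        # No base folder: folder_name must itself be an existing directory
        if os.path.isdir(folder_name):
            return os.path.abspath(folder_name)
        raise FileNotFoundError(f"{folder_name} is not a valid folder path")
    # Walk base_folder and its subfolders, returning the first match
    for dirpath, dirnames, _ in os.walk(base_folder):
        if folder_name in dirnames:
            return os.path.abspath(os.path.join(dirpath, folder_name))
    raise FileNotFoundError(f"Cannot find {folder_name} in {base_folder}")


# Recreate the folderA/folderB/folderC layout in a temporary directory
with tempfile.TemporaryDirectory() as base_folder:
    os.makedirs(os.path.join(base_folder, "folderA", "folderB"))
    os.makedirs(os.path.join(base_folder, "folderC"))
    found_b = find_folder_path("folderB", base_folder)
    expected_b = os.path.abspath(os.path.join(base_folder, "folderA", "folderB"))
    found_c = find_folder_path(os.path.join(base_folder, "folderC"))
    expected_c = os.path.abspath(os.path.join(base_folder, "folderC"))
```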
get_chunk_limits(x, chunksize=10000)
Gets chunk limits from a pandas series or dataframe.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Series or DataFrame | Documents to consider | required |
Kwargs:
chunksize (int): The chunk size
Raises:
ValueError: If the chunk size is negative
Returns:
list
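The documentation does not spell out the shape of the returned list. The sketch below assumes it holds `(start, end)` pairs with an exclusive end, and that a chunk size of 0 means "no chunking"; both are assumptions, and the helper name `chunk_limits_for_length` is hypothetical. It works on a row count rather than on a pandas object so that it stays dependency-free.

```python
def chunk_limits_for_length(n_rows, chunksize=10000):
    """Sketch: (start, end) limits covering n_rows rows, end exclusive."""
    if chunksize < 0:
        raise ValueError("The chunk size must not be negative")
    if chunksize == 0 or chunksize >= n_rows:
        # Assumption: 0 (or a chunk larger than the data) yields one chunk
        return [(0, n_rows)]
    return [(start, min(start + chunksize, n_rows))
            for start in range(0, n_rows, chunksize)]


limits = chunk_limits_for_length(25, chunksize=10)
# limits == [(0, 10), (10, 20), (20, 25)]
```

With a pandas object `x`, one would pass `x.shape[0]` as the row count and slice each chunk with `x.iloc[start:end]`.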
get_data_path()
Returns the path to the data folder
Returns:
Name | Type | Description |
---|---|---|
str | str | Path of the data folder |
get_models_path()
Returns the path to the models folder
Returns:
Name | Type | Description |
---|---|---|
str | str | Path of the models folder |
get_new_column_name(column_list, wanted_name)
Gets a new column name from a list of existing ones & a wanted name
If the wanted name does not exist, return it. Otherwise get a new column prefixed by the wanted name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column_list | list | List of existing columns | required |
wanted_name | str | Wanted name | required |
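A minimal sketch of this behaviour is shown below. The documentation only says the new name is prefixed by the wanted name, so the increasing numeric suffix used here is an assumption, not necessarily what template_nlp/utils.py does.

```python
def get_new_column_name(column_list, wanted_name):
    """Sketch: return wanted_name, or an unused name starting with it."""
    if wanted_name not in column_list:
        return wanted_name
    # Assumption: disambiguate with an increasing numeric suffix
    suffix = 0
    while f"{wanted_name}_{suffix}" in column_list:
        suffix += 1
    return f"{wanted_name}_{suffix}"


free = get_new_column_name(["text", "label"], "score")
taken = get_new_column_name(["text", "score", "score_0"], "score")
```

Here `free` is the wanted name itself, while `taken` falls back to the first unused suffixed name.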
get_package_version()
Returns the current version of the package
Returns:
Name | Type | Description |
---|---|---|
str | str | version of the package |
get_ressources_path()
Returns the path to the ressources folder
Returns:
Name | Type | Description |
---|---|---|
str | str | Path of the ressources folder |
get_transformers_path()
Returns the path to the transformers folder
Returns:
Name | Type | Description |
---|---|---|
str | str | Path of the transformers folder |
is_ndarray_convertable(obj)
Returns True if the object is convertible to a builtin type in the same way a np.ndarray is
Parameters:
Name | Type | Description | Default |
---|---|---|---|
obj | Any | an object to test | required |
Returns:
bool: True if the object is convertible to a list as a np.ndarray is
ndarray_to_builtin_object(obj)
Transforms a numpy.ndarray-like object into a builtin type such as int, float or list
Parameters:
Name | Type | Description | Default |
---|---|---|---|
obj | Any | An object | required |
Raises:
ValueError: If obj is not ndarray convertible
Returns:
Any: The object converted to a builtin type like int, float or list
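The check-then-convert pattern shared by is_ndarray_convertable and ndarray_to_builtin_object can be sketched with duck typing. The criterion used here, a callable `tolist()` method, is an assumption; the real check in template_nlp/utils.py may look at other attributes. The standard library's `array.array` serves as a stand-in for a NumPy array so that the example has no third-party dependency.

```python
from array import array


def is_ndarray_convertable(obj):
    """Sketch: duck-typed check, assuming tolist() is what matters."""
    return callable(getattr(obj, "tolist", None))


def ndarray_to_builtin_object(obj):
    """Sketch: convert an array-like object to builtin Python types."""
    if not is_ndarray_convertable(obj):
        raise ValueError(
            f"Object of type {type(obj).__name__} is not ndarray convertible"
        )
    return obj.tolist()


# array.array is a stand-in that exposes tolist() like an ndarray does
converted = ndarray_to_builtin_object(array("i", [1, 2, 3]))
```

Objects without `tolist()`, such as a plain string, trigger the ValueError branch.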
read_csv(file_path, sep=';', encoding='utf-8', dtype=str, **kwargs)
Reads a .csv file and parses the first line.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str | Path to the .csv file containing the data | required |
Kwargs:
sep (str): Separator of the data file
encoding (str): Encoding of the data file
kwargs: Pandas' kwargs
Raises:
FileNotFoundError: If the file_path object does not point to an existing file
Returns:
pd.DataFrame: Data
str: First line of the .csv (None if not beginning with #) and with no line break
to_csv(df, file_path, first_line=None, sep=';', encoding='utf-8', **kwargs)
Writes a .csv and manages the first line.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Data to write | required |
file_path | str | Path to the file to create | required |
Kwargs:
first_line (str): First line to write (without line break which is done in this function)
sep (str): Separator for the data file
encoding (str): Encoding of the data file
kwargs: pandas' kwargs
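The first-line convention shared by read_csv and to_csv (an optional leading line starting with `#`, stored without its line break) can be illustrated with the standard library's `csv` module instead of pandas. The helper names `write_with_first_line` and `read_with_first_line` and the sample value `#preprocess_P1` are hypothetical; this is a sketch of the convention, not the library's implementation.

```python
import csv
import io


def write_with_first_line(rows, header, first_line=None, sep=";"):
    """Sketch: serialize rows to CSV text, with an optional leading line."""
    buffer = io.StringIO()
    if first_line is not None:
        # The line break is added here, not by the caller
        buffer.write(first_line + "\n")
    writer = csv.writer(buffer, delimiter=sep, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def read_with_first_line(text, sep=";"):
    """Sketch: parse CSV text, returning (rows, first_line)."""
    first_line = None
    if text.startswith("#"):
        # Peel off the leading line and drop its line break
        first_line, _, text = text.partition("\n")
    reader = csv.DictReader(io.StringIO(text), delimiter=sep)
    return list(reader), first_line


content = write_with_first_line([["hello", "greeting"]], ["text", "label"],
                                first_line="#preprocess_P1")
rows, first_line = read_with_first_line(content)
plain_rows, no_first_line = read_with_first_line("text;label\nhello;greeting\n")
```

All parsed values are strings, which mirrors the `dtype=str` default in the read_csv signature.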
trained_needed(function)
Decorator to ensure that a model has been trained.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
function | func | Function to decorate | required |
Returns:
function: The decorated function
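A guard decorator of this kind can be sketched as follows. The sketch assumes the model exposes a boolean `trained` attribute and raises `AttributeError` when it is unset; neither detail is confirmed by the documentation above, and `ToyModel` is a hypothetical class used only for illustration.

```python
import functools


def trained_needed(function):
    """Sketch: refuse to call a method until the model is marked as trained."""

    @functools.wraps(function)
    def wrapper(self, *args, **kwargs):
        # Assumption: the model exposes a boolean `trained` attribute
        if not getattr(self, "trained", False):
            raise AttributeError(
                f"{function.__name__} requires a trained model"
            )
        return function(self, *args, **kwargs)

    return wrapper


class ToyModel:
    def __init__(self):
        self.trained = False

    def fit(self):
        # A real model would learn from data here
        self.trained = True

    @trained_needed
    def predict(self, text):
        return text.upper()
```

Calling `ToyModel().predict("hi")` before `fit()` raises the error; after `fit()`, the call goes through.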