python.dgbpy.mlapply
Module Contents

Functions
- computeScaler_: Computes scaler
- computeChunkedScaler_
- computeScaler
- getScaledTrainingData: Gets scaled training data
- getInputList
- getScaledTrainingDataByInfo: Gets scaled training data
- getScaler: Gets scaler object for data scaling
- getNewScaler: Gets new scaler object
- transform
- doTrain: Method to perform a training job using any platform and for any workflow
- reformat: Reformats prediction result type(s)
- doApplyFromFile
- doApply: Applies a trained machine learning model on any platform for any workflow
- numpyApply
- inputCount: Gets count of input images (train and validation)
- inputCountList
- inputCount_: Gets count of input images
- split
Attributes
- python.dgbpy.mlapply.TrainType
Training type enum used by doTrain (New, Transfer or Resume)
- python.dgbpy.mlapply.computeScaler_(datasets, infos, scalebyattrib)
Computes scaler
- Parameters:
datasets (dict): dataset
infos (dict): information about example file
scalebyattrib (bool): True to compute the scaler per individual attribute present in the data, False for a single scaler over all attributes
- python.dgbpy.mlapply.computeChunkedScaler_(datasets, infos, groupnm, scalebyattrib)
- python.dgbpy.mlapply.computeScaler(infos, scalebyattrib, force=False)
- python.dgbpy.mlapply.getScaledTrainingData(filenm, flatten=False, scale=True, force=False, nbchunks=1, split=None)
Gets scaled training data
- Parameters:
filenm (str): path to example file (hdf5 format)
flatten (bool): True to flatten the returned samples
scale (bool or iter): defaults to True, in which case a fitted scaler is applied to the returned data
force (bool): True to force recomputation of the scaler
nbchunks (int): number of data chunks to be created
split (float): fraction of the data reserved for validation (between 0 and 1)
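A minimal usage sketch; the file name is hypothetical, and the return value is assumed to use the same dictionary layout as getScaledTrainingDataByInfo below:

    from dgbpy import mlapply

    # 'Example_Data.h5' is a hypothetical path to an example file in hdf5 format
    data = mlapply.getScaledTrainingData('Example_Data.h5',
                                         flatten=False,  # keep the sample shape
                                         scale=True,     # apply a fitted scaler
                                         nbchunks=1,     # load as a single chunk
                                         split=0.2)      # hold out 20% for validation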
- python.dgbpy.mlapply.getInputList(datasets)
- Parameters:
datasets (dict): dataset from example file
- Returns:
dict:
- python.dgbpy.mlapply.getScaledTrainingDataByInfo(infos, flatten=False, scale=True, ichunk=0)
Gets scaled training data
- Parameters:
infos (dict): information about example file
flatten (bool): True to flatten the returned samples
scale (bool): defaults to True, in which case a scaling object is applied to the returned data; no scaling is applied when False
ichunk (int): index of the data chunk to be returned
- Returns:
dict: training data with x_train, y_train, x_validation, y_validation and infos as keys
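A short sketch of reading the returned dictionary; infos is assumed to be an already-loaded example-file info dictionary:

    from dgbpy import mlapply

    # infos: already-loaded example-file info dict (assumption)
    data = mlapply.getScaledTrainingDataByInfo(infos, flatten=False,
                                               scale=True, ichunk=0)
    x_train, y_train = data['x_train'], data['y_train']
    x_val, y_val = data['x_validation'], data['y_validation']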
- python.dgbpy.mlapply.getScaler(x_train, byattrib=True)
Gets scaler object for data scaling
- Parameters:
x_train (array): data to be scaled
byattrib (bool): True if scaling should be done per individual attribute present in the data, False otherwise
- Returns:
object: StandardScaler object fitted on the data (from sklearn.preprocessing)
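A sketch of fitting a scaler; the random array and its (samples, attributes, z, y, x) layout are illustrative assumptions:

    import numpy as np
    from dgbpy import mlapply

    # Illustrative training cube: the (samples, attributes, z, y, x) layout is an assumption
    x_train = np.random.randn(100, 2, 16, 16, 16).astype('float32')
    scaler = mlapply.getScaler(x_train, byattrib=True)
    print(scaler.mean_, scaler.scale_)  # fitted StandardScaler statistics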
- python.dgbpy.mlapply.getNewScaler(mean, scale)
Gets new scaler object
- Parameters:
mean (ndarray of shape (n_features,) or None): mean value to be used for scaling
scale (ndarray of shape (n_features,) or None): per-feature relative scaling of the data to achieve zero mean and unit variance (from the sklearn docs)
- Returns:
object: scaler (an instance of sklearn.preprocessing.StandardScaler)
- python.dgbpy.mlapply.transform(x_train, scaler)
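A sketch that rebuilds a scaler from stored statistics and applies it with transform; the statistics, the array layout, and the return value of transform are assumptions:

    import numpy as np
    from dgbpy import mlapply

    mean = np.array([0.1, -0.3])   # previously stored per-feature means (illustrative)
    scale = np.array([1.5, 2.0])   # previously stored per-feature scales (illustrative)
    scaler = mlapply.getNewScaler(mean, scale)

    x_new = np.random.randn(10, 2, 16, 16, 16).astype('float32')
    x_scaled = mlapply.transform(x_new, scaler)  # assumed to return the scaled array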
- python.dgbpy.mlapply.doTrain(examplefilenm, platform=dgbkeys.kerasplfnm, type=TrainType.New, params=None, outnm=dgbkeys.modelnm, logdir=None, clearlogs=False, modelin=None, args=None)
Method to perform a training job using any platform and for any workflow (the trained model is also saved)
- Parameters:
examplefilenm (str): file name/path to example file in hdf5 format
platform (str): machine learning platform choice (options are: keras, scikit-learn, torch)
type (TrainType): type of training: New, Transfer, or Resume (continue a previous run)
params (dict): machine learning hyperparameters or parameter options
outnm (str): name to save the trained model as
logdir (str): path of the directory where the log files to be parsed by TensorBoard are saved (only applicable to the keras platform)
clearlogs (bool): clears previous logs, if any, when set to True
modelin (str): model file path/name in hdf5 format
args (dict, optional): dictionary with the members 'dtectdata' and 'survey' as single-element lists, and/or 'dtectexec' (see odpy.common.getODSoftwareDir)
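A minimal training sketch; the example file name is hypothetical, and params=None is assumed to fall back to the platform defaults:

    from dgbpy import mlapply, dgbkeys
    from dgbpy.mlapply import TrainType

    mlapply.doTrain('Example_Data.h5',            # hypothetical example file (hdf5)
                    platform=dgbkeys.kerasplfnm,  # train with the keras platform
                    type=TrainType.New,           # start a fresh training run
                    outnm=dgbkeys.modelnm)        # name for the saved trained model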
- python.dgbpy.mlapply.reformat(res, applyinfo)
Reformats prediction result type(s)
- Parameters:
res (dict): predictions (labels, probabilities, confidence results)
applyinfo (dict): information from example file to apply model
- Returns:
dict: reformatted equivalent of the results if the key(s) match (labels, probabilities, confidence results)
- python.dgbpy.mlapply.doApplyFromFile(modelfnm, samples, outsubsel=None)
- python.dgbpy.mlapply.doApply(model, info, samples, scaler=None, applyinfo=None, batchsize=None)
Applies a trained machine learning model on any platform for any workflow
- Parameters:
model (object): trained model (loaded from an hdf5 model file)
info (dict): info from example file
samples (ndarray): input features to the model
scaler (object): scaler for scaling the samples, if any
applyinfo (dict): information from example file to apply model
batchsize (int): data batch size
- Returns:
dict: prediction results (reformatted, see dgbpy.mlapply.reformat)
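A sketch of applying a loaded model; model, info and applyinfo are assumed to come from the dgbpy model-loading helpers, and the sample layout is illustrative:

    import numpy as np
    from dgbpy import mlapply

    # model and info: assumed already loaded via the dgbpy model/hdf5 helpers
    samples = np.random.randn(8, 1, 16, 16, 16).astype('float32')  # illustrative
    result = mlapply.doApply(model, info, samples,
                             scaler=None,     # samples already scaled
                             applyinfo=None,  # default apply settings
                             batchsize=None)  # platform default batch size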
- python.dgbpy.mlapply.numpyApply(samples)
- python.dgbpy.mlapply.inputCount(infos, raw=False, dsets=None)
Gets count of input images (train and validation)
- Parameters:
infos (dict): info from example file
raw (bool): set to True to return the total input count, False to return separate train and validation counts
dsets (dict): dataset
- Returns:
(dict or list): count of input images
- Notes:
returns a list of dictionaries (train and validation input image counts) when raw=False, and a dictionary with the total survey input image count when raw=True
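A sketch of the two count modes; infos is assumed to be an already-loaded example-file info dictionary:

    from dgbpy import mlapply

    # infos: already-loaded example-file info dict (assumption)
    total = mlapply.inputCount(infos, raw=True)     # dict: total input image count
    per_set = mlapply.inputCount(infos, raw=False)  # list of dicts: train/validation counts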
- python.dgbpy.mlapply.inputCountList(infos, dsetslist)
- python.dgbpy.mlapply.inputCount_(dsets)
Gets count of input images
- Parameters:
dsets (dict): dataset
- Returns:
dict: count of total input images
- python.dgbpy.mlapply.split(arrays, ratio)