python.dgbpy.mlapply

Module Contents

Functions

computeScaler_(datasets, infos, scalebyattrib)

Computes scaler

computeChunkedScaler_(datasets, infos, groupnm, scalebyattrib)

computeScaler(infos, scalebyattrib, force=False)

getScaledTrainingData(filenm, flatten=False, scale=True, force=False, nbchunks=1, split=None)

Gets scaled training data

getInputList(datasets)

getScaledTrainingDataByInfo(infos, flatten=False, scale=True, ichunk=0)

Gets scaled training data

getScaler(x_train, byattrib=True)

Gets scaler object for data scaling

getNewScaler(mean, scale)

Gets new scaler object

transform(x_train, scaler)

doTrain(examplefilenm, platform=dgbkeys.kerasplfnm, type=TrainType.New, params=None, outnm=dgbkeys.modelnm, logdir=None, clearlogs=False, modelin=None, args=None)

Method to perform a training job using any platform and for any workflow

reformat(res, applyinfo)

For reformatting prediction result type(s)

doApplyFromFile(modelfnm, samples, outsubsel=None)

doApply(model, info, samples, scaler=None, applyinfo=None, batchsize=None)

Applies a trained machine learning model on any platform for any workflow

numpyApply(samples)

inputCount(infos, raw=False, dsets=None)

Gets count of input images (train and validation)

inputCountList(infos, dsetslist)

inputCount_(dsets)

Gets count of input images

split(arrays, ratio)

Attributes

TrainType

python.dgbpy.mlapply.TrainType
python.dgbpy.mlapply.computeScaler_(datasets, infos, scalebyattrib)

Computes scaler

Parameters:
  • datasets (dict): dataset

  • infos (dict): information about example file

  • scalebyattrib (bool):

python.dgbpy.mlapply.computeChunkedScaler_(datasets, infos, groupnm, scalebyattrib)
python.dgbpy.mlapply.computeScaler(infos, scalebyattrib, force=False)
python.dgbpy.mlapply.getScaledTrainingData(filenm, flatten=False, scale=True, force=False, nbchunks=1, split=None)

Gets scaled training data

Parameters:
  • filenm (str): path to file

  • flatten (bool):

  • scale (bool or iter):

  • nbchunks (int): number of data chunks to be created

  • split (float): fraction of the data reserved for validation (between 0 and 1)
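
A minimal usage sketch; the file name is hypothetical and the import style assumes dgbpy is installed as a package:

  from dgbpy import mlapply

  # Hypothetical path to a deep-learning examples file in hdf5 format
  examplefilenm = 'Examples.h5'

  # Load the data in one chunk, apply scaling and reserve 20% for validation
  trainingdata = mlapply.getScaledTrainingData(examplefilenm, flatten=False,
                                               scale=True, nbchunks=1, split=0.2)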

python.dgbpy.mlapply.getInputList(datasets)
Parameters:
  • datasets (dict): dataset from example file

Returns:
  • dict:

python.dgbpy.mlapply.getScaledTrainingDataByInfo(infos, flatten=False, scale=True, ichunk=0)

Gets scaled training data

Parameters:
  • infos (dict): information about example file

  • flatten (bool):

  • scale (bool): defaults to True; when True a scaling object is applied to the returned data, when False no scaling is applied

  • ichunk (int): index of the data chunk to retrieve

Returns:
  • dict: training data with x_train, y_train, x_validation, y_validation, and infos as keys.
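
A sketch of retrieving and unpacking one scaled chunk; infos is assumed to be available already (it is the information dictionary describing the example file):

  from dgbpy import mlapply

  # 'infos' is assumed to hold the information read from the example file
  trainingdata = mlapply.getScaledTrainingDataByInfo(infos, flatten=False,
                                                     scale=True, ichunk=0)
  x_train = trainingdata['x_train']
  y_train = trainingdata['y_train']
  x_validation = trainingdata['x_validation']
  y_validation = trainingdata['y_validation']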

python.dgbpy.mlapply.getScaler(x_train, byattrib=True)

Gets scaler object for data scaling

Parameters:
  • x_train (array): data to be scaled

  • byattrib (bool): True if scaling should be done per individual attribute present in the data, False otherwise

Returns:
  • object: StandardScaler object fitted on data (from sklearn.preprocessing)
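
A small sketch using synthetic data; the array shape (samples, attributes, z samples) is illustrative only:

  import numpy as np
  from dgbpy import mlapply

  # Synthetic training data: 100 samples, 2 attributes, 16 z samples
  x_train = np.random.randn(100, 2, 16).astype('float32')

  # Fit a StandardScaler, with one mean/scale entry per attribute
  scaler = mlapply.getScaler(x_train, byattrib=True)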

python.dgbpy.mlapply.getNewScaler(mean, scale)

Gets new scaler object

Parameters:
  • mean (ndarray of shape (n_features,) or None): mean value to be used for scaling

  • scale (ndarray of shape (n_features,) or None): per-feature relative scaling of the data to achieve zero mean and unit variance (from the sklearn docs)

Returns:
  • object: scaler (an instance of sklearn.preprocessing.StandardScaler)

python.dgbpy.mlapply.transform(x_train, scaler)
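
A sketch combining getNewScaler and transform; the mean and scale values are illustrative and would normally be taken from a previously fitted scaler:

  import numpy as np
  from dgbpy import mlapply

  # Per-attribute statistics from a previously fitted scaler (illustrative values)
  mean = np.array([0.1, -0.3])
  scale = np.array([1.5, 2.0])
  scaler = mlapply.getNewScaler(mean, scale)

  # Scale data with a matching number of attributes;
  # transform is assumed here to return the scaled array
  x_train = np.random.randn(100, 2, 16).astype('float32')
  x_scaled = mlapply.transform(x_train, scaler)
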
python.dgbpy.mlapply.doTrain(examplefilenm, platform=dgbkeys.kerasplfnm, type=TrainType.New, params=None, outnm=dgbkeys.modelnm, logdir=None, clearlogs=False, modelin=None, args=None)
Method to perform a training job using any platform and for any workflow

(the trained model is also saved)

Parameters:
  • examplefilenm (str): file name/path to example file in hdf5 format

  • platform (str): machine learning platform choice (options are: keras, scikit-learn, torch)

  • type (str): type of training: new, transfer, or resume (continue an earlier training run)

  • params (dict): machine learning hyperparameters or parameter options

  • outnm (str): name to save trained model as

  • logdir (str): path of the directory in which to save the log files to be parsed by TensorBoard (only applicable to the keras platform)

  • clearlogs (bool): clears previous logs if any when set to True

  • modelin (str): model file path/name in hdf5 format

  • args (dict, optional): Dictionary with the members ‘dtectdata’ and ‘survey’ as single element lists, and/or ‘dtectexec’ (see odpy.common.getODSoftwareDir)

Returns:
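
A hedged sketch of a training call; the example file name and output model name are hypothetical, and params is left at its default of None:

  from dgbpy import mlapply
  from dgbpy.mlapply import TrainType

  # Hypothetical example file exported from OpendTect
  examplefilenm = 'Examples.h5'

  # Train a new model with the default (keras) platform;
  # the trained model is saved under the given output name
  mlapply.doTrain(examplefilenm,
                  type=TrainType.New,
                  params=None,
                  outnm='my_model')
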
python.dgbpy.mlapply.reformat(res, applyinfo)

For reformatting prediction result type(s)

Parameters:
  • res (dict): predictions (labels, probabilities, confidence results)

  • applyinfo (dict): information from example file to apply model

Returns:
  • dict: reformatted results for any matching key(s) (labels, probabilities, confidence results)

python.dgbpy.mlapply.doApplyFromFile(modelfnm, samples, outsubsel=None)
python.dgbpy.mlapply.doApply(model, info, samples, scaler=None, applyinfo=None, batchsize=None)

Applies a trained machine learning model on any platform for any workflow

Parameters:
  • model (object): trained model in hdf5 format

  • info (dict): info from example file

  • samples (ndarray): input features to model

  • scaler (obj): scaler to apply to the samples, if any

  • applyinfo (dict): information from example file to apply model

  • batchsize (int): data batch size

Returns:
  • dict: prediction results (reformatted, see dgbpy.mlapply.reformat)
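
A sketch of applying a trained model; how model, info and samples are obtained is not shown and only indicated in the comments:

  from dgbpy import mlapply

  # model   : trained model object (loaded from its hdf5 file, platform specific)
  # info    : info dict from the example file
  # samples : ndarray of input features to predict on
  predictions = mlapply.doApply(model, info, samples,
                                scaler=None, applyinfo=None, batchsize=None)
  # 'predictions' is a dict of prediction results (see dgbpy.mlapply.reformat)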

python.dgbpy.mlapply.numpyApply(samples)
python.dgbpy.mlapply.inputCount(infos, raw=False, dsets=None)

Gets count of input images (train and validation)

Parameters:
  • infos (dict): info from example file

  • raw (bool): set to True to return the total input count, or False to return separate train and validation counts

  • dsets (dict): dataset

Returns:
  • (dict, list): count of input images

Notes:
  • a list of dictionaries (train and validation input image counts) is returned when raw=False; a dictionary with the total survey input image count is returned when raw=True
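
A short sketch; infos is assumed to be the info dictionary read from the example file:

  from dgbpy import mlapply

  # Per-dataset train and validation counts (list of dicts)
  counts = mlapply.inputCount(infos)

  # Total survey input image count (dict)
  total = mlapply.inputCount(infos, raw=True)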

python.dgbpy.mlapply.inputCountList(infos, dsetslist)
python.dgbpy.mlapply.inputCount_(dsets)

Gets count of input images

Parameters:
  • dsets (dict): dataset

Returns:
  • (dict): count of total input images

python.dgbpy.mlapply.split(arrays, ratio)