:py:mod:`python.dgbpy.mlapply`
==============================

.. py:module:: python.dgbpy.mlapply


Module Contents
---------------


Functions
~~~~~~~~~

.. autoapisummary::

   python.dgbpy.mlapply.computeScaler_
   python.dgbpy.mlapply.computeChunkedScaler_
   python.dgbpy.mlapply.computeScaler
   python.dgbpy.mlapply.getScaledTrainingData
   python.dgbpy.mlapply.getInputList
   python.dgbpy.mlapply.getScaledTrainingDataByInfo
   python.dgbpy.mlapply.getScaler
   python.dgbpy.mlapply.getNewScaler
   python.dgbpy.mlapply.transform
   python.dgbpy.mlapply.doTrain
   python.dgbpy.mlapply.reformat
   python.dgbpy.mlapply.doApplyFromFile
   python.dgbpy.mlapply.doApply
   python.dgbpy.mlapply.numpyApply
   python.dgbpy.mlapply.inputCount
   python.dgbpy.mlapply.inputCountList
   python.dgbpy.mlapply.inputCount_
   python.dgbpy.mlapply.split


Attributes
~~~~~~~~~~

.. autoapisummary::

   python.dgbpy.mlapply.TrainType


.. py:data:: TrainType


.. py:function:: computeScaler_(datasets, infos, scalebyattrib)

   Computes a scaler for the given datasets.

   Parameters:
     * datasets (dict): dataset
     * infos (dict): information about the example file
     * scalebyattrib (bool): True to compute the scaler per attribute


.. py:function:: computeChunkedScaler_(datasets, infos, groupnm, scalebyattrib)


.. py:function:: computeScaler(infos, scalebyattrib, force=False)


.. py:function:: getScaledTrainingData(filenm, flatten=False, scale=True, force=False, nbchunks=1, split=None)

   Gets scaled training data.

   Parameters:
     * filenm (str): path to the example file
     * flatten (bool):
     * scale (bool or iter):
     * nbchunks (int): number of data chunks to be created
     * split (float): fraction of the data used for validation (between 0 and 1)


.. py:function:: getInputList(datasets)

   Parameters:
     * datasets (dict): dataset from the example file

   Returns:
     * dict:


.. py:function:: getScaledTrainingDataByInfo(infos, flatten=False, scale=True, ichunk=0)

   Gets scaled training data.

   Parameters:
     * infos (dict): information about the example file
     * flatten (bool):
     * scale (bool): if True (the default), a scaling object is applied to the returned data; if False, the data is returned unscaled
     * ichunk (int): index of the data chunk to retrieve

   Returns:
     * dict: training data with x_train, y_train, x_validation, y_validation, infos as keys


.. py:function:: getScaler(x_train, byattrib=True)

   Gets a scaler object for data scaling.

   Parameters:
     * x_train (array): data to be scaled
     * byattrib (bool): True if scaling should be done per individual attribute present in the data, False otherwise

   Returns:
     * object: StandardScaler object fitted on the data (from sklearn.preprocessing)


.. py:function:: getNewScaler(mean, scale)

   Gets a new scaler object.

   Parameters:
     * mean (ndarray of shape (n_features,) or None): mean value to be used for scaling
     * scale (ndarray of shape (n_features,) or None): per-feature relative scaling of the data to achieve zero mean and unit variance (from the sklearn docs)

   Returns:
     * object: scaler (an instance of sklearn.preprocessing.StandardScaler)


.. py:function:: transform(x_train, scaler)
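
   A minimal usage sketch, not taken from the source docstrings: it assumes the
   module is importable as ``dgbpy.mlapply``, that ``transform`` applies a fitted
   scaler to the samples and returns the scaled array, and that
   ``getScaledTrainingData`` returns a dictionary with an ``x_train`` key as
   documented for ``getScaledTrainingDataByInfo``. The file path is hypothetical.

   .. code-block:: python

      import dgbpy.mlapply as mlapply

      examplefile = 'examples.h5'   # hypothetical path to an hdf5 example file

      # Load training data without scaling, keeping 20% aside for validation
      data = mlapply.getScaledTrainingData(examplefile, scale=False, split=0.2)

      # Fit a per-attribute StandardScaler on the training samples
      scaler = mlapply.getScaler(data['x_train'], byattrib=True)

      # Apply the fitted scaler to the training samples (assumed behaviour)
      x_scaled = mlapply.transform(data['x_train'], scaler)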
.. py:function:: doTrain(examplefilenm, platform=dgbkeys.kerasplfnm, type=TrainType.New, params=None, outnm=dgbkeys.modelnm, logdir=None, clearlogs=False, modelin=None, args=None)

   Performs a training job on any platform and for any workflow; the trained model is also saved.

   Parameters:
     * examplefilenm (str): file name/path to the example file in hdf5 format
     * platform (str): machine learning platform choice (options are: keras, scikit-learn, torch)
     * type (str): type of training: new, transfer, or continue (resume)
     * params (dict): machine learning hyperparameters or parameter options
     * outnm (str): name under which the trained model is saved
     * logdir (str): path of the directory where the log files to be parsed by TensorBoard are saved (only applicable to the keras platform)
     * clearlogs (bool): clears previous logs, if any, when set to True
     * modelin (str): model file path/name in hdf5 format
     * args (dict, optional): dictionary with the members 'dtectdata' and 'survey' as single element lists, and/or 'dtectexec' (see odpy.common.getODSoftwareDir)

   Returns:
     *


.. py:function:: reformat(res, applyinfo)

   Reformats the prediction result type(s).

   Parameters:
     * res (dict): predictions (labels, probabilities, confidence results)
     * applyinfo (dict): information from the example file used to apply the model

   Returns:
     * dict: reformatted equivalent of the results for the matching key(s) (labels, probabilities, confidence results)


.. py:function:: doApplyFromFile(modelfnm, samples, outsubsel=None)


.. py:function:: doApply(model, info, samples, scaler=None, applyinfo=None, batchsize=None)

   Applies a trained machine learning model on any platform and for any workflow.

   Parameters:
     * model (object): trained model in hdf5 format
     * info (dict): info from the example file
     * samples (ndarray): input features for the model
     * scaler (obj): scaler used to scale the samples, if any
     * applyinfo (dict): information from the example file used to apply the model
     * batchsize (int): data batch size

   Returns:
     * dict: prediction results (reformatted, see dgbpy.mlapply.reformat)


.. py:function:: numpyApply(samples)


.. py:function:: inputCount(infos, raw=False, dsets=None)

   Gets the count of input images (train and validation).

   Parameters:
     * infos (dict): info from the example file
     * raw (bool): set to True to return the total input count, False to return the train and validation split counts
     * dsets (dict): dataset

   Returns:
     * (dict, list): count of input images

   Notes:
     * returns a list of dictionaries (train and validation input image counts) when raw=False, and a dictionary with the total survey input image count when raw=True


.. py:function:: inputCountList(infos, dsetslist)


.. py:function:: inputCount_(dsets)

   Gets the count of input images.

   Parameters:
     * dsets (dict): dataset

   Returns:
     * (dict): count of total input images


.. py:function:: split(arrays, ratio)
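
A minimal end-to-end sketch of training and applying a model with this module,
under stated assumptions: the module is importable as ``dgbpy.mlapply``, the
example file path, output model name, saved model file name, and sample array
shape are all hypothetical, and the calls follow the signatures documented
above (``doTrain``, ``doApplyFromFile``).

.. code-block:: python

   import numpy as np
   import dgbpy.mlapply as mlapply

   examplefile = 'examples.h5'      # hypothetical path to an hdf5 example file

   # Train a new model on the default (keras) platform with default parameters;
   # the trained model is saved under the given output name.
   mlapply.doTrain(examplefile, type=mlapply.TrainType.New, outnm='my_model')

   # Apply the saved model to new samples; the model file name and the sample
   # shape are illustrative only and must match the trained model's inputs.
   samples = np.random.rand(10, 1, 16, 16, 16).astype('float32')
   predictions = mlapply.doApplyFromFile('my_model.h5', samples)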