:py:mod:`python.dgbpy.mlapply`
==============================

.. py:module:: python.dgbpy.mlapply


Module Contents
---------------


Functions
~~~~~~~~~

.. autoapisummary::

   python.dgbpy.mlapply.computeScaler_
   python.dgbpy.mlapply.computeChunkedScaler_
   python.dgbpy.mlapply.computeScaler
   python.dgbpy.mlapply.getScaledTrainingData
   python.dgbpy.mlapply.getInputList
   python.dgbpy.mlapply.getScaledTrainingDataByInfo
   python.dgbpy.mlapply.getScaler
   python.dgbpy.mlapply.getNewScaler
   python.dgbpy.mlapply.transform
   python.dgbpy.mlapply.doTrain
   python.dgbpy.mlapply.reformat
   python.dgbpy.mlapply.doApplyFromFile
   python.dgbpy.mlapply.doApply
   python.dgbpy.mlapply.numpyApply
   python.dgbpy.mlapply.inputCount
   python.dgbpy.mlapply.inputCountList
   python.dgbpy.mlapply.inputCount_
   python.dgbpy.mlapply.split


Attributes
~~~~~~~~~~

.. autoapisummary::

   python.dgbpy.mlapply.TrainType


.. py:data:: TrainType


.. py:function:: computeScaler_(datasets, infos, scalebyattrib)

   Computes a scaler for the given datasets.

   Parameters:
     * datasets (dict): dataset
     * infos (dict): information about the example file
     * scalebyattrib (bool): True to compute the scaler per attribute


.. py:function:: computeChunkedScaler_(datasets, infos, groupnm, scalebyattrib)


.. py:function:: computeScaler(infos, scalebyattrib, force=False)


.. py:function:: getScaledTrainingData(filenm, flatten=False, scale=True, force=False, nbchunks=1, split=None)

   Gets scaled training data.

   Parameters:
     * filenm (str): path to the example file
     * flatten (bool):
     * scale (bool or iter):
     * nbchunks (int): number of data chunks to be created
     * split (float): fraction of the data used for validation (between 0 and 1)


.. py:function:: getInputList(datasets)

   Parameters:
     * datasets (dict): dataset from the example file

   Returns:
     * dict:


.. py:function:: getScaledTrainingDataByInfo(infos, flatten=False, scale=True, ichunk=0)

   Gets scaled training data.

   Parameters:
     * infos (dict): information about the example file
     * flatten (bool):
     * scale (bool): if True (the default), a scaling object is applied to the returned data; if False, the data is returned unscaled
     * ichunk (int): index of the data chunk to retrieve

   Returns:
     * dict: training data with x_train, y_train, x_validation, y_validation, infos as keys


.. py:function:: getScaler(x_train, byattrib=True)

   Gets a scaler object for data scaling.

   Parameters:
     * x_train (array): data to be scaled
     * byattrib (bool): True if scaling should be done per individual attribute present in the data, False otherwise

   Returns:
     * object: StandardScaler object fitted on the data (from sklearn.preprocessing)


.. py:function:: getNewScaler(mean, scale)

   Gets a new scaler object.

   Parameters:
     * mean (ndarray of shape (n_features,) or None): mean value to be used for scaling
     * scale (ndarray of shape (n_features,) or None): per-feature relative scaling of the data to achieve zero mean and unit variance (from the sklearn docs)

   Returns:
     * object: scaler (an instance of sklearn.preprocessing.StandardScaler)


.. py:function:: transform(x_train, scaler)
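
   A minimal usage sketch, not taken from the source docstrings: it assumes the
   module is importable as ``dgbpy.mlapply``, that ``transform`` applies a fitted
   scaler to the samples and returns the scaled array, and that
   ``getScaledTrainingData`` returns a dictionary with an ``x_train`` key as
   documented for ``getScaledTrainingDataByInfo``. The file path is hypothetical.

   .. code-block:: python

      import dgbpy.mlapply as mlapply

      examplefile = 'examples.h5'   # hypothetical path to an hdf5 example file

      # Load training data without scaling, keeping 20% aside for validation
      data = mlapply.getScaledTrainingData(examplefile, scale=False, split=0.2)

      # Fit a per-attribute StandardScaler on the training samples
      scaler = mlapply.getScaler(data['x_train'], byattrib=True)

      # Apply the fitted scaler to the training samples (assumed behaviour)
      x_scaled = mlapply.transform(data['x_train'], scaler)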
.. py:function:: doTrain(examplefilenm, platform=dgbkeys.kerasplfnm, type=TrainType.New, params=None, outnm=dgbkeys.modelnm, logdir=None, clearlogs=False, modelin=None, args=None)

   Performs a training job on any platform and for any workflow; the trained model is also saved.

   Parameters:
     * examplefilenm (str): file name/path to the example file in hdf5 format
     * platform (str): machine learning platform choice (options are: keras, scikit-learn, torch)
     * type (str): type of training: new, transfer, or continue (resume)
     * params (dict): machine learning hyperparameters or parameter options
     * outnm (str): name under which the trained model is saved
     * logdir (str): path of the directory where the log files to be parsed by TensorBoard are saved (only applicable to the keras platform)
     * clearlogs (bool): clears previous logs, if any, when set to True
     * modelin (str): model file path/name in hdf5 format
     * args (dict, optional): dictionary with the members 'dtectdata' and 'survey' as single element lists, and/or 'dtectexec' (see odpy.common.getODSoftwareDir)

   Returns:
     *


.. py:function:: reformat(res, applyinfo)

   Reformats the prediction result type(s).

   Parameters:
     * res (dict): predictions (labels, probabilities, confidence results)
     * applyinfo (dict): information from the example file used to apply the model

   Returns:
     * dict: reformatted equivalent of the results for the matching key(s) (labels, probabilities, confidence results)


.. py:function:: doApplyFromFile(modelfnm, samples, outsubsel=None)


.. py:function:: doApply(model, info, samples, scaler=None, applyinfo=None, batchsize=None)

   Applies a trained machine learning model on any platform and for any workflow.

   Parameters:
     * model (object): trained model in hdf5 format
     * info (dict): info from the example file
     * samples (ndarray): input features for the model
     * scaler (obj): scaler used to scale the samples, if any
     * applyinfo (dict): information from the example file used to apply the model
     * batchsize (int): data batch size

   Returns:
     * dict: prediction results (reformatted, see dgbpy.mlapply.reformat)


.. py:function:: numpyApply(samples)


.. py:function:: inputCount(infos, raw=False, dsets=None)

   Gets the count of input images (train and validation).

   Parameters:
     * infos (dict): info from the example file
     * raw (bool): set to True to return the total input count, False to return the train and validation split counts
     * dsets (dict): dataset

   Returns:
     * (dict, list): count of input images

   Notes:
     * returns a list of dictionaries (train and validation input image counts) when raw=False, and a dictionary with the total survey input image count when raw=True


.. py:function:: inputCountList(infos, dsetslist)


.. py:function:: inputCount_(dsets)

   Gets the count of input images.

   Parameters:
     * dsets (dict): dataset

   Returns:
     * (dict): count of total input images


.. py:function:: split(arrays, ratio)
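
A minimal end-to-end sketch of training and applying a model with this module,
under stated assumptions: the module is importable as ``dgbpy.mlapply``, the
example file path, output model name, saved model file name, and sample array
shape are all hypothetical, and the calls follow the signatures documented
above (``doTrain``, ``doApplyFromFile``).

.. code-block:: python

   import numpy as np
   import dgbpy.mlapply as mlapply

   examplefile = 'examples.h5'      # hypothetical path to an hdf5 example file

   # Train a new model on the default (keras) platform with default parameters;
   # the trained model is saved under the given output name.
   mlapply.doTrain(examplefile, type=mlapply.TrainType.New, outnm='my_model')

   # Apply the saved model to new samples; the model file name and the sample
   # shape are illustrative only and must match the trained model's inputs.
   samples = np.random.rand(10, 1, 16, 16, 16).astype('float32')
   predictions = mlapply.doApplyFromFile('my_model.h5', samples)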