mlpack 3.4.2
The FunctionType policy in mlpack

Overview

To represent the various types of loss functions encountered in machine learning problems, mlpack provides the FunctionType template parameter in the optimizer interface. The optimizers available in the core library rely on this policy to obtain the information required by the optimization algorithm.

The FunctionType template parameter required by the Optimizer class can have additional requirements imposed on it, depending on the type of optimizer used.

Interface requirements

The most basic requirements for the FunctionType parameter are implementations of two public member functions, with the following interface and semantics:

// Evaluate the loss function at the given coordinates.
double Evaluate(const arma::mat& coordinates);
// Evaluate the gradient at the given coordinates, where 'gradient' is an
// output parameter for the required gradient.
void Gradient(const arma::mat& coordinates, arma::mat& gradient);

Optimizers like SGD and RMSProp require a DecomposableFunctionType, which must satisfy the following requirements:

// Return the number of functions. In a data-dependent function, this would
// return the number of points in the dataset.
size_t NumFunctions();
// Evaluate the i-th loss function. For example, for a data-dependent
// function, Evaluate(coordinates, 0) should evaluate the loss function at the
// first point in the dataset.
double Evaluate(const arma::mat& coordinates, const size_t i);
// Evaluate the gradient of the i-th loss function at the given coordinates,
// where 'gradient' is an output parameter for the required gradient.
void Gradient(const arma::mat& coordinates, const size_t i, arma::mat& gradient);

The ParallelSGD optimizer requires a SparseFunctionType interface. SparseFunctionType requires the gradient to be returned as a sparse matrix (arma::sp_mat), because ParallelSGD, implemented with the HOGWILD! scheme of unsynchronised updates, is expected to be relevant only in situations where the individual gradients are sparse. So, the interface requires functions with the following signatures:

// Return the number of functions. In a data-dependent function, this would
// return the number of points in the dataset.
size_t NumFunctions();
// Evaluate the loss function at the given coordinates.
double Evaluate(const arma::mat& coordinates);
// Evaluate the (sparse) gradient of the i-th loss function at the given
// coordinates, where 'gradient' is an output parameter for the required
// gradient.
void Gradient(const arma::mat& coordinates, const size_t i, arma::sp_mat& gradient);

The SCD optimizer requires a ResolvableFunctionType interface to calculate partial gradients with respect to individual features. The optimizer requires the decision variable to be arranged in a particular fashion to allow disjoint updates: the features should be arranged columnwise in the decision variable. For example, in SoftmaxRegressionFunction the decision variable has size numClasses x featureSize (+ 1 if an intercept also needs to be fit). Similarly, for LogisticRegression, the decision variable is a row vector, with the number of columns determined by the dimensionality of the dataset.

The interface expects the following member functions from the function class:

// Return the number of features in the decision variable.
size_t NumFeatures();
// Evaluate the loss function at the given coordinates.
double Evaluate(const arma::mat& coordinates);
// Evaluate the partial gradient of the loss function with respect to the j-th
// feature at the given coordinates, where 'gradient' is an output parameter
// for the required gradient. The 'gradient' matrix should be non-zero only in
// the j-th column, which contains the relevant partial gradient.
void PartialGradient(const arma::mat& coordinates, const size_t j, arma::sp_mat& gradient);