Class MillerUpdatingRegression

  • All Implemented Interfaces:
    UpdatingMultipleLinearRegression

    public class MillerUpdatingRegression
    extends java.lang.Object
    implements UpdatingMultipleLinearRegression
    This class is a concrete implementation of the UpdatingMultipleLinearRegression interface.

    The algorithm is described in:

     Algorithm AS 274: Least Squares Routines to Supplement Those of Gentleman
     Author(s): Alan J. Miller
     Source: Journal of the Royal Statistical Society.
     Series C (Applied Statistics), Vol. 41, No. 2
     (1992), pp. 458-478
     Published by: Blackwell Publishing for the Royal Statistical Society
     Stable URL: http://www.jstor.org/stable/2347583 

    This method for multiple regression forms the solution to the OLS problem by updating the QR decomposition as described by Gentleman.

    Since:
    3.0
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private double[] d
      diagonals of cross products matrix
      private double epsilon
      zero tolerance
      private boolean hasIntercept
      boolean flag whether a regression constant is added
      private boolean[] lindep
      flags for variables with linear dependency problems
      private long nobs
      number of observations entered
      private int nvars
      number of variables in regression
      private double[] r
      the off diagonal portion of the R matrix
      private double[] rhs
      the elements of R'Y
      private double[] rss
      residual sum of squares for all nested regressions
      private boolean rss_set
      whether the nested residual sums of squares (rss) have been computed
      private double sserr
      sum of squared errors of largest regression
      private double sumsqy
      summation of squared Y values
      private double sumy
      summation of Y variable
      private double[] tol
      the tolerance for each of the variables
      private boolean tol_set
      whether the tolerance setting method (tolset) has been called
      private int[] vorder
      order of the regressors
      private double[] work_sing
      workspace for singularity method
      private double[] work_tolset
      scratch space for tolerance calc
      private double[] x_sing
      singular x values
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      private MillerUpdatingRegression()
      Set the default constructor to private access to prevent inadvertent instantiation
        MillerUpdatingRegression(int numberOfVariables, boolean includeConstant)
      Primary constructor for the MillerUpdatingRegression.
        MillerUpdatingRegression(int numberOfVariables, boolean includeConstant, double errorTolerance)
      This is the augmented constructor for the MillerUpdatingRegression class.
    • Method Summary

      Modifier and Type Method Description
      void addObservation(double[] x, double y)
      Adds an observation to the regression model.
      void addObservations(double[][] x, double[] y)
      Adds multiple observations to the model.
      void clear()
      As the name suggests, clear wipes the internals and reorders everything in the canonical order.
      private double[] cov(int nreq)
      Calculates the covariance matrix assuming only the first nreq variables are included in the calculation.
      double getDiagonalOfHatMatrix(double[] row_data)
      Gets the diagonal element of the hat matrix (also known as the leverage matrix) for the given observation.
      long getN()
      Gets the number of observations added to the regression model.
      int[] getOrderOfRegressors()
      Gets the order of the regressors, useful if some type of reordering has been called.
      double[] getPartialCorrelations(int in)
      In the original algorithm only the partial correlations of the regressors are returned to the user.
      boolean hasIntercept()
      A getter method which determines whether a constant is included.
      private void include(double[] x, double wi, double yi)
      The include method is where the QR decomposition occurs.
      private void inverse(double[] rinv, int nreq)
      This internal method calculates the inverse of the upper-triangular portion of the R matrix.
      private double[] regcf(int nreq)
      The regcf method conducts the linear regression and extracts the parameter vector.
      RegressionResults regress()
      Conducts a regression on the data in the model, using all regressors.
      RegressionResults regress(int numberOfRegressors)
      Conducts a regression on the data in the model, using a subset of regressors.
      RegressionResults regress(int[] variablesToInclude)
      Conducts a regression on the data in the model, using the regressors in the array. Calling this method will change the internal order of the regressors, and care is required in interpreting the hat matrix.
      private int reorderRegressors(int[] list, int pos1)
      ALGORITHM AS274 APPL. STATIST. (1992) VOL.41, NO. 2. Re-orders the variables in an orthogonal reduction produced by AS75.1 so that the variables in LIST start at position POS1.
      private void singcheck()
      The method which checks for singularities and then eliminates the offending columns.
      private double smartAdd(double a, double b)
      Adds two numbers a and b such that the contamination due to numerical smallness of one addend does not corrupt the sum.
      private void ss()
      Calculates the sum of squared errors for the full regression and all nested subsets.
      private void tolset()
      This sets up tolerances for singularity testing.
      private void vmove(int from, int to)
      ALGORITHM AS274 APPL. STATIST. (1992) VOL.41, NO. 2. Moves a variable from position FROM to position TO in an orthogonal reduction produced by AS75.1.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • nvars

        private final int nvars
        number of variables in regression
      • d

        private final double[] d
        diagonals of cross products matrix
      • rhs

        private final double[] rhs
        the elements of R'Y
      • r

        private final double[] r
        the off diagonal portion of the R matrix
      • tol

        private final double[] tol
        the tolerance for each of the variables
      • rss

        private final double[] rss
        residual sum of squares for all nested regressions
      • vorder

        private final int[] vorder
        order of the regressors
      • work_tolset

        private final double[] work_tolset
        scratch space for tolerance calc
      • nobs

        private long nobs
        number of observations entered
      • sserr

        private double sserr
        sum of squared errors of largest regression
      • rss_set

        private boolean rss_set
        whether the nested residual sums of squares (rss) have been computed
      • tol_set

        private boolean tol_set
        whether the tolerance setting method (tolset) has been called
      • lindep

        private final boolean[] lindep
        flags for variables with linear dependency problems
      • x_sing

        private final double[] x_sing
        singular x values
      • work_sing

        private final double[] work_sing
        workspace for singularity method
      • sumy

        private double sumy
        summation of Y variable
      • sumsqy

        private double sumsqy
        summation of squared Y values
      • hasIntercept

        private boolean hasIntercept
        boolean flag whether a regression constant is added
      • epsilon

        private final double epsilon
        zero tolerance
    • Constructor Detail

      • MillerUpdatingRegression

        private MillerUpdatingRegression()
        Set the default constructor to private access to prevent inadvertent instantiation
      • MillerUpdatingRegression

        public MillerUpdatingRegression(int numberOfVariables,
                                        boolean includeConstant,
                                        double errorTolerance)
                                 throws ModelSpecificationException
        This is the augmented constructor for the MillerUpdatingRegression class.
        Parameters:
        numberOfVariables - number of regressors to expect, not including constant
        includeConstant - include a constant automatically
        errorTolerance - zero tolerance, how machine zero is determined
        Throws:
        ModelSpecificationException - if numberOfVariables is less than 1
      • MillerUpdatingRegression

        public MillerUpdatingRegression(int numberOfVariables,
                                        boolean includeConstant)
                                 throws ModelSpecificationException
        Primary constructor for the MillerUpdatingRegression.
        Parameters:
        numberOfVariables - maximum number of potential regressors
        includeConstant - include a constant automatically
        Throws:
        ModelSpecificationException - if numberOfVariables is less than 1
    • Method Detail

      • hasIntercept

        public boolean hasIntercept()
        A getter method which determines whether a constant is included.
        Specified by:
        hasIntercept in interface UpdatingMultipleLinearRegression
        Returns:
        true if the regression has an intercept, false otherwise
      • getN

        public long getN()
        Gets the number of observations added to the regression model.
        Specified by:
        getN in interface UpdatingMultipleLinearRegression
        Returns:
        number of observations
      • include

        private void include(double[] x,
                             double wi,
                             double yi)
        The include method is where the QR decomposition occurs. This method forms all intermediate data that will be used for all derived measures. As noted in Miller's paper, the original implementation overwrites the x vector. In this implementation, the include method is passed a copy of the original data vector so that the caller's data is not contaminated. Additionally, this method differs slightly from Gentleman's method in that it assumes dense design matrices; the original Gentleman algorithm has some advantage on sparse matrices.
        Parameters:
        x - observations on the regressors
        wi - weight of this observation (-1,1)
        yi - observation on the regressand
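        As an illustration of the update described above, the following is a minimal Java sketch of Gentleman's square-root-free rotation in the style of the AS 274 INCLUD routine. It is not the Commons Math source: the class GentlemanUpdateSketch, the 0-based indexing, the explicit intercept column of 1s, and the row-by-row packing of the off-diagonal elements of R in rbar are all assumptions made for the example. Passing w = -1 removes an observation in exact arithmetic, but such downdating can be less numerically stable than adding.

```java
import java.util.Arrays;

// Hypothetical sketch of Gentleman's square-root-free update (AS 274 INCLUD style).
// Assumed layout: X'X = R' D R with R unit upper triangular; the off-diagonal
// elements of R are packed row by row in rbar; indices are 0-based.
final class GentlemanUpdateSketch {
    final int p;
    final double[] d;     // diagonal of D
    final double[] rbar;  // packed off-diagonal elements of R
    final double[] rhs;   // transformed right-hand side
    double sserr;         // residual sum of squares of the full model

    GentlemanUpdateSketch(int p) {
        this.p = p;
        this.d = new double[p];
        this.rbar = new double[p * (p - 1) / 2];
        this.rhs = new double[p];
    }

    // Index of the first packed element of row i.
    private int rowStart(int i) {
        return i * (2 * p - i - 1) / 2;
    }

    // Adds an observation (w = 1) or, in exact arithmetic, removes one (w = -1).
    void include(double[] row, double w, double y) {
        double[] x = row.clone(); // work on a copy so the caller's data is not modified
        for (int i = 0; i < p; i++) {
            if (w == 0.0) {
                return; // the observation has been fully absorbed
            }
            double xi = x[i];
            if (xi == 0.0) {
                continue; // nothing to rotate into row i
            }
            double di = d[i];
            double dpi = di + w * xi * xi;
            double cbar = di / dpi;
            double sbar = w * xi / dpi;
            w = cbar * w;
            d[i] = dpi;
            int pos = rowStart(i);
            for (int k = i + 1; k < p; k++, pos++) {
                double xk = x[k];
                x[k] = xk - xi * rbar[pos];
                rbar[pos] = cbar * rbar[pos] + sbar * xk;
            }
            double yk = y;
            y = yk - xi * rhs[i];
            rhs[i] = cbar * rhs[i] + sbar * yk;
        }
        sserr += w * y * y;
    }

    public static void main(String[] args) {
        GentlemanUpdateSketch g = new GentlemanUpdateSketch(2);
        double[][] rows = {{1, 1}, {1, 2}, {1, 3}}; // intercept column plus x
        double[] ys = {3, 5, 7};                     // y = 1 + 2x exactly
        for (int n = 0; n < rows.length; n++) {
            g.include(rows[n], 1.0, ys[n]);
        }
        System.out.println("d = " + Arrays.toString(g.d) + ", rhs = " + Arrays.toString(g.rhs));
    }
}
```

        For the three points x = 1, 2, 3 with y = 1 + 2x, the sketch ends with d close to [3, 2] (the number of observations and the sum of squared deviations of x), rbar close to [2] (the mean of x) and rhs close to [5, 2]; including the third point again with w = -1 returns d to roughly [2, 0.5], the two-observation state.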
      • smartAdd

        private double smartAdd(double a,
                                double b)
        Adds two numbers a and b such that the contamination due to numerical smallness of one addend does not corrupt the sum.
        Parameters:
        a - an addend
        b - an addend
        Returns:
        the sum of a and b
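        One possible reading of this description, as a short Java sketch: when one addend is smaller in magnitude than EPS times the other, the larger addend is returned unchanged; otherwise the ordinary sum is returned. The class name SmartAddSketch and the threshold EPS = 2^-53 are assumptions for illustration and may not match the library's exact constant.

```java
// Hypothetical sketch of a "smart" addition that ignores a negligible addend.
// Assumption: EPS = 2^-53, roughly half the spacing of doubles near 1.0.
final class SmartAddSketch {
    static final double EPS = 0x1.0p-53;

    static double smartAdd(double a, double b) {
        double absA = Math.abs(a);
        double absB = Math.abs(b);
        if (absA > absB) {
            // b is negligible relative to a: keep a exactly
            return absB > absA * EPS ? a + b : a;
        }
        // a is negligible relative to b: keep b exactly
        return absA > absB * EPS ? a + b : b;
    }

    public static void main(String[] args) {
        System.out.println(smartAdd(1.0, 1e-17)); // 1.0
        System.out.println(smartAdd(3.0, 4.0));   // 7.0
    }
}
```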
      • clear

        public void clear()
        As the name suggests, clear wipes the internals and reorders everything in the canonical order.
        Specified by:
        clear in interface UpdatingMultipleLinearRegression
      • tolset

        private void tolset()
        This sets up tolerances for singularity testing.
      • regcf

        private double[] regcf(int nreq)
                        throws ModelSpecificationException
        The regcf method conducts the linear regression and extracts the parameter vector. Notice that the algorithm can do subset regression with no alteration.
        Parameters:
        nreq - how many of the regressors to include (either in canonical order, or in the current reordered state)
        Returns:
        an array with the estimated slope coefficients
        Throws:
        ModelSpecificationException - if nreq is less than 1 or greater than the number of independent variables
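        Because R is unit upper triangular, the coefficients follow from back-substitution on R b = rhs, and stopping after the first nreq rows yields the subset regression on the first nreq variables without altering the decomposition. A minimal sketch, assuming the off-diagonal elements of R are packed row by row in rbar with 0-based indices; RegcfSketch is a hypothetical name, and the real method's handling of singular variables is omitted:

```java
import java.util.Arrays;

// Hypothetical sketch: solve R * beta = rhs for the first nreq variables, where R is
// unit upper triangular and its off-diagonal elements are packed row by row in rbar
// (row i holds columns i+1 .. p-1). Singular-variable handling is omitted.
final class RegcfSketch {
    static double[] backSubstitute(double[] rbar, double[] rhs, int p, int nreq) {
        double[] beta = new double[nreq];
        for (int i = nreq - 1; i >= 0; i--) {
            double sum = rhs[i];
            int pos = i * (2 * p - i - 1) / 2; // first packed element of row i
            for (int j = i + 1; j < nreq; j++, pos++) {
                sum -= rbar[pos] * beta[j];
            }
            beta[i] = sum; // the diagonal of R is 1, so no division is needed
        }
        return beta;
    }

    public static void main(String[] args) {
        double[] rbar = {2.0};
        double[] rhs = {5.0, 2.0};
        System.out.println(Arrays.toString(backSubstitute(rbar, rhs, 2, 2))); // [1.0, 2.0]
        System.out.println(Arrays.toString(backSubstitute(rbar, rhs, 2, 1))); // [5.0]
    }
}
```

        The hard-coded values correspond to fitting y = 1 + 2x at x = 1, 2, 3 with an explicit intercept column. With nreq = 1 the sketch returns 5.0, the mean of y, which is the intercept-only fit.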
      • singcheck

        private void singcheck()
        The method which checks for singularities and then eliminates the offending columns.
      • ss

        private void ss()
        Calculates the sum of squared errors for the full regression and all subsets in the following manner:
         rss[] ={
         ResidualSumOfSquares_allNvars,
         ResidualSumOfSquares_FirstNvars-1,
         ResidualSumOfSquares_FirstNvars-2,
         ..., ResidualSumOfSquares_FirstVariable} 
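        The nesting works because dropping the last included variable i adds its contribution d[i] * rhs[i]^2 back to the residual sum of squares. A small Java sketch, assuming d and rhs come from the orthogonal reduction and using the hypothetical class NestedRssSketch; note that here rss[i] holds the RSS of the model with the first i + 1 variables, so the full model sits at the last index:

```java
import java.util.Arrays;

// Hypothetical sketch of the nested residual sums of squares. Here rss[i] is the RSS
// of the model containing the first i + 1 variables, so rss[p - 1] is the full model.
final class NestedRssSketch {
    static double[] nestedRss(double[] d, double[] rhs, double sserr) {
        int p = d.length;
        double[] rss = new double[p];
        rss[p - 1] = sserr;
        for (int i = p - 1; i > 0; i--) {
            // dropping variable i adds its contribution d[i] * rhs[i]^2 back
            rss[i - 1] = rss[i] + d[i] * rhs[i] * rhs[i];
        }
        return rss;
    }

    public static void main(String[] args) {
        // d, rhs and sserr for y = 1 + 2x at x = 1, 2, 3 with an intercept column
        double[] rss = nestedRss(new double[] {3.0, 2.0}, new double[] {5.0, 2.0}, 0.0);
        System.out.println(Arrays.toString(rss)); // [8.0, 0.0]
    }
}
```

        With these values the sketch gives rss[0] = 8, which is the sum of squared deviations of y = 3, 5, 7 from their mean 5, that is, the RSS of the intercept-only model.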
      • cov

        private double[] cov(int nreq)
        Calculates the covariance matrix assuming only the first nreq variables are included in the calculation. The returned array contains a symmetric matrix stored in lower triangular form and has (nreq + 1) * nreq / 2 elements. For illustration:
         cov =
         {
          cov_00,
          cov_10, cov_11,
          cov_20, cov_21, cov_22,
          ...
         } 
        Parameters:
        nreq - how many of the regressors to include (either in canonical order, or in the current reordered state)
        Returns:
        an array with the variance covariance of the included regressors in lower triangular form
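        The lower-triangular storage and the covariance computation can be sketched in Java as follows. The sketch assumes X'X = R' D R with R unit upper triangular, so that var * (X'X)^-1 = var * Rinv * D^-1 * Rinv'; the class CovSketch, its method names, the row-packed layout of rinv, and a caller-supplied error variance var are assumptions, not the library's internals. Element (i, j) with j <= i is stored at index i * (i + 1) / 2 + j, which gives the (nreq + 1) * nreq / 2 elements stated above.

```java
import java.util.Arrays;

// Hypothetical sketch: var * Rinv * D^-1 * Rinv' for the first nreq variables,
// returned in row-packed lower-triangular form.
final class CovSketch {
    // Position of element (i, j), j <= i, in a row-packed lower triangle.
    static int lowerIndex(int i, int j) {
        return i * (i + 1) / 2 + j;
    }

    // Element (row, col) of the unit upper-triangular Rinv; off-diagonals packed by rows.
    static double rinvAt(double[] rinv, int p, int row, int col) {
        if (row == col) {
            return 1.0;
        }
        if (col < row) {
            return 0.0;
        }
        return rinv[row * (2 * p - row - 1) / 2 + (col - row - 1)];
    }

    static double[] covariance(double[] d, double[] rinv, int p, int nreq, double var) {
        double[] cov = new double[(nreq + 1) * nreq / 2];
        for (int i = 0; i < nreq; i++) {
            for (int j = 0; j <= i; j++) {
                double sum = 0.0;
                // Rinv(i, k) is zero for k < i, and i >= j, so start the sum at k = i
                for (int k = i; k < nreq; k++) {
                    sum += rinvAt(rinv, p, i, k) * rinvAt(rinv, p, j, k) / d[k];
                }
                cov[lowerIndex(i, j)] = var * sum;
            }
        }
        return cov;
    }

    public static void main(String[] args) {
        // d and rinv for the design x = 1, 2, 3 with an intercept column
        double[] cov = covariance(new double[] {3.0, 2.0}, new double[] {-2.0}, 2, 2, 1.0);
        System.out.println(Arrays.toString(cov));
    }
}
```

        For the design x = 1, 2, 3 with an intercept (d = [3, 2], rinv = [-2]) and var = 1, the sketch returns approximately [7/3, -1, 1/2], the lower triangle of the inverse of X'X = [[3, 6], [6, 14]].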
      • inverse

        private void inverse(double[] rinv,
                             int nreq)
        This internal method calculates the inverse of the upper-triangular portion of the R matrix.
        Parameters:
        rinv - the storage for the inverse of r
        nreq - how many of the regressors to include (either in canonical order, or in the current reordered state)
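        Since R is unit upper triangular, its inverse is unit upper triangular as well and can be filled in from the bottom row upwards using Rinv(i,j) = -(R(i,j) + sum over i < k < j of R(i,k) * Rinv(k,j)). A minimal Java sketch, assuming the off-diagonal elements are packed row by row with 0-based indices; UnitTriangularInverseSketch is a hypothetical name:

```java
import java.util.Arrays;

// Hypothetical sketch: invert the leading nreq x nreq block of a unit upper-triangular R
// whose off-diagonal elements are packed row by row in rbar. The result uses the same layout.
final class UnitTriangularInverseSketch {
    static int packed(int p, int row, int col) {
        return row * (2 * p - row - 1) / 2 + (col - row - 1);
    }

    static double[] inverse(double[] rbar, int p, int nreq) {
        double[] rinv = new double[p * (p - 1) / 2];
        for (int i = nreq - 1; i >= 0; i--) { // bottom row first, so Rinv(k, j) for k > i is ready
            for (int j = i + 1; j < nreq; j++) {
                double sum = rbar[packed(p, i, j)]; // the k = j term, since Rinv(j, j) = 1
                for (int k = i + 1; k < j; k++) {
                    sum += rbar[packed(p, i, k)] * rinv[packed(p, k, j)];
                }
                rinv[packed(p, i, j)] = -sum;
            }
        }
        return rinv;
    }

    public static void main(String[] args) {
        // R(0,1) = 1, R(0,2) = 2, R(1,2) = 3
        System.out.println(Arrays.toString(inverse(new double[] {1.0, 2.0, 3.0}, 3, 3))); // [-1.0, 1.0, -3.0]
    }
}
```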
      • getPartialCorrelations

        public double[] getPartialCorrelations(int in)
        In the original algorithm only the partial correlations of the regressors are returned to the user. In this implementation, we have
         corr =
         {
           corrxx - lower triangular
           corrxy - bottom row of the matrix
         }
         Replaces subroutines PCORR and COR of:
         ALGORITHM AS274  APPL. STATIST. (1992) VOL.41, NO. 2 

        Calculate partial correlations after the variables in rows 1, 2, ..., IN have been forced into the regression. If IN = 1, and the first row of R represents a constant in the model, then the usual simple correlations are returned.

        If IN = 0, the value returned in array CORMAT for the correlation of variables Xi & Xj is:

         sum ( Xi.Xj ) / Sqrt ( sum (Xi^2) . sum (Xj^2) )

        On return, array CORMAT contains the upper triangle of the matrix of partial correlations stored by rows, excluding the 1's on the diagonal. e.g. if IN = 2, the consecutive elements returned are: (3,4) (3,5) ... (3,ncol), (4,5) (4,6) ... (4,ncol), etc. Array YCORR stores the partial correlations with the Y-variable starting with YCORR(IN+1) = partial correlation with the variable in position (IN+1).

        Parameters:
        in - how many of the regressors to include (either in canonical order, or in the current reordered state)
        Returns:
        an array with the partial correlations of the remainder of regressors with each other and the regressand, in lower triangular form
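        The IN = 0 formula, and the IN = 1 case with a constant, can be illustrated directly from data columns. This Java sketch does not follow the AS 274 route through the R factor; it simply evaluates sum(Xi.Xj) / sqrt(sum(Xi^2) . sum(Xj^2)) and, for the centered case, applies the same formula after subtracting the column means, which is the effect of forcing a constant into the model first. CorrelationSketch and its method names are hypothetical.

```java
// Hypothetical sketch computed directly from data columns, not from the R factor.
final class CorrelationSketch {
    // IN = 0 case: sum(Xi*Xj) / sqrt(sum(Xi^2) * sum(Xj^2)), with no centering.
    static double uncentered(double[] xi, double[] xj) {
        double sij = 0.0;
        double sii = 0.0;
        double sjj = 0.0;
        for (int n = 0; n < xi.length; n++) {
            sij += xi[n] * xj[n];
            sii += xi[n] * xi[n];
            sjj += xj[n] * xj[n];
        }
        return sij / Math.sqrt(sii * sjj);
    }

    // IN = 1 with a constant: removing the means gives the usual simple correlation.
    static double centered(double[] xi, double[] xj) {
        return uncentered(demean(xi), demean(xj));
    }

    private static double[] demean(double[] v) {
        double mean = 0.0;
        for (double e : v) {
            mean += e;
        }
        mean /= v.length;
        double[] out = new double[v.length];
        for (int n = 0; n < v.length; n++) {
            out[n] = v[n] - mean;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3};
        double[] b = {3, 1, 2};
        System.out.println(uncentered(a, b) + " " + centered(a, b));
    }
}
```

        For a = {1, 2, 3} and b = {3, 1, 2} the uncentered value is 11/14 and the centered (ordinary) correlation is -0.5.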
      • vmove

        private void vmove(int from,
                           int to)
        ALGORITHM AS274 APPL. STATIST. (1992) VOL.41, NO. 2. Move variable from position FROM to position TO in an orthogonal reduction produced by AS75.1.
        Parameters:
        from - initial position
        to - destination
      • reorderRegressors

        private int reorderRegressors(int[] list,
                                      int pos1)
        ALGORITHM AS274 APPL. STATIST. (1992) VOL.41, NO. 2

        Re-order the variables in an orthogonal reduction produced by AS75.1 so that the N variables in LIST start at position POS1, though will not necessarily be in the same order as in LIST. Any variables in VORDER before position POS1 are not moved. Auxiliary routine called: VMOVE.

        This internal method reorders the regressors.

        Parameters:
        list - the regressors to move
        pos1 - where the list will be placed
        Returns:
        -1 error, 0 everything ok
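        The position bookkeeping of this reordering can be sketched on the vorder array alone. This Java sketch is a simplification under stated assumptions: it uses 0-based positions, the class ReorderSketch is hypothetical, and it only permutes vorder, whereas the real routine also updates d, r and rhs through vmove. Listed variables are placed in the order they are encountered, which is why they need not end up in the order given in LIST.

```java
import java.util.Arrays;

// Hypothetical sketch of the bookkeeping in reorderRegressors: only vorder is permuted;
// the numerical update of the reduction (done through vmove) is not modeled here.
final class ReorderSketch {
    // Moves the variable at position 'from' to position 'to', shifting the ones in between.
    static void move(int[] vorder, int from, int to) {
        if (from == to) {
            return;
        }
        int v = vorder[from];
        if (from < to) {
            System.arraycopy(vorder, from + 1, vorder, from, to - from);
        } else {
            System.arraycopy(vorder, to, vorder, to + 1, from - to);
        }
        vorder[to] = v;
    }

    // Places the variables in list at positions pos1, pos1 + 1, ... in the order found.
    // Returns 0 on success, -1 if some listed variable is not at or after pos1.
    static int reorder(int[] vorder, int[] list, int pos1) {
        int next = pos1;
        for (int i = pos1; i < vorder.length && next < pos1 + list.length; i++) {
            if (contains(list, vorder[i])) {
                move(vorder, i, next); // i >= next: entries between them shift one place right
                next++;
            }
        }
        return next == pos1 + list.length ? 0 : -1;
    }

    private static boolean contains(int[] list, int v) {
        for (int e : list) {
            if (e == v) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] vorder = {0, 1, 2, 3, 4};
        int status = reorder(vorder, new int[] {3, 1}, 1);
        System.out.println(status + " " + Arrays.toString(vorder)); // 0 [0, 1, 3, 2, 4]
    }
}
```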
      • getDiagonalOfHatMatrix

        public double getDiagonalOfHatMatrix(double[] row_data)
        Gets the diagonal element of the hat matrix (also known as the leverage matrix) for the given observation.
        Parameters:
        row_data - the observation (row of regressor values) for which the diagonal of the hat matrix is returned
        Returns:
        the diagonal element of the hat matrix
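        With the factorization X'X = R' D R (R unit upper triangular), the leverage of a row x is h = x' (X'X)^-1 x = sum over j of z_j^2 / d_j, where z solves R' z = x. A minimal Java sketch under those assumptions; LeverageSketch is a hypothetical name, rbar is assumed to hold the off-diagonal elements of R packed row by row, x is assumed to already include a leading 1 for the intercept, and skipping zero entries of d is a simplification for this example:

```java
// Hypothetical sketch: leverage h = x' (X'X)^-1 x using X'X = R' D R, where R is unit
// upper triangular with off-diagonals packed row by row in rbar. Solve R' z = x by
// forward substitution; then h = sum z_j^2 / d_j.
final class LeverageSketch {
    static double leverage(double[] d, double[] rbar, double[] x) {
        int p = d.length;
        double[] z = new double[p];
        double h = 0.0;
        for (int j = 0; j < p; j++) {
            double zj = x[j];
            for (int i = 0; i < j; i++) {
                zj -= rbar[i * (2 * p - i - 1) / 2 + (j - i - 1)] * z[i]; // R(i,j) * z_i
            }
            z[j] = zj;
            if (d[j] > 0.0) { // simplification: skip directions with no information
                h += zj * zj / d[j];
            }
        }
        return h;
    }

    public static void main(String[] args) {
        double[] d = {3.0, 2.0}; // design x = 1, 2, 3 with an intercept column
        double[] rbar = {2.0};
        for (double x = 1; x <= 3; x++) {
            System.out.println(leverage(d, rbar, new double[] {1.0, x}));
        }
    }
}
```

        For this design the sketch prints leverages of about 5/6, 1/3 and 5/6; these match 1/n + (x - mean)^2 / Sxx for simple regression and sum to the number of parameters, 2.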
      • getOrderOfRegressors

        public int[] getOrderOfRegressors()
        Gets the order of the regressors, useful if some type of reordering has been called. Calling regress(int[]) will trigger a reordering.
        Returns:
        int[] with the current order of the regressors
      • regress

        public RegressionResults regress(int numberOfRegressors)
                                  throws ModelSpecificationException
        Conducts a regression on the data in the model, using a subset of regressors.
        Parameters:
        numberOfRegressors - how many of the regressors to include (either in canonical order, or in the current reordered state)
        Returns:
        RegressionResults the structure holding all regression results
        Throws:
        ModelSpecificationException - thrown if the number of observations is less than the number of variables, or the number of regressors requested is greater than the number of regressors in the model
      • regress

        public RegressionResults regress(int[] variablesToInclude)
                                  throws ModelSpecificationException
        Conducts a regression on the data in the model, using the regressors in the array. Calling this method will change the internal order of the regressors, and care is required in interpreting the hat matrix.
        Specified by:
        regress in interface UpdatingMultipleLinearRegression
        Parameters:
        variablesToInclude - array of variables to include in regression
        Returns:
        RegressionResults the structure holding all regression results
        Throws:
        ModelSpecificationException - thrown if the number of observations is less than the number of variables, the number of regressors requested is greater than the number of regressors in the model, or a regressor index in the regressor array does not exist