Class MillerUpdatingRegression
- java.lang.Object
- org.apache.commons.math3.stat.regression.MillerUpdatingRegression
- All Implemented Interfaces:
- UpdatingMultipleLinearRegression
public class MillerUpdatingRegression extends java.lang.Object implements UpdatingMultipleLinearRegression
This class is a concrete implementation of the UpdatingMultipleLinearRegression interface.
The algorithm is described in:
Algorithm AS 274: Least Squares Routines to Supplement Those of Gentleman
Author(s): Alan J. Miller
Source: Journal of the Royal Statistical Society. Series C (Applied Statistics), Vol. 41, No. 2 (1992), pp. 458-478
Published by: Blackwell Publishing for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/2347583
This method for multiple regression forms the solution to the OLS problem by updating the QR decomposition as described by Gentleman.
- Since:
- 3.0
-
-
Field Summary
Fields
private double[] d
    diagonals of cross products matrix
private double epsilon
    zero tolerance
private boolean hasIntercept
    boolean flag whether a regression constant is added
private boolean[] lindep
    flags for variables with linear dependency problems
private long nobs
    number of observations entered
private int nvars
    number of variables in regression
private double[] r
    the off diagonal portion of the R matrix
private double[] rhs
    the elements of the R`Y
private double[] rss
    residual sum of squares for all nested regressions
private boolean rss_set
    has rss been called?
private double sserr
    sum of squared errors of largest regression
private double sumsqy
    summation of squared Y values
private double sumy
    summation of Y variable
private double[] tol
    the tolerance for each of the variables
private boolean tol_set
    has the tolerance setting method been called
private int[] vorder
    order of the regressors
private double[] work_sing
    workspace for singularity method
private double[] work_tolset
    scratch space for tolerance calc
private double[] x_sing
    singular x values
-
Constructor Summary
Constructors
private MillerUpdatingRegression()
    Set the default constructor to private access to prevent inadvertent instantiation
MillerUpdatingRegression(int numberOfVariables, boolean includeConstant)
    Primary constructor for the MillerUpdatingRegression.
MillerUpdatingRegression(int numberOfVariables, boolean includeConstant, double errorTolerance)
    This is the augmented constructor for the MillerUpdatingRegression class.
-
Method Summary
Methods
void addObservation(double[] x, double y)
    Adds an observation to the regression model.
void addObservations(double[][] x, double[] y)
    Adds multiple observations to the model.
void clear()
    As the name suggests, clear wipes the internals and reorders everything in the canonical order.
private double[] cov(int nreq)
    Calculates the covariance matrix assuming only the first nreq variables are included in the calculation.
double getDiagonalOfHatMatrix(double[] row_data)
    Gets the diagonal of the Hat matrix, also known as the leverage matrix.
long getN()
    Gets the number of observations added to the regression model.
int[] getOrderOfRegressors()
    Gets the order of the regressors, useful if some type of reordering has been called.
double[] getPartialCorrelations(int in)
    In the original algorithm only the partial correlations of the regressors are returned to the user.
boolean hasIntercept()
    A getter method which determines whether a constant is included.
private void include(double[] x, double wi, double yi)
    The include method is where the QR decomposition occurs.
private void inverse(double[] rinv, int nreq)
    This internal method calculates the inverse of the upper-triangular portion of the R matrix.
private double[] regcf(int nreq)
    The regcf method conducts the linear regression and extracts the parameter vector.
RegressionResults regress()
    Conducts a regression on the data in the model, using all regressors.
RegressionResults regress(int numberOfRegressors)
    Conducts a regression on the data in the model, using a subset of regressors.
RegressionResults regress(int[] variablesToInclude)
    Conducts a regression on the data in the model, using the regressors in the array. Calling this method will change the internal order of the regressors, and care is required in interpreting the hat matrix.
private int reorderRegressors(int[] list, int pos1)
    ALGORITHM AS274 APPL.
private void singcheck()
    The method which checks for singularities and then eliminates the offending columns.
private double smartAdd(double a, double b)
    Adds two numbers a and b such that the contamination due to numerical smallness of one addend does not corrupt the sum.
private void ss()
    Calculates the sum of squared errors for the full regression and all subsets.
private void tolset()
    This sets up tolerances for singularity testing.
private void vmove(int from, int to)
    ALGORITHM AS274 APPL.
-
-
-
Field Detail
-
nvars
private final int nvars
number of variables in regression
-
d
private final double[] d
diagonals of cross products matrix
-
rhs
private final double[] rhs
the elements of the R`Y
-
r
private final double[] r
the off diagonal portion of the R matrix
-
tol
private final double[] tol
the tolerance for each of the variables
-
rss
private final double[] rss
residual sum of squares for all nested regressions
-
vorder
private final int[] vorder
order of the regressors
-
work_tolset
private final double[] work_tolset
scratch space for tolerance calc
-
nobs
private long nobs
number of observations entered
-
sserr
private double sserr
sum of squared errors of largest regression
-
rss_set
private boolean rss_set
has rss been called?
-
tol_set
private boolean tol_set
has the tolerance setting method been called
-
lindep
private final boolean[] lindep
flags for variables with linear dependency problems
-
x_sing
private final double[] x_sing
singular x values
-
work_sing
private final double[] work_sing
workspace for singularity method
-
sumy
private double sumy
summation of Y variable
-
sumsqy
private double sumsqy
summation of squared Y values
-
hasIntercept
private boolean hasIntercept
boolean flag whether a regression constant is added
-
epsilon
private final double epsilon
zero tolerance
-
-
Constructor Detail
-
MillerUpdatingRegression
private MillerUpdatingRegression()
Set the default constructor to private access to prevent inadvertent instantiation
-
MillerUpdatingRegression
public MillerUpdatingRegression(int numberOfVariables, boolean includeConstant, double errorTolerance) throws ModelSpecificationException
This is the augmented constructor for the MillerUpdatingRegression class.
- Parameters:
numberOfVariables - number of regressors to expect, not including constant
includeConstant - include a constant automatically
errorTolerance - zero tolerance, how machine zero is determined
- Throws:
ModelSpecificationException - if numberOfVariables is less than 1
-
MillerUpdatingRegression
public MillerUpdatingRegression(int numberOfVariables, boolean includeConstant) throws ModelSpecificationException
Primary constructor for the MillerUpdatingRegression.
- Parameters:
numberOfVariables - maximum number of potential regressors
includeConstant - include a constant automatically
- Throws:
ModelSpecificationException - if numberOfVariables is less than 1
-
-
Method Detail
-
hasIntercept
public boolean hasIntercept()
A getter method which determines whether a constant is included.
- Specified by:
hasIntercept in interface UpdatingMultipleLinearRegression
- Returns:
- true if the regression has an intercept, false otherwise
-
getN
public long getN()
Gets the number of observations added to the regression model.
- Specified by:
getN in interface UpdatingMultipleLinearRegression
- Returns:
- number of observations
-
addObservation
public void addObservation(double[] x, double y) throws ModelSpecificationException
Adds an observation to the regression model.
- Specified by:
addObservation in interface UpdatingMultipleLinearRegression
- Parameters:
x - the array with regressor values
y - the value of dependent variable given these regressors
- Throws:
ModelSpecificationException - if the length of x does not equal the number of independent variables in the model
-
addObservations
public void addObservations(double[][] x, double[] y) throws ModelSpecificationException
Adds multiple observations to the model.
- Specified by:
addObservations in interface UpdatingMultipleLinearRegression
- Parameters:
x - observations on the regressors
y - observations on the regressand
- Throws:
ModelSpecificationException - if x is not rectangular, does not match the length of y or does not contain sufficient data to estimate the model
-
include
private void include(double[] x, double wi, double yi)
The include method is where the QR decomposition occurs. It forms all intermediate data which will be used for all derivative measures. In the original implementation described in the Miller paper, the x vector is overwritten. In this implementation, the include method is passed a copy of the original data vector so that there is no contamination of the data. Additionally, this method differs slightly from Gentleman's method in that it assumes dense design matrices; there is some advantage in using the original Gentleman algorithm on sparse matrices.
- Parameters:
x - observations on the regressors
wi - weight of this observation (-1,1)
yi - observation on the regressand
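The update described above can be sketched in plain Java. This is a minimal illustration of a square-root-free Givens update in the style of Gentleman, not the library's code: the class name GentlemanUpdateSketch, its fields and the coefficients() helper are hypothetical, plain additions stand in for smartAdd, only positive weights are handled, and the packed row-wise layout of r is an assumption.

```java
/** Hypothetical sketch of a square-root-free Givens update; not the library implementation. */
final class GentlemanUpdateSketch {
    final int p;          // number of columns, including any constant column
    final double[] d;     // row scale factors (diagonal of D in X'X = R' D R)
    final double[] r;     // strict upper triangle of the unit-diagonal R, stored by rows
    final double[] rhs;   // transformed response
    double sserr;         // residual sum of squares accumulated so far

    GentlemanUpdateSketch(int p) {
        this.p = p;
        this.d = new double[p];
        this.r = new double[p * (p - 1) / 2];
        this.rhs = new double[p];
    }

    /** Folds one observation with a positive weight into the reduction; works on a copy of x. */
    void include(double[] xIn, double weight, double yIn) {
        double[] x = xIn.clone();   // the caller's array is left untouched
        double w = weight;
        double y = yIn;
        int nextr = 0;              // position in the packed r array
        for (int i = 0; i < p; i++) {
            if (w == 0.0) {
                return;             // the row has been fully absorbed
            }
            double xi = x[i];
            if (xi == 0.0) {
                nextr += p - i - 1; // nothing to rotate; skip this row of r
                continue;
            }
            double di = d[i];
            double dpi = di + w * xi * xi;
            double cbar = di / dpi;
            double sbar = w * xi / dpi;
            w = cbar * w;
            d[i] = dpi;
            for (int k = i + 1; k < p; k++) {
                double xk = x[k];
                x[k] = xk - xi * r[nextr];
                r[nextr] = cbar * r[nextr] + sbar * xk;
                nextr++;
            }
            double yk = y;
            y = yk - xi * rhs[i];
            rhs[i] = cbar * rhs[i] + sbar * yk;
        }
        sserr += w * y * y;
    }

    /** Solves R * beta = rhs by back-substitution (R has an implicit unit diagonal). */
    double[] coefficients() {
        double[] beta = new double[p];
        for (int i = p - 1; i >= 0; i--) {
            double sum = rhs[i];
            int rowStart = i * (2 * p - i - 1) / 2;
            for (int k = i + 1; k < p; k++) {
                sum -= r[rowStart + (k - i - 1)] * beta[k];
            }
            beta[i] = sum;
        }
        return beta;
    }
}
```

Feeding the rows (1, x) with y = 1 + 2x for x = 0, 1, 2, 3 and then calling coefficients() should recover an intercept close to 1 and a slope close to 2.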
-
smartAdd
private double smartAdd(double a, double b)
Adds two numbers a and b such that the contamination due to numerical smallness of one addend does not corrupt the sum.
- Parameters:
a - an addend
b - an addend
- Returns:
- the sum of a and b
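One plausible reading of this description is an addition that drops the smaller addend when it is negligible relative to the larger one. The sketch below is an assumption, not the library's code: the class name SmartAddSketch and the threshold EPS (taken here as Math.ulp(1.0)) are hypothetical, and the constant actually used may differ.

```java
/** Hypothetical sketch of a guarded addition; not the library implementation. */
final class SmartAddSketch {
    /** Relative threshold below which an addend is treated as negligible (assumed value). */
    static final double EPS = Math.ulp(1.0);

    static double smartAdd(double a, double b) {
        double absA = Math.abs(a);
        double absB = Math.abs(b);
        if (absA > absB) {
            // b contributes only if it is not negligible relative to a
            return absB > absA * EPS ? a + b : a;
        }
        // otherwise a contributes only if it is not negligible relative to b
        return absA > absB * EPS ? a + b : b;
    }
}
```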
-
clear
public void clear()
As the name suggests, clear wipes the internals and reorders everything in the canonical order.
- Specified by:
clear in interface UpdatingMultipleLinearRegression
-
tolset
private void tolset()
This sets up tolerances for singularity testing.
-
regcf
private double[] regcf(int nreq) throws ModelSpecificationException
The regcf method conducts the linear regression and extracts the parameter vector. Notice that the algorithm can do subset regression with no alteration.
- Parameters:
nreq - how many of the regressors to include (either in canonical order, or in the current reordered state)
- Returns:
- an array with the estimated slope coefficients
- Throws:
ModelSpecificationException - if nreq is less than 1 or greater than the number of independent variables
-
singcheck
private void singcheck()
The method which checks for singularities and then eliminates the offending columns.
-
ss
private void ss()
Calculates the sum of squared errors for the full regression and all subsets in the following manner:
rss[] = { ResidualSumOfSquares_allNvars, ResidualSumOfSquares_FirstNvars-1, ResidualSumOfSquares_FirstNvars-2, ..., ResidualSumOfSquares_FirstVariable }
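The nested sums can be obtained in a single backward pass over the reduction, since dropping the last regressor adds d[i] * rhs[i]^2 to the residual sum of squares. The sketch below follows the SS routine of AS 274 under stated assumptions: the class name NestedRssSketch is hypothetical, d and rhs are taken to be the scale factors and transformed response of a square-root-free reduction, and entry i of the result is assumed to hold the value for the first i + 1 regressors, so the full-model value sits last. That index convention should be checked against the listing above.

```java
/** Hypothetical sketch of the nested residual sums of squares; not the library implementation. */
final class NestedRssSketch {
    /**
     * @param d     scale factors of the reduction (one per column)
     * @param rhs   transformed response (one per column)
     * @param sserr residual sum of squares of the full model
     * @return array whose entry i is the RSS using the first i + 1 columns (assumed convention)
     */
    static double[] nestedRss(double[] d, double[] rhs, double sserr) {
        int n = d.length;
        double[] rss = new double[n];
        double total = sserr;
        rss[n - 1] = sserr;
        for (int i = n - 1; i > 0; i--) {
            // removing column i returns its explained sum of squares to the residual
            total += d[i] * rhs[i] * rhs[i];
            rss[i - 1] = total;
        }
        return rss;
    }
}
```

For y = 1, 3, 5 observed at x = 0, 1, 2 with a constant column, such a reduction gives d = {3, 2}, rhs = {3, 2} and sserr = 0; the sketch then yields 8 for the constant-only model (the sum of squared deviations of y from its mean) and 0 for the full model.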
-
cov
private double[] cov(int nreq)
Calculates the covariance matrix assuming only the first nreq variables are included in the calculation. The returned array contains a symmetric matrix stored in lower triangular form. The array has ( nreq + 1 ) * nreq / 2 elements. For illustration:
cov = { cov_00, cov_10, cov_11, cov_20, cov_21, cov_22, ... }
- Parameters:
nreq - how many of the regressors to include (either in canonical order, or in the current reordered state)
- Returns:
- an array with the variance covariance of the included regressors in lower triangular form
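Reading an entry out of this packed layout comes down to one index computation: row i of the lower triangle starts at offset i * (i + 1) / 2. The helper below is a small sketch of that arithmetic; the class and method names are hypothetical and are not part of the library.

```java
/** Hypothetical helper for a symmetric matrix packed row by row in lower triangular form. */
final class PackedLowerTriangleSketch {
    /** Number of stored elements for an nreq-by-nreq symmetric matrix. */
    static int packedLength(int nreq) {
        return (nreq + 1) * nreq / 2;
    }

    /** Position of element (i, j) in the packed array; symmetric, so (i, j) and (j, i) coincide. */
    static int index(int i, int j) {
        int row = Math.max(i, j);
        int col = Math.min(i, j);
        return row * (row + 1) / 2 + col;
    }

    /** Reads element (i, j) from the packed array. */
    static double get(double[] packed, int i, int j) {
        return packed[index(i, j)];
    }
}
```

With this layout, cov_21 sits at position 4 and cov_22 at position 5, matching the illustration above.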
-
inverse
private void inverse(double[] rinv, int nreq)
This internal method calculates the inverse of the upper-triangular portion of the R matrix.
- Parameters:
rinv - the storage for the inverse of r
nreq - how many of the regressors to include (either in canonical order, or in the current reordered state)
-
getPartialCorrelations
public double[] getPartialCorrelations(int in)
In the original algorithm only the partial correlations of the regressors are returned to the user. In this implementation, we have
corr = { corrxx - lower triangular, corrxy - bottom row of the matrix }
Replaces subroutines PCORR and COR of: ALGORITHM AS274 APPL. STATIST. (1992) VOL.41, NO. 2
Calculate partial correlations after the variables in rows 1, 2, ..., IN have been forced into the regression. If IN = 1, and the first row of R represents a constant in the model, then the usual simple correlations are returned.
If IN = 0, the value returned in array CORMAT for the correlation of variables Xi & Xj is:
sum ( Xi.Xj ) / Sqrt ( sum (Xi^2) . sum (Xj^2) )
On return, array CORMAT contains the upper triangle of the matrix of partial correlations stored by rows, excluding the 1's on the diagonal. e.g. if IN = 2, the consecutive elements returned are: (3,4) (3,5) ... (3,ncol), (4,5) (4,6) ... (4,ncol), etc. Array YCORR stores the partial correlations with the Y-variable starting with YCORR(IN+1) = partial correlation with the variable in position (IN+1).
- Parameters:
in - how many of the regressors to include (either in canonical order, or in the current reordered state)
- Returns:
- an array with the partial correlations of the remainder of regressors with each other and the regressand, in lower triangular form
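For the IN = 0 case, the quantity quoted above is the uncentered correlation of two columns. The sketch below computes it directly from raw data rather than from the QR reduction, so it illustrates the formula only; the class and method names are hypothetical.

```java
/** Hypothetical illustration of sum(Xi.Xj) / sqrt(sum(Xi^2) . sum(Xj^2)). */
final class UncenteredCorrelationSketch {
    static double correlation(double[] xi, double[] xj) {
        if (xi.length != xj.length) {
            throw new IllegalArgumentException("columns must have the same length");
        }
        double sij = 0.0; // sum of cross products
        double sii = 0.0; // sum of squares of xi
        double sjj = 0.0; // sum of squares of xj
        for (int k = 0; k < xi.length; k++) {
            sij += xi[k] * xj[k];
            sii += xi[k] * xi[k];
            sjj += xj[k] * xj[k];
        }
        return sij / Math.sqrt(sii * sjj);
    }
}
```

No mean is subtracted here, which is why the usual simple correlations appear only once a constant has been forced in (IN = 1 with a constant in the first row).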
-
vmove
private void vmove(int from, int to)
ALGORITHM AS274 APPL. STATIST. (1992) VOL.41, NO. 2. Move variable from position FROM to position TO in an orthogonal reduction produced by AS75.1.
- Parameters:
from - initial position
to - destination
-
reorderRegressors
private int reorderRegressors(int[] list, int pos1)
ALGORITHM AS274 APPL. STATIST. (1992) VOL.41, NO. 2
Re-order the variables in an orthogonal reduction produced by AS75.1 so that the N variables in LIST start at position POS1, though will not necessarily be in the same order as in LIST. Any variables in VORDER before position POS1 are not moved. Auxiliary routine called: VMOVE.
This internal method reorders the regressors.
- Parameters:
list - the regressors to move
pos1 - where the list will be placed
- Returns:
- -1 error, 0 everything ok
-
getDiagonalOfHatMatrix
public double getDiagonalOfHatMatrix(double[] row_data)
Gets the diagonal of the Hat matrix, also known as the leverage matrix.
- Parameters:
row_data - the observation (row of regressor values) for which the diagonal of the hat matrix is returned
- Returns:
- the diagonal element of the hat matrix
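Given a square-root-free reduction X'X = R' D R, the leverage of a row x is z' D^-1 z, where z solves R' z = x by forward substitution. The sketch below illustrates that computation under stated assumptions: the class name LeverageSketch is hypothetical, r is assumed to hold the strict upper triangle of a unit-diagonal R stored by rows, x already includes the constant column if there is one, and the singularity tolerance check a real implementation would need is omitted.

```java
/** Hypothetical sketch of a leverage computation from a packed reduction; not the library code. */
final class LeverageSketch {
    /**
     * @param d scale factors of the reduction (all assumed non-zero)
     * @param r strict upper triangle of the unit-diagonal R, stored by rows
     * @param x one row of the design matrix, including the constant column if present
     * @return x' (X'X)^-1 x
     */
    static double leverage(double[] d, double[] r, double[] x) {
        int p = d.length;
        double[] z = new double[p];
        double h = 0.0;
        for (int col = 0; col < p; col++) {
            double total = x[col];
            for (int row = 0; row < col; row++) {
                // packed position of R[row][col]
                int idx = row * (2 * p - row - 1) / 2 + (col - row - 1);
                total -= z[row] * r[idx];
            }
            z[col] = total;
            h += total * total / d[col];
        }
        return h;
    }
}
```

For x = 0, 1, 2 with a constant column, the reduction has d = {3, 2} and r = {1}; the row (1, 0) then has leverage 1/3 + 1/2 = 5/6, which agrees with the textbook expression 1/n + (x - mean)^2 / Sxx.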
-
getOrderOfRegressors
public int[] getOrderOfRegressors()
Gets the order of the regressors, useful if some type of reordering has been called. Calling regress with int[]{} args will trigger a reordering.
- Returns:
- int[] with the current order of the regressors
-
regress
public RegressionResults regress() throws ModelSpecificationException
Conducts a regression on the data in the model, using all regressors.
- Specified by:
regress in interface UpdatingMultipleLinearRegression
- Returns:
- RegressionResults the structure holding all regression results
- Throws:
ModelSpecificationException - thrown if number of observations is less than the number of variables
-
regress
public RegressionResults regress(int numberOfRegressors) throws ModelSpecificationException
Conducts a regression on the data in the model, using a subset of regressors.
- Parameters:
numberOfRegressors - how many of the regressors to include (either in canonical order, or in the current reordered state)
- Returns:
- RegressionResults the structure holding all regression results
- Throws:
ModelSpecificationException - thrown if number of observations is less than the number of variables or number of regressors requested is greater than the regressors in the model
-
regress
public RegressionResults regress(int[] variablesToInclude) throws ModelSpecificationException
Conducts a regression on the data in the model, using the regressors in the array. Calling this method will change the internal order of the regressors, and care is required in interpreting the hat matrix.
- Specified by:
regress in interface UpdatingMultipleLinearRegression
- Parameters:
variablesToInclude - array of variables to include in regression
- Returns:
- RegressionResults the structure holding all regression results
- Throws:
ModelSpecificationException - thrown if number of observations is less than the number of variables, the number of regressors requested is greater than the regressors in the model or a regressor index in regressor array does not exist
-
-