I know that the support for linear models with multiple LHS is limited. But when it is possible to run a function on an "mlm" object, I would expect the results to be trusty. When using rstudent
, strange results are produced. Is this a bug or is there some other explanation?
In the example below fittedA
and fittedB
are identical, but in the case of rstudent
the 2nd column differs.
y <- matrix(rnorm(20), 10, 2)
x <- 1:10
fittedA <- fitted(lm(y ~ x))
fittedB <- cbind(fitted(lm(y[, 1] ~ x)), fitted(lm(y[, 2] ~ x)))
rstudentA <- rstudent(lm(y ~ x))
rstudentB <- cbind(rstudent(lm(y[, 1] ~ x)), rstudent(lm(y[, 2] ~ x)))
Thank you @李哲源; see also https://www.r-project.org/bugs.html on how to report bugs ... where they are noticed better by the R core team. Also, there, we could give better credit for patches..
Als note that R's source code - notably it's development version - is always available via svn ("subversion") or also a webbrowser at https://svn.r-project.org/R/trunk/
Both,
cooks.distance.lm()
's andrstandard.lm()
's source code is in https://svn.r-project.org/R/trunk/src/library/stats/R/lm.influence.R .... all for a next time. Your proposed small code changes viaouter()
are good enough of course.Thank you very much for the thorough analysis and your well proposed bug fix!
Setup
As you've observed, only the result for the first response is correctly returned by
rstandard(fit)
.Why
rstudent
fails on "mlm"The thing is, there is no "mlm" method for
rstudent
.When you call
rstudent(fit)
, the S3 method dispatching mechanism findsrstudent.lm
instead, becauseinherits(fit, "lm")
isTRUE
. Unfortunately,stats:::rstudent.lm
is not doing the correct computation for an "mlm" model.lm.influence
does not give the correctsigma
for "mlm". The underlying C routineC_influence
only computessigma
for an "lm". If you givelm.influence
an "mlm", only the result for the first response variable is returned.Obviously for an "mlm" the
sigma
should be a matrix. Now given this incorrectsigma
, "recycling rule" is applied behind the"/"
in the following line ofstats:::rstudent.lm
, because theres
(residuals) on its left is a matrix but the stuff on its right is a vector.Effectively, computational result is only correct for the first response variable; all the rest of response variables would use the wrong
sigma
.R core team needs to patch a number of diagnostic functions
Note that almost all functions listed in doc page
?influence.measures
are buggy for "mlm". They should have issued an warning saying that "mlm" methods are not implemented yet.lm.influnce
needs be patched at C-level. As soon as this is done,rstudent.lm
would work correctly on "mlm".Ohter functions can be easily patched at R-level, for example,
stats:::cooks.distance.lm
andstats:::rstandard
. At the moment (R 3.5.1) they are:and they can be patched (by using
outer
) withA quick test: