Re: multiple regression help





happycow wrote:

When doing a multiple regression in Excel, what do these outputs mean?
(They are all in the same table.)


The first column has my dependent variable, which is labeled here
"Intercept", and the independent variables, X2, X3, X4, X5.

The second column is labeled "Coefficients". I think this column has the
slope values of each independent variable: X2, X3, X4, and X5. These
slope values are in relation to all the other variables, so the
slope of X2 is affected by X3, X4 and X5. These slope values measure
how each independent variable affects the dependent variable.
I'm guessing this allows me to predict where new data will fall. So, for
example, if I want to predict how a film will do (my regression has to
do with film gross) according to this data, I would multiply my
variables for that movie (budget (X2), first-weekend gross (X3), user
ratings (X4) and MPAA rating (X5)) by the corresponding Coefficients
values in this column. If what I am saying is right (or at least
partially right), I don't know what the Intercept value is for, since it
belongs to the dependent variable and so shouldn't have a slope value.


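The predicted value at a given point (x2,...,x5) is
  c0 + x2*c2 + x3*c3 + x4*c4 + x5*c5
where c0 is the intercept and c2,...,c5 are the slope coefficients. The
intercept is not a slope for the dependent variable; it is the constant
term, i.e. the predicted value when every X is zero.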
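As a minimal sketch of that formula in Python, for the four predictors X2 to X5 in the question: the function name predict and every number below are hypothetical, chosen only to show the arithmetic, and are not taken from the poster's Excel output.

```python
def predict(intercept, slopes, xs):
    """Return intercept + sum of (slope * x) for one new observation."""
    if len(slopes) != len(xs):
        raise ValueError("need exactly one x value per slope coefficient")
    return intercept + sum(c * x for c, x in zip(slopes, xs))


# Hypothetical coefficients (NOT from the poster's regression).
c0 = 2.0                          # "Intercept" row
slopes = [0.5, 1.5, -0.25, 3.0]   # rows X2, X3, X4, X5
new_film = [10.0, 4.0, 8.0, 1.0]  # hypothetical X2..X5 values for one film

print(predict(c0, slopes, new_film))  # 14.0
```

The intercept is simply added once; it is never multiplied by anything.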

The third column, labeled "Standard Error", I'm guessing (if what I said
above about the coefficient values is right) is the accuracy of the
predictions that can be made. I'm guessing the larger the number, the
bigger the error.


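Yes, with one refinement: each standard error measures how precisely the coefficient in the same row is estimated, rather than the accuracy of an individual prediction. The larger the number, the less precise that coefficient.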


The fourth column, labeled "t Stat", I have no clue what it means or how
it contributes to my regression. I'm thinking it's some type of test,
but I don't understand what it's testing and why.


The t statistic is computed as the coefficient divided by its standard error. It tests whether the true value of that coefficient could plausibly be zero. Small values (in absolute value) indicate that the particular coefficient may not be needed in the model. "Small" is generally defined in terms of p-values.
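To see where the "Standard Error" and "t Stat" numbers come from, here is a minimal Python sketch for the single-predictor case, where the standard error of the slope has a short closed form. This is an illustration under that simplifying assumption: Excel's multiple-regression output applies the same idea to every coefficient but needs matrix algebra, which this sketch does not reproduce. The function name slope_t_stat and both data sets are made up.

```python
import math


def slope_t_stat(x, y):
    """Fit y = b0 + b1*x by least squares; return (b1, se_b1, t)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Residual sum of squares; n - 2 degrees of freedom
    # (one slope plus the intercept were estimated).
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    return b1, se_b1, b1 / se_b1


# Made-up data: a strong linear trend, then essentially no trend.
print(slope_t_stat([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]))
print(slope_t_stat([1, 2, 3, 4, 5], [3, 1, 4, 1, 5]))
```

The first data set gives a t statistic above 30, the second one below 1, matching the idea that a small |t| suggests the coefficient may not be needed.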


The fifth column, labeled "P-value", is again something I don't
understand. I think it has something to do with "t Stat". My other
theory is that it has something to do with probability. I really don't
know, though.


Both guesses are hitting around the issue. If a particular coefficient does not belong in the model (the true value is zero, so the observed value is due to random variation), then the p-value is the probability of observing by chance a coefficient at least as large (in absolute value) as the one that occurred with this data set. Thus the smaller the p-value, the stronger the evidence that a coefficient is really needed. A commonly used criterion is that p < 0.05 is taken as strong evidence that the coefficient is needed.
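The p-value is computed from the t statistic and the residual degrees of freedom, which is n - k - 1 for n observations and k predictors (n - 5 for the four predictors here). A minimal pure-Python sketch, assuming the usual two-sided test: it integrates the Student t density numerically with Simpson's rule instead of calling a statistics library, and the names t_pdf and two_sided_p are hypothetical. Excel's T.DIST.2T worksheet function should give the same number.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def two_sided_p(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|) for T ~ t(df), via Simpson's rule on [0, |t_stat|]."""
    a = abs(t_stat)
    if a == 0:
        return 1.0
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= a)
    return min(1.0, max(0.0, 1.0 - 2.0 * area))


# 2.228 is the tabulated two-sided 5% critical value for 10 df.
print(two_sided_p(2.228, 10))  # about 0.05
```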


The next two columns, labeled "Lower 95%" and "Upper 95%", I believe
are the limits of my correlations. I think this allows one to say
"I am 95% sure that the predicted data that lie between these lower
and upper values can be predicted by the accuracy of the values in
my Coefficients column."


The correct interpretation is that you are 95% confident that the interval (Lower to Upper) contains the true value of the coefficient. Each interval is the coefficient plus or minus a critical t value times its standard error. Note that the interval is random, while the coefficient is not (it is merely unknown). In particular, for a given data set, the interval either does or does not contain the true value (although you don't know which). Thus your confidence is in the procedure that generated the interval, not in the specific interval generated from the specific data set. It is a subtle concept that is often misunderstood.
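The "confidence is in the procedure" point can be checked by simulation: fix a known true slope, generate many data sets, build a 95% interval from each, and count how often the intervals contain the true value. This is a minimal sketch under stated assumptions: one predictor instead of four, normally distributed noise, a hypothetical sample size of 20, and a critical value of 2.101 taken from a t table for 18 degrees of freedom. The names slope_ci and coverage are hypothetical.

```python
import math
import random

N = 20          # hypothetical sample size per simulated data set
T_CRIT = 2.101  # two-sided 95% critical t value for N - 2 = 18 df (t table)


def slope_ci(x, y, t_crit):
    """Least-squares slope of y on x, returned as a (lower, upper) interval."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2) / sxx)
    return b1 - t_crit * se, b1 + t_crit * se


def coverage(true_slope=2.0, reps=2000, seed=1):
    """Fraction of simulated data sets whose interval contains true_slope."""
    rng = random.Random(seed)
    x = [float(i) for i in range(N)]
    hits = 0
    for _ in range(reps):
        # The true slope never changes; only the noise, and so the interval, does.
        y = [1.0 + true_slope * xi + rng.gauss(0.0, 3.0) for xi in x]
        lower, upper = slope_ci(x, y, T_CRIT)
        if lower <= true_slope <= upper:
            hits += 1
    return hits / reps


print(coverage())  # should be close to 0.95
```

Any single simulated interval either contains 2.0 or it does not; only the long-run fraction is about 95%.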


I am also wondering about the graph outputs. The first graph, "Line Fit
Plot", outputs 4 scatter diagrams, one for each of my 4 independent
variables. The graphs look like they're comparing my dependent variable
(on the y axis) to an independent variable on the x axis. Is this just
showing the correlation and relationship of each independent variable
to the dependent variable? For each individual diagram, is the
comparison and the linear relationship (the direction the points seem
to be going: positive, negative or none) based on just that
independent variable and the dependent variable, or is the independent
variable's slope taking into account the other 3 independent variables?


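More or less. The observed points in each Line Fit Plot show the dependent variable against one independent variable, so by themselves they reflect only that one-variable relationship. Excel also plots the predicted values on the same chart, and those come from the full model, so they do take the other three variables into account.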

The second graph, the "Residual Plot": does this show the standard
error for each point? And the closer to 0 a point gets, the smaller
the error?

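Residuals are observed values minus predicted values, so a residual close to 0 means the model predicts that particular observation well; residuals are not standard errors. If the model is correct, each residual plot should look like a random, patternless scatter around zero. If there is a systematic pattern in one or more residual plots (for example, a curve, or a spread that grows with X), then the model is probably inadequate.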
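A small Python sketch of a "systematic pattern": fitting a straight line to data that actually follow a curve leaves residuals that are positive at both ends and negative in the middle, a U shape instead of random scatter. The data (y equal to x squared) and the function names fit_line and residuals are hypothetical, chosen so the arithmetic comes out exact.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b1 * mx, b1


def residuals(x, y):
    """Observed minus predicted value for each point."""
    b0, b1 = fit_line(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Hypothetical data that really follow a curve (y = x squared).
x = [float(i) for i in range(-3, 4)]
y = [xi ** 2 for xi in x]
print(residuals(x, y))  # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
```

The residuals still sum to zero, as they always do when the model has an intercept, so the pattern, not the average, is what reveals the problem.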


All of these questions deal with standard concepts from any introductory statistics course. I highly recommend that you take such a course or at least read an introductory statistics text, since there is more to understand than is likely to be imparted in a few newsgroup replies.

Jerry



