SPSS creating a loop for a multiple regression ove

Posted 2019-07-20 15:56

For my master's thesis I have to use SPSS to analyse my data. I had assumed I would not have to deal with very difficult statistical issues, which is still true of the concepts in my analysis. The problem is that to create my dependent variable I need to use the syntax editor (programming in general), and I have no experience in this area at all. I hope you can help me create my syntax.

I have approximately 900 companies in total, each with 6 yearly observations. For all of these companies I need the predicted values of the following company-specific regression:

Y = β1*X1 + β2*X2 + β3*X3 + error

(I know the β will very likely not be significant. This is nothing to worry about in my thesis, though it will be mentioned in the limitations.) So far my data are ordered in the following way:

COMPANY  YEAR  X1  X2  X3
1        2002
2        2002
1        2003
2        2003

But I could easily change the order, e.g. so that the rows are sorted by company (1, 1, 2, 2, etc.).

OK, let's say I have rearranged the data. What I need now is for SPSS to compute the company-specific β for each company and return the output in one column (the predicted values, i.e. those β multiplied by the specific X in each row). So I guess what I need is a loop that runs a multiple linear regression on the 6 rows of each of the 939 companies, am I right?

As I said, I have no experience at all, so every hint is valuable to me.

Thank you in advance,

Janina.

2 answers
姐就是有狂的资本
#2 · 2019-07-20 16:31

You can use SPLIT FILE to estimate a separate regression for each company; see the example below. Note that one would likely want to consider other panel data models, and to assess whether there is autocorrelation in the residuals. (IMO this is still a useful approach for exploratory analysis of multi-level models.)

The example declares a new dataset to receive the regression estimates (see the OUTFILE subcommand on REGRESSION), saves each company's predicted values to the variable CompSpePred in the original dataset (the SAVE subcommand), and suppresses the other tables (with 900+ tables, much of the run time is spent rendering the output). If you need other statistics, either omit the OMS command that suppresses the tables or tweak it to show only the tables you want. (You can also use OMS to pipe other results to other datasets.)

************************************************************.
*Making Fake data.
SET SEED 10.
INPUT PROGRAM.
LOOP #Comp = 1 to 1000.
COMPUTE #R1 = RV.NORMAL(10,2).
COMPUTE #R2 = RV.NORMAL(-3,1).
COMPUTE #R3 = RV.NORMAL(0,5).
  LOOP Year = 2003 to 2008.
    COMPUTE Company = #Comp.
    COMPUTE Rand1 = #R1.
    COMPUTE Rand2 = #R2.
    COMPUTE Rand3 = #R3.
    END CASE.
  END LOOP.
END LOOP.
END FILE.
END INPUT PROGRAM.
DATASET NAME Companies.
COMPUTE x1 = RV.NORMAL(0,1).
COMPUTE x2 = RV.NORMAL(0,1).
COMPUTE x3 = RV.NORMAL(0,1).
COMPUTE y = Rand1*x1 + Rand2*x2 + Rand3*x3 + RV.NORMAL(0,1).
FORMATS Company Year (F4.0).

*Now sorting cases by Company and Year, then using SPLIT file to estimate 
*the regression.
SORT CASES BY Company Year.

*Declare new set and have OMS suppress the other results.
DATASET DECLARE CoeffTable.
OMS 
  /SELECT TABLES
  /IF COMMANDS = 'Regression'
  /DESTINATION VIEWER = NO.
*Now split file to get the coefficients.
SPLIT FILE BY Company.
REGRESSION
  /DEPENDENT y
  /METHOD=ENTER x1 x2 x3
  /SAVE PRED (CompSpePred)
  /OUTFILE = COVB  ('CoeffTable').
SPLIT FILE OFF.
OMSEND.
************************************************************.
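The model in the question has no constant term, whereas REGRESSION includes one by default. If the regression really should go through the origin, a minimal sketch of that variant might look like the following. It assumes the fake data and variable names from the example above; CoeffNoConst and PredNoConst are hypothetical names chosen only for illustration.

```spss
* Sketch only, assuming the Companies dataset built in the example above.
* CoeffNoConst and PredNoConst are hypothetical names.
DATASET ACTIVATE Companies.
SORT CASES BY Company Year.
DATASET DECLARE CoeffNoConst.
OMS
  /SELECT TABLES
  /IF COMMANDS = 'Regression'
  /DESTINATION VIEWER = NO.
SPLIT FILE BY Company.
* ORIGIN drops the constant and has to come before DEPENDENT.
REGRESSION
  /ORIGIN
  /DEPENDENT y
  /METHOD=ENTER x1 x2 x3
  /SAVE PRED (PredNoConst)
  /OUTFILE = COVB ('CoeffNoConst').
SPLIT FILE OFF.
OMSEND.
```

With only six rows per company, dropping the constant leaves three coefficients to estimate instead of four, so it is worth deciding deliberately which model you want.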
Summer. ? 凉城
#3 · 2019-07-20 16:35

Bear in mind that with only six observations per company and three coefficients to estimate (or four, if you also have a constant term), the coefficient estimates are likely to be very imprecise. You might want to consider whether companies can be pooled, at least in part.
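As a rough illustration of what pooling could look like, here is a minimal sketch. It assumes a dataset named Companies with the variables Company, y, x1, x2 and x3, as in the fake-data example in the other answer; PooledPred and MixedPred are hypothetical variable names. The first command pools all companies into a single regression. The second is one possible form of partial pooling, a mixed model with common fixed effects and company-specific random slopes, and is not necessarily the right specification for your data.

```spss
* Sketch only, PooledPred and MixedPred are hypothetical variable names.
DATASET ACTIVATE Companies.
SPLIT FILE OFF.

* Full pooling, one regression across all companies.
REGRESSION
  /DEPENDENT y
  /METHOD=ENTER x1 x2 x3
  /SAVE PRED (PooledPred).

* Partial pooling, common fixed effects plus company-specific random slopes.
MIXED y WITH x1 x2 x3
  /FIXED = x1 x2 x3
  /RANDOM = x1 x2 x3 | SUBJECT(Company) COVTYPE(VC)
  /PRINT = SOLUTION
  /SAVE = PRED (MixedPred).
```

The predictions saved by MIXED borrow strength across companies, so they should be less erratic than predictions from 900+ separate six-row regressions, at the cost of assuming that the company-specific slopes come from a common distribution.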
