How to retrieve data from multiple Stata files?

2020-02-15 04:00发布

I have 53 Stata .dta files each of them is 150 - 200 Mb and contain identical set of variables, but for different years. It is not useful to combine or merge them due to their size .

I need to retrieve some averaged values (percentages etc.) Therefore, I want to create a new Stata file New.dta and write a .do file that would run on that new Stata file in the following way: it should open each of those 53 Stata files, make certain calulations, and store the results in the new Stata file, New.dta.

I am not sure how i can keep two Stata file open simultaneuosly, and how can i store the calculated values?

When I open a second .dta file, how can i make the first one still be open? How can i store the calculated values in the global variable?

3条回答
来,给爷笑一个
2楼-- · 2020-02-15 04:22

What springs to mind here is the use of postfile.

Here is a simple example. First, I set up an example of several datasets. You already have this.

clear

forval i = 1/10 {
    set obs 100
    gen foo = `i' * runiform()
    save test`i'
    clear
}

Now I set up postfile. I need to set up a handle, what variables will be used, and what file will be used. Although I am using a numeric variable to hold file identifiers, it will perhaps be more typical to use a string variable. Also, looping over filenames may be a bit more challenging than this. fs from SSC is a convenience command that helps put a set of filenames into a local macro; its use is not illustrated here.

postfile mypost what mean using alltest.dta

forval i = 1/10 {
    use test`i', clear
    su foo, meanonly
    post mypost (`i')  (`r(mean)')
}

Now flush results

postclose mypost

and see what we have.

u alltest

list 

     +-----------------+
     | what       mean |
     |-----------------|
  1. |    1   .5110765 |
  2. |    2   1.016858 |
  3. |    3   1.425967 |
  4. |    4   2.144528 |
  5. |    5   2.438035 |
     |-----------------|
  6. |    6   3.030457 |
  7. |    7   3.356905 |
  8. |    8   4.449655 |
  9. |    9   4.381101 |
 10. |   10   5.017308 |
     +-----------------+

I didn't use any global macros (not global variables) here; you should not need to.

查看更多
虎瘦雄心在
3楼-- · 2020-02-15 04:23

Note that if your data sets are not named sequentially like test1.dta, test2.dta, ..., test53.dta, but rather like results-alaska.dta, result_in_alabama.dta, ..., "wyoming data.dta" (note the space and hence the quotes), you would have to organize the cycle over these files somewhat differently:

local allfiles : dir . files "*.dta"
foreach f of local allfiles {
   use `"`f'"', clear
   * all other code from Maarten's or Nick's approach
}

This is a more advanced of local macros, see help extended macro functions. Note also that Stata will produce a list that will look like "results-alaska.dta" "result_in_alabama.dta" "wyoming data.dta" with quotes around file names, so when you invoke use, you will have to enclose the file name into compound quotes.

查看更多
家丑人穷心不美
4楼-- · 2020-02-15 04:30

An alternative approach is to loop over files and use collapse to "condense" these files to the relevant means, and than append these condensed files. Here is an adaptation of Nick's example:

// create the example datasets
clear

forval i = 1/10 {
    set obs 100
    gen foo = `i' * runiform()
    gen year = `i'
    save test`i', replace
    clear
}

// use collapse and append
// to create the dataset you want
use test1, clear
collapse (mean) year foo 
save means, replace
forvalues i = 2/10 {
    use test`i', clear
    collapse (mean) year foo 
    append using means
    save means, replace
}

// admire the result
list
查看更多
登录 后发表回答