I have 53 Stata .dta files. Each is 150-200 MB and contains an identical set of variables, but for a different year. Combining or merging them is not practical because of their size.
I need to retrieve some averaged values (percentages etc.). Therefore, I want to create a new Stata file, New.dta, and write a .do file that would run on that new Stata file in the following way: it should open each of those 53 Stata files, make certain calculations, and store the results in the new Stata file, New.dta.
I am not sure how I can keep two Stata files open simultaneously, or how I can store the calculated values.
When I open a second .dta file, how can I keep the first one open? How can I store the calculated values in a global variable?
What springs to mind here is the use of postfile. Here is a simple example. First, I set up an example of several datasets. You already have this.
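The setup step might look like the following minimal sketch. The variable name foo, the number of files (ten rather than 53) and the use of random numbers are assumptions made purely for illustration; only the test1.dta, test2.dta, ... naming pattern comes from this thread.

```stata
* Create ten small example datasets: test1.dta, ..., test10.dta
* (hypothetical stand-ins for the real yearly files)
clear
forval i = 1/10 {
    set obs 100
    * foo is a made-up numeric variable whose mean differs by file
    gen foo = `i' * runiform()
    save test`i', replace
    clear
}
```

With real data this step is unnecessary; it only gives the later loops something to read.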
Now I set up postfile. I need to set up a handle, what variables will be used, and what file will be used. Although I am using a numeric variable to hold file identifiers, it will perhaps be more typical to use a string variable. Also, looping over filenames may be a bit more challenging than this. fs from SSC is a convenience command that helps put a set of filenames into a local macro; its use is not illustrated here.
Now flush results and see what we have.
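The postfile steps described above can be sketched as follows. This is a sketch under assumptions, not necessarily the exact code of the original example: the handle mypost, the variables what and mean, the results file alltest.dta and the variable foo are all illustrative names, and the files are assumed to be called test1.dta to test10.dta.

```stata
* Declare the handle (mypost), the variables to be posted (what, mean)
* and the results file (alltest.dta)
postfile mypost what mean using alltest.dta, replace

forval i = 1/10 {
    use test`i', clear
    * meanonly computes the mean and leaves it in r(mean)
    summarize foo, meanonly
    * each post adds one observation to the results file
    post mypost (`i') (r(mean))
}

* Flush the results to disk, then look at them
postclose mypost
use alltest, clear
list
```

Only one dataset is in memory at any time; the results file is written through the handle, so there is no need to keep two datasets open at once. Other quantities can be stored by declaring more variables in postfile and adding matching expressions to post.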
I didn't use any global macros (not global variables) here; you should not need to.
Note that if your data sets are not named sequentially like test1.dta, test2.dta, ..., test53.dta, but rather like results-alaska.dta, result_in_alabama.dta, ..., "wyoming data.dta" (note the space and hence the quotes), you would have to organize the cycle over these files somewhat differently, by first collecting the file names into a local macro. This is a more advanced use of local macros; see help extended macro functions. Note also that Stata will produce a list that will look like
"results-alaska.dta" "result_in_alabama.dta" "wyoming data.dta"
with quotes around file names, so when you invoke use, you will have to enclose the file name in compound quotes.
An alternative approach is to loop over files and use collapse to "condense" these files to the relevant means, and then append these condensed files. Here is an adaptation of Nick's example:
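A minimal sketch of the collapse and append approach, assuming the same hypothetical files test1.dta to test10.dta and variable foo as in the postfile example (these names are illustrative, not taken from the real data):

```stata
* Start with an empty results file to append to
clear
gen what = .
gen foo = .
save alltest, replace emptyok

forval i = 1/10 {
    use test`i', clear
    * Reduce the file to a single observation holding the mean of foo
    collapse (mean) foo
    * Record which file this mean came from
    gen what = `i'
    append using alltest
    save alltest, replace
}

sort what
list
```

collapse can produce several statistics at once, for example collapse (mean) foo (sd) sdfoo = foo, which may be convenient when many summary values are needed per file.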
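For the case raised earlier, where the files are not named sequentially, the loop over file names might look like the sketch below. It assumes that all the data files, and nothing else with a .dta extension, sit in the current working directory; the macro names allfiles and f are arbitrary.

```stata
* Collect the names of all .dta files in the current directory
local allfiles : dir . files "*.dta"

foreach f of local allfiles {
    * Compound quotes cope with file names that contain spaces
    use `"`f'"', clear
    * ... calculations for this file go here ...
}
```

If the results file is saved in the same directory, it will match "*.dta" as well, so it may be safer to keep it elsewhere or to use a narrower pattern.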