How to extract longitudinal time-series data from

Posted 2019-08-20 08:57

Question:

Thanks to joran for helping me to group data in my previous question where I wanted to make a data frame in R smaller so that I can do time-series analysis on the data.

Now I would like to extract further data from the data frame. It has 6 columns. Columns 1 to 5 each hold discrete values: district, gender, year, month and age group. The sixth column is the death count for that specific combination. An extract looks like this:

             District  Gender Year Month    AgeGroup TotalDeaths
             Northern    Male 2006    11        01-4           0
             Northern    Male 2006    11       05-14           1
             Northern    Male 2006    11         15+          83
             Northern    Male 2006    12           0           3
             Northern    Male 2006    12        01-4           0
             Northern    Male 2006    12       05-14           0
             Northern    Male 2006    12         15+         106
             Southern  Female 2003     1           0           6
             Southern  Female 2003     1        01-4           0
             Southern  Female 2003     1       05-14           3
             Southern  Female 2003     1         15+         136
             Southern  Female 2003     2           0           6
             Southern  Female 2003     2        01-4           0
             Southern  Female 2003     2       05-14           1
             Southern  Female 2003     2         15+         111
             Southern  Female 2003     3           0           2
             Southern  Female 2003     3        01-4           0
             Southern  Female 2003     3       05-14           1
             Southern  Female 2003     3         15+         141
             Southern  Female 2003     4           0           4

I am new to time series, and I think that to analyse the data I will need to extract smaller 'time-series' data objects, each one a unique longitudinal series. For example, from the data frame above, I want to extract a smaller data object like this for each combination of District, Gender and AgeGroup:

             District  Gender Year Month    AgeGroup TotalDeaths
             Northern    Male 2003     1        01-4           0
             Northern    Male 2003     2        01-4           1
             Northern    Male 2003     3        01-4           0
             Northern    Male 2003     4        01-4           3
             Northern    Male 2003     5        01-4           4
             Northern    Male 2003     6        01-4           6
             Northern    Male 2003     7        01-4           5
             Northern    Male 2003     8        01-4           0
             Northern    Male 2003     9        01-4           1
             Northern    Male 2003    10        01-4           2
             Northern    Male 2003    11        01-4           0
             Northern    Male 2003    12        01-4           1
             Northern    Male 2004     1        01-4           1
             Northern    Male 2004     2        01-4           0

continuing through to

             Northern    Male 2006    11        01-4           0
             Northern    Male 2006    12        01-4           0

I tried something in Excel, creating pivot tables with this data, and then tried to extract the string of information - but failed. After that I discovered reshape in R, but either I don't know the right code or reshape is not the right tool for this.

I am not even certain whether this is the correct way to analyse this cross-sectional time-series data, i.e. whether another format is actually required to analyse this data with functions such as read.ts(), ts() and arima().

My eventual aim is to use this data and the amelia2 package with its functions to impute for missing TotalDeaths for certain months in 2007 and 2008, where the data is of course missing.

Any help on how to do this, and any suggestions on how to tackle this problem, would be gratefully appreciated.

Answer 1:

For the narrow question of how to best extract:

subset(dfrm, subset = (District == "Northern" & Gender == "Male" & AgeGroup == "01-4"))

subset also has a select argument to narrow down the columns. I suspect that searching for the term "extract", as you did, would only have turned up the ?Extract help page, which surprisingly has no link to subset. (I trimmed a trailing space from an earlier version of the AgeGroup specification.)
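
As a rough sketch of how this could be extended, the following uses subset with its select argument for a single series, then base R's split() to get every District/Gender/AgeGroup combination at once, and ts() to turn each piece into a monthly series. The sample data frame, the helper name to_ts and the object names one, pieces and series are made up for illustration; the sketch also assumes every series has one row per month with no gaps, which would need checking on the real data.

```r
# Hypothetical sample in the same layout as the question's data frame
dfrm <- data.frame(
  District    = rep(c("Northern", "Southern"), each = 6),
  Gender      = rep(c("Male", "Female"), each = 6),
  Year        = 2003,
  Month       = rep(rep(1:3, each = 2), times = 2),
  AgeGroup    = rep(c("01-4", "15+"), times = 6),
  TotalDeaths = c(0, 83, 1, 106, 0, 95, 0, 136, 0, 111, 1, 141),
  stringsAsFactors = FALSE
)

# One series, keeping only the time and count columns via `select`
one <- subset(dfrm,
              subset = District == "Northern" & Gender == "Male" & AgeGroup == "01-4",
              select = c(Year, Month, TotalDeaths))

# All series at once: one data frame per combination that occurs in the data
# (drop = TRUE discards combinations with no rows)
pieces <- split(dfrm,
                list(dfrm$District, dfrm$Gender, dfrm$AgeGroup),
                drop = TRUE)

# Hypothetical helper: order a piece by time and convert it to a monthly ts.
# Assumes consecutive months with no gaps; ts() cannot detect missing rows.
to_ts <- function(d) {
  d <- d[order(d$Year, d$Month), ]
  ts(d$TotalDeaths, start = c(d$Year[1], d$Month[1]), frequency = 12)
}

series <- lapply(pieces, to_ts)
```

Each element of series is named after its combination, for example series[["Northern.Male.01-4"]], and can be passed on to functions that expect a ts object. If some months are missing from a piece, the rows would first have to be filled in (with NA counts) before calling ts().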