Thanks to joran for helping me group the data in my previous question, where I wanted to make a data frame in R smaller so that I could do time-series analysis on it.
Now I would like to extract further subsets from the data frame. It has 6 columns. Columns 1 to 5 each hold discrete values: district, gender, year, month and age group. The sixth column is the number of deaths for that specific combination. An extract looks like this:
District Gender Year Month AgeGroup TotalDeaths
Northern Male 2006 11 01-4 0
Northern Male 2006 11 05-14 1
Northern Male 2006 11 15+ 83
Northern Male 2006 12 0 3
Northern Male 2006 12 01-4 0
Northern Male 2006 12 05-14 0
Northern Male 2006 12 15+ 106
Southern Female 2003 1 0 6
Southern Female 2003 1 01-4 0
Southern Female 2003 1 05-14 3
Southern Female 2003 1 15+ 136
Southern Female 2003 2 0 6
Southern Female 2003 2 01-4 0
Southern Female 2003 2 05-14 1
Southern Female 2003 2 15+ 111
Southern Female 2003 3 0 2
Southern Female 2003 3 01-4 0
Southern Female 2003 3 05-14 1
Southern Female 2003 3 15+ 141
Southern Female 2003 4 0 4
I am new to time series, and I think this is what I need to do to analyse the data: extract smaller, longitudinal 'time-series' data objects, one for each unique combination. For example, from the data frame above I want to extract smaller data objects like this for each District, Gender and AgeGroup:
District Gender Year Month AgeGroup TotalDeaths
Northern Male 2003 1 01-4 0
Northern Male 2003 2 01-4 1
Northern Male 2003 3 01-4 0
Northern Male 2003 4 01-4 3
Northern Male 2003 5 01-4 4
Northern Male 2003 6 01-4 6
Northern Male 2003 7 01-4 5
Northern Male 2003 8 01-4 0
Northern Male 2003 9 01-4 1
Northern Male 2003 10 01-4 2
Northern Male 2003 11 01-4 0
Northern Male 2003 12 01-4 1
Northern Male 2004 1 01-4 1
Northern Male 2004 2 01-4 0
and so on, up to
Northern Male 2006 11 01-4 0
Northern Male 2006 12 01-4 0
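A minimal sketch of one way to get every such series at once in base R, assuming the full data frame has the same six columns as the extract. The name deaths is hypothetical, and only a few rows from the extract are used. split() with a list of grouping columns returns one data frame per District/Gender/AgeGroup combination, named like "Southern.Female.0".

```r
# Sample rows copied from the extract above; "deaths" is a hypothetical name
# for the real data frame, assumed to have the same six columns.
deaths <- read.table(header = TRUE, text = "
District Gender Year Month AgeGroup TotalDeaths
Southern Female 2003 1 0 6
Southern Female 2003 1 01-4 0
Southern Female 2003 2 0 6
Southern Female 2003 2 01-4 0
Southern Female 2003 3 0 2
Southern Female 2003 3 01-4 0
Northern Male 2006 12 0 3
Northern Male 2006 12 01-4 0
", colClasses = c("character", "character", "integer", "integer",
                  "character", "integer"))

# Sort chronologically so each extracted series is in time order
deaths <- deaths[order(deaths$Year, deaths$Month), ]

# One data frame per District/Gender/AgeGroup combination;
# drop = TRUE skips combinations that never occur in the data
series <- split(deaths, list(deaths$District, deaths$Gender, deaths$AgeGroup),
                drop = TRUE)

names(series)
series[["Southern.Female.0"]]
```

Each element of series can then be handled separately, for example with lapply().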
I tried creating pivot tables from this data in Excel and then extracting each series, but failed. After that I discovered reshape in R, but either I don't know the right code or reshape is not the right tool for this.
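For what it's worth, base R's reshape() can do this if each series first gets a single label. The sketch below assumes the same column names as the extract; deaths and the Series column are hypothetical. It turns the long data into one row per Year/Month and one TotalDeaths column per series, with NA where a combination is absent.

```r
# Sample rows from the extract; "deaths" is a hypothetical name
deaths <- read.table(header = TRUE, text = "
District Gender Year Month AgeGroup TotalDeaths
Southern Female 2003 1 0 6
Southern Female 2003 1 01-4 0
Southern Female 2003 2 0 6
Southern Female 2003 2 01-4 0
Southern Female 2003 3 0 2
Southern Female 2003 3 01-4 0
", colClasses = c("character", "character", "integer", "integer",
                  "character", "integer"))

# Hypothetical helper column: one label per District/Gender/AgeGroup series
deaths$Series <- paste(deaths$District, deaths$Gender, deaths$AgeGroup,
                       sep = "_")

# Long -> wide: one row per Year/Month, one TotalDeaths.<Series> column each
wide <- reshape(deaths[, c("Year", "Month", "Series", "TotalDeaths")],
                idvar = c("Year", "Month"), timevar = "Series",
                direction = "wide")
wide
```

The wide layout is closer to what a multivariate ts() object expects, while the long layout is the one the subset() approach works on.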
I am not even certain this is the correct way to analyse this cross-sectional time-series data, i.e. whether another format is required to analyse it with functions such as read.ts(), ts() and arima().
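As a rough illustration of the format ts() needs: once a single series is extracted and sorted, its TotalDeaths column can be wrapped as a monthly series with frequency = 12 and a start of c(year, month). The values below are the Southern/Female/age 0 counts for January to April 2003 from the extract; the name one is hypothetical. Four months are far too few for arima(), so the sketch stops at building the ts object.

```r
# One extracted series (Southern, Female, AgeGroup 0), Jan-Apr 2003
one <- data.frame(Year = 2003L, Month = 1:4, TotalDeaths = c(6L, 6L, 2L, 4L))
one <- one[order(one$Year, one$Month), ]  # ts() assumes time order

# Monthly series: frequency = 12, start given as c(year, month)
y <- ts(one$TotalDeaths, start = c(one$Year[1], one$Month[1]), frequency = 12)
y
```

With a long enough series, the resulting object can be passed to arima() directly.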
My eventual aim is to use this data and the functions in the amelia2 package to impute TotalDeaths for the months in 2007 and 2008 where the data is missing.
Any help with how to do this, and any suggestions on how to tackle the problem, would be greatly appreciated.
For the narrow question of how best to extract, use subset, which also has a select argument to narrow down the columns. I suspect a search on the term "extract" would only have pulled up hits for the ?Extract page, which surprisingly has no link to subset. (I trimmed a trailing space from an earlier version of the AgeGroup specification.)
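A small sketch of subset() with both a row condition and the select argument, using rows from the question's extract (deaths is a hypothetical name). Note that the AgeGroup comparison is an exact string match, so a value like "05-14 " with a trailing space would select no rows.

```r
# Sample rows from the extract; "deaths" is a hypothetical name
deaths <- read.table(header = TRUE, text = "
District Gender Year Month AgeGroup TotalDeaths
Northern Male 2006 11 01-4 0
Northern Male 2006 11 05-14 1
Northern Male 2006 11 15+ 83
Northern Male 2006 12 0 3
Northern Male 2006 12 01-4 0
Northern Male 2006 12 05-14 0
Northern Male 2006 12 15+ 106
", colClasses = c("character", "character", "integer", "integer",
                  "character", "integer"))

# Rows for one District/Gender/AgeGroup combination; select keeps only the
# columns the time series needs
nm514 <- subset(deaths,
                District == "Northern" & Gender == "Male" & AgeGroup == "05-14",
                select = c(Year, Month, TotalDeaths))
nm514
```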