I'm using R to analyse data about antibiotic use from a number of hospitals.
I've imported the data into a data frame that follows the tidy data principles.
> head(data)
        date antibiotic    usage  hospital
1 2006-01-01   amikacin 0.000000 hospital1
2 2006-02-01   amikacin 0.000000 hospital1
3 2006-03-01   amikacin 0.000000 hospital1
4 2006-04-01   amikacin 0.000000 hospital1
5 2006-05-01   amikacin 0.937119 hospital1
6 2006-06-01   amikacin 1.002961 hospital1
(The data set contains monthly data for 5 hospitals and 40 antibiotics.)
The first thing I would like to do is aggregate the antibiotics into classes.
> head(distinct(select(data, antibiotic)), 8)
               antibiotic
1                amikacin
2 amoxicillin-clavulanate
3             amoxycillin
4              ampicillin
5            azithromycin
6        benzylpenicillin
7               cefalotin
8               cefazolin
> penicillins <- c("amoxicillin-clavulanate", "amoxycillin", "ampicillin", "benzylpenicillin")
> ceph1 <- c("cefalotin", "cefazolin")
What I would like to do is then subset the data based on these antibiotic class vectors:
filter(data, antibiotic == (any one of the values in the vector "penicillins"))
Thanks to thelatemail for pointing out that the way to do this is:
d <- filter(data, antibiotic %in% penicillins)
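As an alternative to filtering one class vector at a time, a class column could be attached to every row by joining against a lookup table. This is only a sketch: `toy`, `class_lookup` and `toy_classed` are made-up names, and the toy values are invented stand-ins for the real data.

```r
library(dplyr)

# Toy stand-in for the real data frame (values are invented).
toy <- data.frame(
  date       = as.Date(c("2006-01-01", "2006-01-01", "2006-01-01")),
  antibiotic = c("ampicillin", "cefazolin", "amikacin"),
  usage      = c(1.5, 2.0, 0.5),
  hospital   = "hospital1",
  stringsAsFactors = FALSE
)

penicillins <- c("amoxicillin-clavulanate", "amoxycillin", "ampicillin", "benzylpenicillin")
ceph1       <- c("cefalotin", "cefazolin")

# Hypothetical lookup table: one row per antibiotic with its class.
class_lookup <- bind_rows(
  data.frame(antibiotic = penicillins, class = "penicillins", stringsAsFactors = FALSE),
  data.frame(antibiotic = ceph1,       class = "ceph1",       stringsAsFactors = FALSE)
)

# Antibiotics that are not in the lookup (here amikacin) get NA as their class.
toy_classed <- left_join(toy, class_lookup, by = "antibiotic")
```

With a `class` column in place, a class can be selected with `filter(toy_classed, class == "ceph1")` or used directly as a grouping variable.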
I would like to analyse the data in a number of ways.
The key analysis (and ggplot output) is:
x = date
y = usage of antibiotic(s) stratified by (drug | class), filtered by hospital
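For a single hospital with one line per drug, the plot described above might be sketched as follows. The toy data and the choice of `geom_line()` are assumptions for illustration; `toy` and `p` are made-up names.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the real data frame (values are invented).
toy <- data.frame(
  date       = as.Date(rep(c("2006-01-01", "2006-02-01", "2006-03-01"), times = 2)),
  antibiotic = rep(c("cefalotin", "cefazolin"), each = 3),
  usage      = c(1.0, 1.2, 0.9, 2.0, 2.1, 2.4),
  hospital   = "hospital1",
  stringsAsFactors = FALSE
)

# Keep one hospital, then draw one coloured line per antibiotic over time.
p <- toy %>%
  filter(hospital == "hospital1") %>%
  ggplot(aes(x = date, y = usage, colour = antibiotic)) +
  geom_line() +
  labs(x = "Date", y = "Usage", colour = "Antibiotic")

print(p)
```

Mapping `colour` to a class column instead of `antibiotic` would stratify by class rather than by drug.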
What I'm not clear on now is how to aggregate the data for this sort of thing.
Example:
I want to analyse the use of class "ceph1" across all the hospitals in the district, resulting in something like this (apologies, I know this is not proper code):
x y
Jan-2006 for all hospitals(usage of cefazolin + usage of cefalotin)
Feb-2006 for all hospitals(usage of cefazolin + usage of cefalotin)
etc
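One way the table above could be produced is with `group_by()` and `summarise()` from dplyr. This is a minimal sketch on invented toy data, and it assumes that `usage` is a quantity that can meaningfully be summed across hospitals and drugs. If it is a rate (for example, per bed-day), a plain sum may not be appropriate. `toy` and `ceph1_monthly` are made-up names.

```r
library(dplyr)

ceph1 <- c("cefalotin", "cefazolin")

# Toy stand-in: 2 months x 2 hospitals x 2 cephalosporins (values are invented).
toy <- data.frame(
  date       = as.Date(rep(c("2006-01-01", "2006-02-01"), each = 4)),
  antibiotic = rep(c("cefalotin", "cefazolin"), times = 4),
  usage      = c(1, 2, 3, 4, 5, 6, 7, 8),
  hospital   = rep(rep(c("hospital1", "hospital2"), each = 2), times = 2),
  stringsAsFactors = FALSE
)

# One extra row for a drug outside the class, to show that it is excluded.
toy <- rbind(toy, data.frame(
  date = as.Date("2006-01-01"), antibiotic = "amikacin",
  usage = 100, hospital = "hospital1", stringsAsFactors = FALSE
))

# Keep only the ceph1 drugs, then total usage per month over all hospitals.
ceph1_monthly <- toy %>%
  filter(antibiotic %in% ceph1) %>%
  group_by(date) %>%
  summarise(usage = sum(usage))

print(ceph1_monthly)
```

Adding `hospital` to the `group_by()` call would give one total per month per hospital instead of one per month for the whole district.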
In the long run, I would also like to be able to pass arguments to a function that lets me select which hospitals and which antibiotic or class of antibiotics to include.
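A function along those lines might look like the sketch below. `usage_by_month` and its arguments are hypothetical names, the toy data is invented, and summing `usage` again assumes the measure is additive. The function expects a data frame with the columns `date`, `antibiotic`, `usage` and `hospital`.

```r
library(dplyr)

# Hypothetical helper: total monthly usage for the chosen drugs and hospitals.
# `drugs` is a character vector of antibiotic names (a single drug or a class
# vector); `hospitals` defaults to every hospital present in `df`.
usage_by_month <- function(df, drugs, hospitals = unique(df$hospital)) {
  df %>%
    filter(antibiotic %in% drugs, hospital %in% hospitals) %>%
    group_by(date) %>%
    summarise(usage = sum(usage))
}

ceph1 <- c("cefalotin", "cefazolin")

# Toy stand-in: 2 months x 2 hospitals x 2 cephalosporins (values are invented).
toy <- data.frame(
  date       = as.Date(rep(c("2006-01-01", "2006-02-01"), each = 4)),
  antibiotic = rep(c("cefalotin", "cefazolin"), times = 4),
  usage      = c(1, 2, 3, 4, 5, 6, 7, 8),
  hospital   = rep(rep(c("hospital1", "hospital2"), each = 2), times = 2),
  stringsAsFactors = FALSE
)

# A whole class across all hospitals, then a single drug in a single hospital.
all_ceph1    <- usage_by_month(toy, ceph1)
h1_cefazolin <- usage_by_month(toy, "cefazolin", hospitals = "hospital1")
```

Because a class is just a character vector, the same argument handles a single antibiotic, a predefined class such as `ceph1`, or an ad hoc combination like `c(penicillins, ceph1)`.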
Thanks again - I know this is an order of magnitude more complicated than the original question!