I'm working with a data frame similar to the extract below:
df <- data.frame(A = c("Some messy string to be used", 222, 0),
                 B = c("Very important ? indicator from 2001", 888, 44),
                 C = c("001 This variable / makes no sense", 888, 44),
                 D = c("Geography", 1, 2))
I would like to use the values in the first row as column names. I'm using the code below:
names(df) <- make.names(df[1,])
Unfortunately, this code generates names of the form Xn, as illustrated below:
> names(df)
[1] "X3" "X3" "X1" "X3"
I understand that these strings are too messy for make.names to convert meaningfully. How can I force R to handle those messy strings in a more sensible manner? As a rule of thumb, I would like to:
- Keep figures (as they correspond to time)
- Keep at least the first few words of the text
- Ensure that the names are unique
- Keep the whole solution fairly generic, as there is a lot of rubbish in the first row (usually empty spaces or special characters).
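
One possible reading of the rules above, as a rough sketch rather than a definitive solution: the helper name clean_header and its n_words argument are hypothetical, and the choice to keep the first three words plus any later all-digit token is only an assumption about what "keep figures" and "first few words" mean. The helper converts each cell to character individually, because when the columns are factors (the data.frame default before R 4.0) converting the whole row at once yields the integer level codes, which would explain names such as "X3".

```r
# Hypothetical helper: build readable, unique column names from a messy header row.
clean_header <- function(x, n_words = 3) {
  # Convert cell by cell so factor columns yield their labels, not level codes.
  x <- vapply(x, function(v) as.character(v)[1], character(1), USE.NAMES = FALSE)
  # Collapse every run of non-alphanumeric characters into a single space.
  x <- gsub("[^[:alnum:]]+", " ", x)
  # Trim the ends and split each header into words.
  words <- strsplit(trimws(x), " +")
  # Keep the first n_words words plus any later all-digit token (e.g. a year).
  short <- vapply(words, function(w) {
    keep <- seq_along(w) <= n_words | grepl("^[0-9]+$", w)
    paste(w[keep], collapse = "_")
  }, character(1), USE.NAMES = FALSE)
  # make.names() prefixes "X" where a name is empty or starts with a digit;
  # unique = TRUE appends suffixes to duplicated names.
  make.names(short, unique = TRUE)
}

# Usage on the example data; stringsAsFactors = TRUE mirrors the pre-4.0 default.
df <- data.frame(A = c("Some messy string to be used", 222, 0),
                 B = c("Very important ? indicator from 2001", 888, 44),
                 C = c("001 This variable / makes no sense", 888, 44),
                 D = c("Geography", 1, 2),
                 stringsAsFactors = TRUE)
names(df) <- clean_header(df[1, ])
df <- df[-1, ]  # drop the header row once its values have become the names
names(df)
```

With the example data this should give "Some_messy_string", "Very_important_indicator_2001", "X001_This_variable" and "Geography". The remaining columns are still factors or character strings at this point, so they would need converting to numeric separately.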