NB: To the best of my knowledge this question is not a duplicate! All the questions/answers I found are about either how to eliminate points from data that are already in R or how to change the decimal point to a comma when loading it.
I have a csv with numbers like 4.123,98. The problem is that because of the "." the output becomes a character string matrix when loading with read.table, read.csv or read.csv2. Changing dec to "," doesn't help.
My question
What is the most elegant way to load this csv so that the numbers become e.g. 4123.98 as numeric?
#some sample data
write.csv(data.frame(a=c("1.234,56","1.234,56"),
b=c("1.234,56","1.234,56")),
"test.csv",row.names=FALSE,quote=TRUE)
#define your own numeric class
setClass('myNum')
#define conversion
setAs("character","myNum", function(from) as.numeric(gsub(",","\\.",gsub("\\.","",from))))
#read data with custom colClasses
read_data=read.csv("test.csv",stringsAsFactors=FALSE,colClasses=c("myNum","myNum"))
#let's try whether this is really a numeric
read_data[1,1]*2
#[1] 2469.12
Rather than try to fix it all at loading time, I would load the data into R as a string, then process it to numeric.
So after loading, you have a column of strings like "4.123,98". Then do something like:
number.string <- gsub("\\.", "", number.string)
number.string <- gsub(",", "\\.", number.string)
number <- as.numeric(number.string)
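For completeness, here is a minimal end-to-end sketch of the same idea applied to a whole file. The helper name parse_eu_number and the sample values are made up for illustration; it assumes every column uses "." as the thousands separator and "," as the decimal mark, and it reads all columns as character first.

```r
# Hypothetical helper (the name is an assumption, not part of base R):
# turns strings like "1.234,56" into the numeric value 1234.56
parse_eu_number <- function(x) {
  x <- gsub(".", "", x, fixed = TRUE)  # drop every thousands separator
  x <- sub(",", ".", x, fixed = TRUE)  # decimal comma -> decimal point
  as.numeric(x)
}

# Write a small sample file laid out like the data in the question
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(a = c("1.234,56", "7,5"),
                     b = c("12.345.678,9", "0,25")),
          tmp, row.names = FALSE, quote = TRUE)

# Read every column as character, then convert column by column
raw <- read.csv(tmp, colClasses = "character")
converted <- as.data.frame(lapply(raw, parse_eu_number))

str(converted)
```

Using fixed = TRUE avoids having to escape the dot as a regular expression. Values that do not fit the assumed format come back as NA with a warning from as.numeric, which makes bad rows easy to spot.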