I have a text file with a byte order mark (BOM, U+FEFF) at the beginning. I am trying to read the file in R. Is it possible to skip the BOM when reading?
The function fread (from the data.table package) reads the file, but adds ļ»æ at the beginning of the first variable name:
> names(frame_pers)[1]
[1] "ļ»æreg_date"
The same happens with the read.csv function.
Currently I use a function that removes the BOM from the first column name, but I believe there should be a way to strip the BOM automatically.
remove.BOM <- function(x) setnames(x, 1, substring(names(x)[1], 4))
> names(frame_pers)[1]
[1] "ļ»æreg_date"
> remove.BOM(frame_pers)
> names(frame_pers)[1]
[1] "reg_date"
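The workaround above always drops the first three characters, whether or not a BOM is present. A slightly more defensive variant, as a minimal sketch: it checks the raw bytes of the name for the UTF-8 BOM (EF BB BF) before removing anything. The helper name strip_bom is hypothetical, and the example uses a plain data.frame with base R only so that it runs without data.table.

```r
# Hypothetical helper (not part of base R or data.table): drop a leading
# UTF-8 BOM from a single string by inspecting its raw bytes.
strip_bom <- function(s) {
  bom <- as.raw(c(0xEF, 0xBB, 0xBF))   # UTF-8 encoding of U+FEFF
  bytes <- charToRaw(s)
  if (length(bytes) >= 3 && identical(bytes[1:3], bom)) {
    enc <- Encoding(s)                 # remember the declared encoding
    s <- rawToChar(bytes[-(1:3)])      # keep everything after the BOM
    Encoding(s) <- enc
  }
  s
}

# Example with a plain data.frame. The BOM bytes are built explicitly so the
# example does not depend on the session's locale.
df <- data.frame(a = 1:2, b = 3:4)
names(df)[1] <- rawToChar(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("reg_date")))
names(df)[1] <- strip_bom(names(df)[1])
names(df)
```

With a data.table, the same helper could presumably be combined with setnames, as in setnames(x, 1, strip_bom(names(x)[1])), so that the rename still happens by reference.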
I am using the native encoding for the R session:
> options("encoding" = "")
> options("encoding")
$encoding
[1] ""
This was handled between versions 1.9.6 and 1.9.8 with this commit; update your data.table installation to fix this.
Once done, you can just use fread:

Have you tried read.csv(..., fileEncoding = "UTF-8-BOM")?
?file says: