fread from data.table package when column names in

Posted 2019-02-17 01:33

I have a csv file where column names include spaces and special characters.

fread imports them with quotes - but how can I change this behaviour? One reason is that I have column names starting with a space and I don't know how to handle them.

Any pointers would be helpful.

Edit: An example.

> packageVersion("data.table")
[1] ‘1.8.8’

p2p <- fread("p2p.csv", header = TRUE, stringsAsFactors=FALSE)

> head(p2p[,list(Principal remaining)])
Error: unexpected symbol in "head(p2p[,list(Principal remaining"

> head(p2p[,list("Principal remaining")])
                    V1
1: Principal remaining

> head(p2p[,list(c("Principal remaining"))])
                    V1
1: Principal remaining

What I was expecting/want is of course, what a column name without spaces yields:

> head(p2p[,list(Principal)])
   Principal
1:      1000
2:      1000
3:      1000
4:      2000
5:      1000
6:      4130

3 Answers
forever°为你锁心
#2 · 2019-02-17 01:52

A slightly modified version of BondedDust's answer, because setnames updates the table by reference and is not used with the <- assignment operator:

setnames(DT, make.names(colnames(DT)))
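
As a rough illustration of the call above, here is a minimal sketch on a small made-up table (the values and the `DT` object are assumptions standing in for the asker's `p2p`). It also shows that backticks are another way to refer to a non-syntactic column name without renaming it.

```r
library(data.table)

# Hypothetical stand-in for the imported table; the data values are invented
DT <- data.table(Principal = c(1000, 2000),
                 `Principal remaining` = c(900, 1500))

# Backticks let you refer to a name containing a space as it is
DT[, list(`Principal remaining`)]

# setnames() changes the names in place, so no assignment with <- is needed
setnames(DT, make.names(colnames(DT)))

# The space has been replaced by a dot, so the column can be used unquoted
colnames(DT)
DT[, list(Principal.remaining)]
```

Renaming is convenient when many columns are affected; backticks keep the original names intact if they matter for later export.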
姐就是有狂的资本
#3 · 2019-02-17 01:55

It should be rather difficult to get a leading space in a column name; it should not happen through "casual coding". On the other hand, I don't see very much error checking in the fread code, so until this undesirable behavior is fixed (or the feature request refused), you can do something like this:

setnames(DT, make.names(colnames(DT))) 

If on the other hand you are bothered by the fact that colnames(DT) will display the column names with quotes then just "get over it." That's how the interactive console will display any character value.

If a column name looks like " ttt" in the original file, then it's going to have leading spaces when imported, and you need to process it with colnames(dfrm) <- sub("^\\s+", "", colnames(dfrm)) or one of the several trim functions in various packages (such as 'gdata').
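
For a data.table, the same regular expression can be combined with setnames so the names are updated in place. This is only a sketch: the table and its names below are made up to simulate an import that left a leading space.

```r
library(data.table)

# Hypothetical table; the leading space is introduced on purpose
DT <- data.table(a = 1:2, b = 3:4)
setnames(DT, c(" ttt", "Principal"))

# Strip leading whitespace from every column name, by reference
setnames(DT, sub("^\\s+", "", colnames(DT)))

colnames(DT)
```

Unlike make.names, this keeps the rest of each name unchanged, so spaces inside a name are left alone.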

Lonely孤独者°
#4 · 2019-02-17 01:59

You can use the check.names = TRUE argument of data.table's fread function:

p2p <- fread("p2p.csv", header = TRUE, stringsAsFactors = FALSE, check.names = TRUE)

It uses the make.names function behind the scenes. From the documentation of the argument:

default is FALSE. If TRUE then the names of the variables in the data.table 
are checked to ensure that they are syntactically valid variable names. If 
necessary they are adjusted (by make.names) so that they are, and also to 
ensure that there are no duplicates.
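
A small sketch of the effect, assuming a data.table version recent enough to support both check.names and the text argument of fread (the version in the question may be too old for either). The inline CSV content is invented for illustration.

```r
library(data.table)

# Invented two-column CSV supplied inline instead of reading p2p.csv
csv <- "Principal,Principal remaining\n1000,900\n2000,1500\n"

# With check.names = TRUE the header is passed through make.names()
p2p <- fread(text = csv, header = TRUE, check.names = TRUE)

colnames(p2p)
p2p[, list(Principal.remaining)]
```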