R - read.table imports half of the dataset - no errors

Posted 2019-07-01 18:10

Question:

I have a CSV file with ~200 columns and ~170K rows. The data has been extensively groomed and I know that it is well-formed. When read.table completes, I see that approximately half of the rows have been imported, with no warnings and no errors. I set options(warn = 2). I'm using the latest 64-bit version of R and I increased the memory limit to 10 GB. Scratching my head here... no idea how to proceed debugging this.

Edit
When I said half the file, I don't mean the first half. The last observation read is towards the end of the file... so it's seemingly random.
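(For reference, a diagnostic sketch, with the hypothetical path "my_data.csv" standing in for the real file: compare the file's physical line count and per-line field counts against what read.table returned.)

n_lines <- length(readLines("my_data.csv"))        # physical lines, incl. header
fields  <- count.fields("my_data.csv", sep = ",")  # fields per line; NA if a quoted
                                                   # string spans multiple lines
table(fields, useNA = "ifany")                     # every line should show ~200 fields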

Answer 1:

You may have a comment character (#) in the file (try setting the option comment.char = "" in read.table). Also, check that the quote option is set correctly.
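A minimal sketch of both settings together (assuming a hypothetical comma-separated file "my_data.csv"):

df <- read.table("my_data.csv", sep = ",", header = TRUE,
                 comment.char = "", quote = "")  # treat '#' and '"' as plain data
nrow(df)                                         # should now match the expected ~170K rows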



Answer 2:

I've had this problem before; the way I approached it was to read in a set number of lines at a time and then combine them after the fact.

df1 <- read.csv(..., nrows = 85000)                 # header + first 85000 data rows
df2 <- read.csv(..., skip = 85001, header = FALSE)  # skip the header and those 85000 rows
colnames(df2) <- colnames(df1)                      # second chunk was read without a header

df <- rbind(df1, df2)
rm(df1, df2)
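A quick sanity check on the combined result (again with a hypothetical path):

stopifnot(nrow(df) == length(readLines("my_data.csv")) - 1)  # -1 for the header line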


Answer 3:

I had a similar problem when reading in a large txt file that used a "|" separator. Scattered about the file were text blocks containing a stray double quote ("), which caused the read.xxx function to stop at the prior record without throwing an error. Note that these text blocks were not enclosed in double quotes; they simply contained the occasional lone " character, which tripped up the quote handling.

I did a global search and replace on the txt file, replacing the double quote (") with a single quote ('), solving the problem (all rows were then read in without aborting).
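An in-R alternative that avoids editing the file, sketched here with a hypothetical path, is to disable quote handling entirely so a lone " is treated as ordinary data:

df <- read.table("my_data.txt", sep = "|", header = TRUE,
                 quote = "", comment.char = "")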