I was using the skip option in read.csv to skip a few lines before reading a csv file into my data frame. However, when I call names(dataframe) afterwards, I lose my column names and get some random strings as column names instead. Why does this happen?
> mydf = read.csv("mycsvfile.csv",skip=100)
> names(mydf)
[1] "X2297256" "X3"
Without the skip option, it works fine:
> mydf = read.csv("mycsvfile.csv")
> names(mydf)
[1] "col1" "col2"
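If you skip lines in a file, you skip the complete line, so if your header is in the first line and you skip 100 lines, the header line will be skipped too. read.csv then treats the first line after the skipped ones as the header, which is why a data row ends up as your column names (R prefixes an X to turn numeric values such as 2297256 into valid names). If you want to skip part of the file and still keep the headers, you'll need to read them separately: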
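# Read only the header line (plus one data row) to get the column names
headers <- names(read.csv("mycsvfile.csv", nrows = 1))
# Skip the first 100 lines (header included) and reapply the saved names
mydf <- read.csv("mycsvfile.csv", header = FALSE, col.names = headers, skip = 100)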
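To see the effect on something small, here is a minimal sketch using a throwaway file. The temp file, its 10 rows and skip = 5 are made up for illustration and stand in for mycsvfile.csv and skip = 100.

```r
# Hypothetical test file: one header line followed by 10 data rows.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(col1 = 1:10, col2 = 101:110), tmp, row.names = FALSE)

# skip = 5 discards the header and the first 4 data rows,
# so the line "5,105" is taken as the header.
bad <- read.csv(tmp, skip = 5)
names(bad)    # "X5" "X105"

# Read the header on its own, then reapply it to the skipped read.
headers <- names(read.csv(tmp, nrows = 1))
good <- read.csv(tmp, header = FALSE, col.names = headers, skip = 5)
names(good)   # "col1" "col2"
nrow(good)    # 6 (data rows 5 to 10)
```

Note that with header = FALSE the first line after the skipped ones is kept as data, whereas in the plain skip call it was consumed as the header.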
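It is not necessary to read in the headers separately. You can do this in one line by using negative indexing on the data frame, where a negative index means "keep all rows except those at the given indices". Note that this reads the whole file first and only then discards rows, so on a large file it does more work than skip, and that N counts data rows (the header line is not one of them).
So if you want to keep the headers and then drop the first N rows you just need to do this: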
mydf <- read.csv("mycsvfile.csv", header = TRUE)[-(1:N), ]
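A small sketch of the same idea on a throwaway file, assuming N is at least 1. The temp file and N = 4 are made up for illustration; drop = FALSE is added so the result stays a data frame even if the file has a single column.

```r
# Hypothetical test file: one header line followed by 10 data rows.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(col1 = 1:10, col2 = 101:110), tmp, row.names = FALSE)

N <- 4  # number of data rows to drop; assumed to be >= 1

# Read everything, then drop rows 1..N; the header is parsed as usual.
mydf <- read.csv(tmp, header = TRUE)[-seq_len(N), , drop = FALSE]

# The surviving rows keep their original row names (5, 6, ...);
# reset them if you want numbering to start from 1 again.
rownames(mydf) <- NULL

names(mydf)     # "col1" "col2"
mydf$col1[1]    # 5
```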