What is the best way to read a file into R when the header needs two lines?
This happens to me all the time, as people often use one line for the column name and then include another line underneath it for the unit of measurement. I don't want to skip anything. I want the names and the units to carry through.
Here is what a typical file with a two-line header might look like:
trt biomass yield
crop Mg/ha bu/ac
C2 17.76 205.92
C2 17.96 207.86
CC 17.72 197.22
CC 18.42 205.20
CCW 18.15 200.51
CCW 17.45 190.59
P 3.09 0.00
P 3.34 0.00
S2 5.13 49.68
S2 5.36 49.72
Almost the same method as the other answers, just shortened to two statements:
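A minimal two-statement sketch, assuming the sample file is inlined as the string txt (a hypothetical name; with a real file you would pass its path as the file argument instead of text = txt). The first statement reads the two header lines as ordinary data. The second reads the rest and builds each column name by pasting the name and its unit with an underscore.

```r
# Sample file from the question, inlined for a self-contained example
txt <- "trt biomass yield
crop Mg/ha bu/ac
C2 17.76 205.92
C2 17.96 207.86
CC 17.72 197.22
CC 18.42 205.20
CCW 18.15 200.51
CCW 17.45 190.59
P 3.09 0.00
P 3.34 0.00
S2 5.13 49.68
S2 5.36 49.72"

# Statement 1: the two header lines, read as plain character data
hdr <- read.table(text = txt, nrows = 2, stringsAsFactors = FALSE)
# Statement 2: the data, named "name_unit"; check.names = FALSE keeps the "/" intact
dat <- read.table(text = txt, skip = 2, check.names = FALSE,
                  col.names = paste(unlist(hdr[1, ]), unlist(hdr[2, ]), sep = "_"))
names(dat)  # "trt_crop" "biomass_Mg/ha" "yield_bu/ac"
```

Without check.names = FALSE, read.table passes the names through make.names, so "biomass_Mg/ha" would become "biomass_Mg.ha".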
A slightly different approach, explained step by step:
Read only the first two lines of the file as data (without headers):
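A sketch of this first step, assuming the file is inlined as the hypothetical string txt and read through the text argument. nrows = 2 stops after the two header lines, and header = FALSE makes R treat them as ordinary rows.

```r
# Sample file from the question, inlined for a self-contained example
txt <- "trt biomass yield
crop Mg/ha bu/ac
C2 17.76 205.92
C2 17.96 207.86
CC 17.72 197.22
CC 18.42 205.20
CCW 18.15 200.51
CCW 17.45 190.59
P 3.09 0.00
P 3.34 0.00
S2 5.13 49.68
S2 5.36 49.72"

# Both header lines become data rows: row 1 holds the names, row 2 the units
hdr <- read.table(text = txt, nrows = 2, header = FALSE, stringsAsFactors = FALSE)
hdr
```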
Create the header names from the first two (or more) rows. sapply lets you apply an operation over the columns (in this case paste); see ?sapply for details.

Read the data of the file (skipping the first 2 rows):
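The header-building and data-reading steps might look like this, again assuming the hypothetical inlined string txt. Because paste is called with collapse, the same line works unchanged if the header spans three or more rows.

```r
# Sample file from the question, inlined for a self-contained example
txt <- "trt biomass yield
crop Mg/ha bu/ac
C2 17.76 205.92
C2 17.96 207.86
CC 17.72 197.22
CC 18.42 205.20
CCW 18.15 200.51
CCW 17.45 190.59
P 3.09 0.00
P 3.34 0.00
S2 5.13 49.68
S2 5.36 49.72"

hdr <- read.table(text = txt, nrows = 2, stringsAsFactors = FALSE)
# sapply visits each column of hdr and pastes its entries into one name;
# unname drops the V1/V2/V3 labels that sapply carries over from hdr
headers <- unname(sapply(hdr, paste, collapse = "_"))
# The data itself: skip the two header lines, no header row
dat <- read.table(text = txt, skip = 2, header = FALSE)
```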
And assign the headers of step two to the data:
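Put together, the three steps give a sketch like the following (txt is still the hypothetical inlined copy of the file). Assigning through names() skips the make.names check, so the "/" in the units survives.

```r
# Sample file from the question, inlined for a self-contained example
txt <- "trt biomass yield
crop Mg/ha bu/ac
C2 17.76 205.92
C2 17.96 207.86
CC 17.72 197.22
CC 18.42 205.20
CCW 18.15 200.51
CCW 17.45 190.59
P 3.09 0.00
P 3.34 0.00
S2 5.13 49.68
S2 5.36 49.72"

# Step 1: header lines as data
hdr <- read.table(text = txt, nrows = 2, stringsAsFactors = FALSE)
# Step 2: one "name_unit" string per column
headers <- unname(sapply(hdr, paste, collapse = "_"))
# Step 3: read the data and attach the headers from step 2
dat <- read.table(text = txt, skip = 2, header = FALSE)
names(dat) <- headers
str(dat)
```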
The advantage is that you have clear control over the parameters of read.table (such as sep for comma-separated files, and stringsAsFactors) for both the headers and the data.

I would do two steps, assuming we know that the first row contains the labels and that there are always two header lines.
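A sketch of those two steps, assuming the file is inlined as the hypothetical string txt. scan() reads just the first line as a character vector of labels; read.table then reads the data below the two header lines.

```r
# Sample file from the question, inlined for a self-contained example
txt <- "trt biomass yield
crop Mg/ha bu/ac
C2 17.76 205.92
C2 17.96 207.86
CC 17.72 197.22
CC 18.42 205.20
CCW 18.15 200.51
CCW 17.45 190.59
P 3.09 0.00
P 3.34 0.00
S2 5.13 49.68
S2 5.36 49.72"

# Step 1: the labels on line 1 (quiet = TRUE suppresses the "Read 3 items" message)
header <- scan(text = txt, what = character(), nlines = 1, quiet = TRUE)
# Step 2: the data, skipping both header lines
dat <- read.table(text = txt, skip = 2, header = FALSE)
```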
Then add the character vector header as the names component. For your data this would be:
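For the sample data, a sketch of the whole thing, with the hypothetical string txt standing in for the file. It also reads the units from line 2 with a second scan() and keeps them in a separate vector named by column; storing them that way is an illustrative choice, not part of the original answer.

```r
# Sample file from the question, inlined for a self-contained example
txt <- "trt biomass yield
crop Mg/ha bu/ac
C2 17.76 205.92
C2 17.96 207.86
CC 17.72 197.22
CC 18.42 205.20
CCW 18.15 200.51
CCW 17.45 190.59
P 3.09 0.00
P 3.34 0.00
S2 5.13 49.68
S2 5.36 49.72"

header <- scan(text = txt, what = character(), nlines = 1, quiet = TRUE)
dat <- read.table(text = txt, skip = 2, header = FALSE)
names(dat) <- header

# Optional: a second scan() on line 2 picks up the units
units <- scan(text = txt, what = character(), skip = 1, nlines = 1, quiet = TRUE)
names(units) <- header   # e.g. units[["biomass"]] is "Mg/ha"
```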
If you want the units, as per @DWin's answer, then do a second scan() on line 2.

Use readLines with 2 as the limit, parse the lines, paste0 them together, then read the data in with read.table with skip = 2 and header = FALSE (the default). Finish the process off by assigning the column names. You would probably use a file argument, but using the text argument to the read functions makes this more self-contained.

It might be better to use paste with an underscore as sep.
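A sketch of the readLines approach with the underscore separator, assuming the hypothetical inlined string txt. readLines has no text argument, so the string is wrapped in a textConnection; "yourfile.txt" in the comment is likewise a placeholder.

```r
# Sample file from the question, inlined for a self-contained example
txt <- "trt biomass yield
crop Mg/ha bu/ac
C2 17.76 205.92
C2 17.96 207.86
CC 17.72 197.22
CC 18.42 205.20
CCW 18.15 200.51
CCW 17.45 190.59
P 3.09 0.00
P 3.34 0.00
S2 5.13 49.68
S2 5.36 49.72"

# With a real file this would be readLines("yourfile.txt", n = 2)
con <- textConnection(txt)
top <- readLines(con, n = 2)
close(con)

# Parse: split each header line on runs of whitespace
parts <- strsplit(top, "[[:space:]]+")
# paste0(parts[[1]], parts[[2]]) would give "biomassMg/ha"; sep = "_" reads better
col_names <- paste(parts[[1]], parts[[2]], sep = "_")

dat <- read.table(text = txt, skip = 2, header = FALSE)
names(dat) <- col_names
```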