I have a text file that I'd like to import into R. Problem is, the file looks like this:
x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11
1953.00 7.40000 159565. 16.6680 8883.00
47.2000 26.7000 16.8000 37.7000 29.7000
19.4000
1954.00 7.80000 162391. 17.0290 8685.00
46.5000 22.7000 18.0000 36.8000 29.7000
20.0000
and so on.
I tried
> data <- read.table("clipboard", header=TRUE)
but that didn't work.
While the data is ill-formed, it can still be parsed given the following assumptions:
- The header defines how many variables there are (columns in the resultant table)
- The data itself is complete, i.e. there are no missing values
- The data is of a uniform type (e.g. numeric())
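These assumptions can be checked before reshaping anything. The following is a minimal sketch, not part of the original answer: `txt` is a hypothetical stand-in for the file contents, built from the sample lines in the question, and `n_records` is a name introduced only for illustration.

```r
# Sample lines copied from the question; `txt` is a hypothetical stand-in
# for the contents of data.txt (tenth name written as x10, as in the output).
txt <- c(
  "x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11",
  "1953.00 7.40000 159565. 16.6680 8883.00",
  "47.2000 26.7000 16.8000 37.7000 29.7000",
  "19.4000",
  "1954.00 7.80000 162391. 17.0290 8685.00",
  "46.5000 22.7000 18.0000 36.8000 29.7000",
  "20.0000"
)

# Assumption 1: the header fixes the number of variables
header <- strsplit(txt[1], ",", fixed = TRUE)[[1]]

# Assumption 3: scan() stops with an error if a token is not numeric
values <- scan(text = paste(txt[-1], collapse = "\n"),
               what = numeric(), quiet = TRUE)

# Assumption 2: the values must divide evenly into records, with no NAs
stopifnot(length(values) %% length(header) == 0, !anyNA(values))
n_records <- length(values) / length(header)
```

If the `stopifnot()` call fails, a value is probably missing somewhere, and reshaping by rows would silently shift every later record.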
The following is code that parses the provided sample data as if it were read in from a text file called data.txt:
# read in the header and split on ","
header = strsplit(readLines('data.txt', n=1), ',')[[1]]
# the length of the header determines how many variables there are
# read in the data which appears to have the pattern
# <numbers><whitespace><numbers>...
# skipping the first line since it was already parsed as the header
data = scan('data.txt', skip=1, what=numeric())
# reform the data (which is read in as a 1D numeric vector) into a 2D matrix
# with the same number of columns as there are headers (filling by rows).
# header names are assigned via the `dimnames=` argument
data = matrix(data, ncol=length(header), byrow=TRUE, dimnames=list(NULL, header))
producing the following output:
       x1  x2     x3     x4   x5   x6   x7   x8   x9  x10  x11
[1,] 1953 7.4 159565 16.668 8883 47.2 26.7 16.8 37.7 29.7 19.4
[2,] 1954 7.8 162391 17.029 8685 46.5 22.7 18.0 36.8 29.7 20.0
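If a data frame is more convenient than a matrix, the same steps can be wrapped in a small helper. This is only a sketch under the assumptions listed above; `read_wrapped` is a hypothetical name, and the temporary file stands in for data.txt so that the example runs on its own.

```r
# Hypothetical helper: read a file whose records wrap across several lines.
read_wrapped <- function(path) {
  # first line: comma-separated variable names
  header <- strsplit(readLines(path, n = 1), ",", fixed = TRUE)[[1]]
  # remaining lines: whitespace-separated numbers, read as one long vector
  values <- scan(path, skip = 1, what = numeric(), quiet = TRUE)
  # fill by rows so each run of length(header) values becomes one record
  m <- matrix(values, ncol = length(header), byrow = TRUE,
              dimnames = list(NULL, header))
  as.data.frame(m)
}

# Write the sample from the question to a temporary file and read it back
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11",
  "1953.00 7.40000 159565. 16.6680 8883.00",
  "47.2000 26.7000 16.8000 37.7000 29.7000",
  "19.4000",
  "1954.00 7.80000 162391. 17.0290 8685.00",
  "46.5000 22.7000 18.0000 36.8000 29.7000",
  "20.0000"
), tmp)

df <- read_wrapped(tmp)
str(df)
```

Columns can then be accessed by name, e.g. `df$x3`.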
Maybe you could manually edit the first line (change , to " " and insert a line break) and then try again?
Use read.csv instead of read.table and then add skip=1, header=FALSE to the arguments to read.csv.