Read files with extension .data into R

Posted 2019-03-31 05:23

I need to read a data file into R for my assignment. You can download it from the following site.

http://archive.ics.uci.edu/ml/datasets/Acute+Inflammations

The data file has a .data extension, which I have never seen before. I tried read.table and similar functions but could not read it into R properly. Can anyone help me with this, please?

Tags: r dataset
3 answers
我只想做你的唯一
Reply #2 · 2019-03-31 06:01

You have a UTF-16LE file, a.k.a. "Unicode" on Windows (in case you're on that OS). Try this:

f <- file("http://archive.ics.uci.edu/ml/machine-learning-databases/acute/diagnosis.data",
          open = "r", encoding = "UTF-16LE")
data <- read.table(f, dec = ",", header = FALSE)
close(f)  # read.table() does not close a connection that was already open

Trying what @Gavin Simpson suggested might also help, though, since you can then add your headings and save the file.
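Once the data is in a data frame (with any headings added), one way to save the file is to write a regular CSV copy, so later sessions can load it with a plain read.csv() call. A minimal sketch: the small data frame below is a stand-in for the real data, and the output path is a temporary file rather than a name you would actually use.

```r
# Stand-in for the data frame read above (two rows, three columns only).
data <- data.frame(V1 = c(35.5, 35.9),
                   V2 = c("no", "no"),
                   V3 = c("yes", "no"),
                   stringsAsFactors = FALSE)

# Write a plain CSV copy; write.csv() uses "." as the decimal mark.
# A temporary file is used here; in practice you'd pick a name such as "diagnosis.csv".
out <- tempfile(fileext = ".csv")
write.csv(data, out, row.names = FALSE)

# Reading the copy back needs no encoding or dec arguments.
reread <- read.csv(out)
str(reread)
```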

欢心
Reply #3 · 2019-03-31 06:06

It's a UTF-16 little-endian file with a byte order mark (BOM) at the beginning, and read.table will fail unless you specify the correct encoding. Decimals are indicated by a comma. This works for me on macOS:

read.table("diagnosis.data", fileEncoding="UTF-16", dec=",")

      V1  V2  V3  V4  V5  V6  V7  V8
1   35.5  no yes  no  no  no  no  no
2   35.9  no  no yes yes yes yes  no
3   35.9  no yes  no  no  no  no  no
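To confirm the encoding this answer describes before choosing a fileEncoding value, you can look at the first two bytes of the file: a UTF-16 little-endian BOM is the byte pair 0xFF 0xFE. A minimal sketch; the helper name has_utf16le_bom and the two small demo files are hypothetical, built here so the example runs without downloading anything.

```r
# Returns TRUE if the file starts with the UTF-16 little-endian
# byte order mark, i.e. the bytes 0xFF 0xFE.
has_utf16le_bom <- function(path) {
  first_two <- readBin(path, what = "raw", n = 2)
  identical(first_two, as.raw(c(0xff, 0xfe)))
}

# Hypothetical stand-in for diagnosis.data: a BOM followed by
# one TAB-separated record encoded as UTF-16LE.
record <- "35,5\tno\tyes\tno\tno\tno\tno\tno\n"
body <- iconv(record, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
demo <- tempfile(fileext = ".data")
writeBin(c(as.raw(c(0xff, 0xfe)), body), demo)

# A plain ASCII file for comparison.
plain <- tempfile(fileext = ".txt")
writeLines("35,5 no yes", plain)

has_utf16le_bom(demo)    # TRUE
has_utf16le_bom(plain)   # FALSE
```

On the real file you would call has_utf16le_bom("diagnosis.data") after downloading it.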
时光不老,我们不散
Reply #4 · 2019-03-31 06:17

From your link:

The data is in an ASCII file. Attributes are separated by TAB.

So you need to use read.table() with sep = "\t".

-- Attribute lines:
For example, '35,9 no no yes yes yes yes no'
Where:
'35,9' Temperature of patient
'no' Occurrence of nausea
'no' Lumbar pain
'yes' Urine pushing (continuous need for urination)
'yes' Micturition pains
'yes' Burning of urethra, itch, swelling of urethra outlet
'yes' decision: Inflammation of urinary bladder
'no' decision: Nephritis of renal pelvis origin

It also looks like it uses a comma as the decimal separator, so specify dec = "," in read.table() as well.
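Here is how the two settings work together on the example record quoted above, typed in with TAB separators and parsed through read.table()'s text argument so nothing needs to be downloaded. This is only a sketch: according to the other answers, the actual diagnosis.data file is also UTF-16 encoded, so reading it still needs their encoding settings.

```r
# One record as described on the UCI page: fields separated by TAB,
# comma as the decimal mark.
sample_line <- "35,9\tno\tno\tyes\tyes\tyes\tyes\tno"

# With dec = "," the temperature is parsed as a number.
rec <- read.table(text = sample_line, sep = "\t", dec = ",")
rec$V1   # 35.9

# Without dec = "," the first field is not parsed as a number.
bad <- read.table(text = sample_line, sep = "\t")
is.numeric(bad$V1)   # FALSE
```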

The file has no header row, so you'll need to add the column headings manually; your link defines them.
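A sketch of adding the headings. The short column names are hypothetical, derived from the attribute descriptions quoted above, and converting the yes/no columns to TRUE/FALSE is an optional extra step. The two records are the first rows shown in the earlier answer's output, typed in so the example runs on its own.

```r
# Hypothetical short names, one per attribute in the dataset description
# (temperature, five symptoms, two decisions).
col_names <- c("temperature", "nausea", "lumbar_pain", "urine_pushing",
               "micturition_pains", "burning_urethra",
               "bladder_inflammation", "nephritis")

# Two sample records standing in for the real file.
lines <- c("35,5\tno\tyes\tno\tno\tno\tno\tno",
           "35,9\tno\tno\tyes\tyes\tyes\tyes\tno")
d <- read.table(text = lines, sep = "\t", dec = ",",
                col.names = col_names, stringsAsFactors = FALSE)

# Optional: turn every yes/no column into a logical vector.
yn <- names(d) != "temperature"
d[yn] <- lapply(d[yn], function(x) x == "yes")
str(d)
```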

Make sure you see @Gavin Simpson's comment below to clean up other undocumented "features" of this dataset.
