Convert a .data file to .csv

Posted 2019-07-21 20:37

I found a data set named ecoli.data, available at:

https://archive.ics.uci.edu/ml/machine-learning-databases/ecoli/

I would like to open it in R for a classification task, but I would prefer to convert it to a CSV file first. When I open it in Word, I see that it is not tab-delimited: there appear to be about three spaces between the values on each row. So the bottom-line question is how to convert this file to CSV using Excel or maybe Python.

Tags: python r excel csv
4 answers
Rolldiameter
Reply #2 · 2019-07-21 21:15

Here are two ways to do that in R, both of which work:

library(readr)

url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/ecoli/ecoli.data"

With base R:

df <- read.table(url)
dplyr::glimpse(df)

## Observations: 336
## Variables:
## $ V1 (fctr) AAT_ECOLI, ACEA_ECOLI, ACEK_ECOLI, ACKA_ECOLI, ADI_ECOLI, ...
## $ V2 (dbl) 0.49, 0.07, 0.56, 0.59, 0.23, 0.67, 0.29, 0.21, 0.20, 0.42,...
## $ V3 (dbl) 0.29, 0.40, 0.40, 0.49, 0.32, 0.39, 0.28, 0.34, 0.44, 0.40,...
## $ V4 (dbl) 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48,...
## $ V5 (dbl) 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,...
## $ V6 (dbl) 0.56, 0.54, 0.49, 0.52, 0.55, 0.36, 0.44, 0.51, 0.46, 0.56,...
## $ V7 (dbl) 0.24, 0.35, 0.37, 0.45, 0.25, 0.38, 0.23, 0.28, 0.51, 0.18,...
## $ V8 (dbl) 0.35, 0.44, 0.46, 0.36, 0.35, 0.46, 0.34, 0.39, 0.57, 0.30,...
## $ V9 (fctr) cp, cp, cp, cp, cp, cp, cp, cp, cp, cp, cp, cp, cp, cp, cp...

write.csv(df, "ecoli.csv", row.names=FALSE)

With readr functions:

df <- read_table(url, col_names=FALSE)
dplyr::glimpse(df)

## Observations: 336
## Variables:
## $ X1 (chr) "AAT_ECOLI", "ACEA_ECOLI", "ACEK_ECOLI", "ACKA_ECOLI", "ADI...
## $ X2 (dbl) 0.49, 0.07, 0.56, 0.59, 0.23, 0.67, 0.29, 0.21, 0.20, 0.42,...
## $ X3 (dbl) 0.29, 0.40, 0.40, 0.49, 0.32, 0.39, 0.28, 0.34, 0.44, 0.40,...
## $ X4 (dbl) 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48,...
## $ X5 (dbl) 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,...
## $ X6 (dbl) 0.56, 0.54, 0.49, 0.52, 0.55, 0.36, 0.44, 0.51, 0.46, 0.56,...
## $ X7 (dbl) 0.24, 0.35, 0.37, 0.45, 0.25, 0.38, 0.23, 0.28, 0.51, 0.18,...
## $ X8 (dbl) 0.35, 0.44, 0.46, 0.36, 0.35, 0.46, 0.34, 0.39, 0.57, 0.30,...
## $ X9 (chr) "cp", "cp", "cp", "cp", "cp", "cp", "cp", "cp", "cp", "cp",...

write_csv(df, "ecoli.csv")
甜甜的少女心
Reply #3 · 2019-07-21 21:16

Using Python 2.7:

import csv

# Assumes the data was saved locally as ecoli.data.txt.
# split() with no argument splits on any run of whitespace.
with open('ecoli.data.txt') as input_file:
    new_lines = [line.split() for line in input_file]

# The csv module in Python 2 expects the output file in binary mode ('wb').
with open('output.csv', 'wb') as output_file:
    file_writer = csv.writer(output_file)
    file_writer.writerows(new_lines)
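
On Python 3 the same idea works, but the output file has to be opened in text mode with newline='' instead of 'wb'. A minimal sketch under that assumption; the function name data_to_csv is hypothetical and not part of the original answer, and the file names in the usage comment are only examples:

```python
import csv


def data_to_csv(src_path, dst_path):
    """Convert a whitespace-delimited text file to CSV; return the row count."""
    with open(src_path) as src:
        # split() with no argument splits on any run of spaces or tabs;
        # blank lines are skipped.
        rows = [line.split() for line in src if line.strip()]

    # Python 3: text mode with newline='' lets the csv module
    # control the line endings itself.
    with open(dst_path, "w", newline="") as dst:
        csv.writer(dst).writerows(rows)
    return len(rows)


# Example call (assumed local file names):
# data_to_csv("ecoli.data", "ecoli.csv")
```

Unlike the Python 2.7 version, this sketch drops blank lines rather than writing them out as empty rows.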
叛逆
Reply #4 · 2019-07-21 21:19

Rename the file to ecoli.txt, then open it in Excel. This launches Excel's "Text Import Wizard", which lets you choose options such as "Fixed width". Click "Next" a few times, then "Finish", and the data will appear in the Excel grid. Then save it as CSV.

We Are One
Reply #5 · 2019-07-21 21:25

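Use pandas.read_table('https://archive.ics.uci.edu/ml/machine-learning-databases/ecoli/ecoli.data', delim_whitespace=True, header=None). The URL has to point at the ecoli.data file itself rather than at the directory, and header=None keeps the first data row from being read as column names. Recent pandas releases deprecate delim_whitespace in favor of sep=r'\s+'.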
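
To get from the DataFrame to an actual CSV, the result still has to be written out with to_csv. A small sketch of the full round trip, assuming pandas is installed; it reads two sample rows from an in-memory string instead of the real URL, uses sep=r"\s+" as the whitespace separator, and the helper name whitespace_to_csv is hypothetical:

```python
import io

import pandas as pd

# Two sample rows in the same whitespace-separated layout as ecoli.data.
SAMPLE = (
    "AAT_ECOLI   0.49  0.29  0.48  0.50  0.56  0.24  0.35   cp\n"
    "ACEA_ECOLI  0.07  0.40  0.48  0.50  0.54  0.35  0.44   cp\n"
)


def whitespace_to_csv(source):
    """Read whitespace-delimited text (path, URL, or file object); return CSV text."""
    # sep=r"\s+" splits on runs of whitespace; header=None because
    # the file has no header row.
    df = pd.read_csv(source, sep=r"\s+", header=None)
    # index=False omits the row index; header=False omits the
    # automatically generated integer column labels.
    return df.to_csv(index=False, header=False)


csv_text = whitespace_to_csv(io.StringIO(SAMPLE))
```

Passing a file name as the first argument of to_csv, for example df.to_csv("ecoli.csv", index=False, header=False), writes the file to disk instead of returning a string.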
