How to read a dataset from a txt file in Python?

I have a dataset in this format:

example data

I need to import the data and work with it.

The main problem is that the first and the fourth columns are strings while the second and third columns are floats and ints, respectively.

I'd like to put the data in a matrix or at least obtain a list of each column's data.

I tried to read the whole dataset as a string but it's a mess:

f = open ( 'input.txt' , 'r')
l = [ map(str,line.split('\t')) for line in f ]

What could be a good solution?

Tags: python string file matrix dataset

5 answers

兄弟一词,经得起流年.

#2 · 2019-02-18 23:40

Split each line and transpose the list:

with open('in.txt', 'r') as f:  # use with to open your files; it closes them automatically
    l = [x.split() for x in f]
    rows = [list(x) for x in zip(*l)]  # after transposing, each entry of rows is one column
    rows[1], rows[2] = list(map(float, rows[1])), list(map(int, rows[2]))  # list() is needed on Python 3
In [16]: rows
Out[16]: 
[['bbbbffdd', 'bbbWWWff', 'ajkfbdafa'],
 [434343.0, 43545343.0, 2345345.0],
 [228, 289, 2312],
 ['D', 'E', 'F']]


放我归山

#3 · 2019-02-18 23:55

You seem to have CSV data (with tabs as the delimiter) so why not use the csv module?

import csv

with open('data.csv') as f:
    reader = csv.reader(f, delimiter='\t')
    data = [(col1, float(col2), int(col3), col4)
                for col1, col2, col3, col4 in reader]

data is a list of tuples containing the converted data (column 2 -> float, column 3 -> int). If data.csv contains (with tabs, not spaces):

thing1  5.005069    284 D
thing2  5.005049    142 D
thing3  5.005066    248 D
thing4  5.005037    124 D

data would contain:

[('thing1', 5.005069, 284, 'D'),
 ('thing2', 5.005049, 142, 'D'),
 ('thing3', 5.005066, 248, 'D'),
 ('thing4', 5.005037, 124, 'D')]
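
Since the question also asks for a list of each column's data, here is a minimal sketch of how the list of tuples could be transposed into columns. The sample rows are inlined through io.StringIO so the snippet runs on its own, and the names names, values, counts and letters are made up for illustration:

```python
import csv
import io

# Two sample rows inlined for the sketch; open('data.csv') works the same way.
sample = "thing1\t5.005069\t284\tD\nthing2\t5.005049\t142\tD\n"

reader = csv.reader(io.StringIO(sample), delimiter="\t")
data = [(col1, float(col2), int(col3), col4)
        for col1, col2, col3, col4 in reader]

# zip(*data) transposes the rows into columns; the variable names are hypothetical.
names, values, counts, letters = (list(col) for col in zip(*data))

print(values)
print(counts)
```

This assumes every line has exactly four tab-separated fields; a short or blank line would raise a ValueError during unpacking.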


孤傲高冷的网名

#4 · 2019-02-18 23:55

You can use pandas. It is great for reading CSV files, tab-delimited files, etc. Pandas will almost always infer the data types correctly, and a column comes back as a NumPy array when accessed through .values, as demonstrated.

I used this tab delimited 'test.txt' file:

    bbbbffdd    434343  228 D 
    bbbWWWff    43545343    289 E
    ajkfbdafa   2345345 2312    F

Here is the pandas code. Your file is read into a DataFrame in a single line of Python. You can change the 'sep' value to suit your file's delimiter.

    import pandas as pd
    X = pd.read_csv('test.txt', sep="\t", header=None)

Then try:

    print(X)
            0         1     2   3
    0   bbbbffdd    434343   228  D 
    1   bbbWWWff  43545343   289   E
    2  ajkfbdafa   2345345  2312   F

    print(X[0])
    0     bbbbffdd
    1     bbbWWWff
    2    ajkfbdafa

    print(X[2])
    0     228
    1     289
    2    2312

    print(X[1][1:])
    1    43545343
    2     2345345

You can add column names as:

    X.columns = ['random_letters', 'number', 'simple_number', 'letter']

And then get the columns as:

    X['number'].values
    array([  434343, 43545343,  2345345])
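
If you would rather not rely on type inference, a rough sketch along these lines passes the column names and types to read_csv directly. It reuses the column names from above, inlines the sample rows through io.StringIO so it runs on its own, and assumes the second column should be float and the third int, as the question describes:

```python
import io

import pandas as pd

# Sample rows inlined for the sketch; a path such as 'test.txt' works the same way.
sample = (
    "bbbbffdd\t434343\t228\tD\n"
    "bbbWWWff\t43545343\t289\tE\n"
    "ajkfbdafa\t2345345\t2312\tF\n"
)

X = pd.read_csv(
    io.StringIO(sample),
    sep="\t",
    header=None,
    names=["random_letters", "number", "simple_number", "letter"],
    # Explicit types for the numeric columns instead of relying on inference.
    dtype={"number": float, "simple_number": int},
)

print(X.dtypes)
print(X["number"].tolist())
```

The two string columns are left to pandas, which stores them as generic text columns.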


疯言疯语

#5 · 2019-02-18 23:55

Here's a solution to read in the data and convert those second and third columns to numeric types:

rows = []
# Open the file with "with" so it is closed automatically
with open('input.txt', 'r') as f:
    for line in f:
        # Split on any whitespace (including tab characters)
        row = line.split()
        # Convert strings to numeric values:
        row[1] = float(row[1])
        row[2] = int(row[2])
        # Append to our list of lists:
        rows.append(row)

print(rows)

With the following input.txt:

string1 5.005069    284 D
string2 5.005049    142 D
string3 5.005066    284 D
string4 5.005037    124 D

It produces the following output:

[['string1', 5.005069, 284, 'D'], 
 ['string2', 5.005049, 142, 'D'], 
 ['string3', 5.005066, 284, 'D'], 
 ['string4', 5.005037, 124, 'D']]


你好瞎i

#6 · 2019-02-19 00:06

Use numpy.loadtxt("data.txt") to read the data as a 2-D array of rows:

[[row1],[row2],[row3]...]

each row has elements of each column

[row1] = [col1, col2, col3, ...]

Use dtype=str to read each entry as a string (the default dtype is float, which fails on the text columns).

You can convert corresponding values to integer, float, etc. with a for loop.

Reference: https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.loadtxt.html
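
A minimal sketch of that approach, with a few assumptions: the sample rows are inlined through io.StringIO so the snippet runs on its own, the variable names are made up for illustration, and the per-column conversion uses astype instead of an explicit for loop:

```python
import io

import numpy as np

# Sample rows inlined for the sketch; pass a filename such as "data.txt" instead.
sample = io.StringIO(
    "bbbbffdd\t434343\t228\tD\n"
    "bbbWWWff\t43545343\t289\tE\n"
    "ajkfbdafa\t2345345\t2312\tF\n"
)

# dtype=str keeps every field as text; the default delimiter is any whitespace.
table = np.loadtxt(sample, dtype=str)

# Slice out each column and convert the numeric ones (names are hypothetical).
names = table[:, 0]
numbers = table[:, 1].astype(float)
counts = table[:, 2].astype(int)
letters = table[:, 3]

print(numbers)
print(counts)
```

The array itself has a single dtype, so the converted columns have to live in separate arrays (or a structured array) rather than back in table.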
