How to read a file when the words are separated by “|”

Posted 2019-08-19 12:19

Question:

In Python, I have a file in which the words are separated by |, for example: city|state|zipcode. My file reader is unable to separate the words. I also want my file reader to start on line 2 rather than line 1. How do I get my file reader to separate the words?

import os
import sys

def file_reader(path, num_fields, seperator = ',', header = False):
    try:
        fp = open(path, "r", encoding="utf-8")
    except FileNotFoundError:
        raise FileNotFoundError("Unable to open file.")
    else:
        with fp:
            for n, line in enumerate(fp, 1):
                fields = line.rstrip('\n').split(seperator)  # strip the trailing newline, then split
                if len(fields) != num_fields:
                    raise ValueError("Unable to read file.")
                elif n == 1 and header:
                    continue
                else:
                    yield tuple([f.strip() for f in fields])

Answer 1:

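If you use the slice [1:], you select a sub-list that starts after the first element, which for the list of lines in a file means you get every line except the first. Note that [1:-1] would also drop the last line, and the file object itself cannot be sliced, so read the lines into a list first (for example with fp.readlines()).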
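A short sketch of the slicing idea, assuming the file fits in memory: read the lines into a list, then slice that list. It shows that [1:] drops only the first (header) line, while [1:-1] also drops the last line. io.StringIO and the sample data are stand-ins for a real file.

```python
import io

# Hypothetical sample data standing in for an open file
fp = io.StringIO("city|state|zipcode\nAustin|TX|78701\nBoston|MA|02108\n")
lines = fp.readlines()      # a file object can't be sliced, but this list can
body = lines[1:]            # every line except the first (the header)
trimmed = lines[1:-1]       # [1:-1] would also drop the LAST line
print(len(body), len(trimmed))
# → 2 1
```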



Answer 2:

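If you need to read from the second line, you can change your code from for n, line in enumerate(fp, 1) to for n, line in enumerate(fp.readlines()[1:], 2). A file object cannot be sliced directly (fp[1:] raises a TypeError), so the lines are read into a list first; starting the count at 2 keeps n equal to the real line number.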
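A file object cannot be sliced with fp[1:], so one alternative (not part of this answer) is itertools.islice, which skips the first line lazily without reading the whole file into memory. A minimal sketch; the function name rows_from_line_2 and the sample data are hypothetical, and io.StringIO stands in for the open file.

```python
import io
from itertools import islice

def rows_from_line_2(fp, sep="|"):
    """Yield (line_number, fields) for every line after the first."""
    # islice(fp, 1, None) skips one line, then yields the rest lazily;
    # starting enumerate at 2 keeps n equal to the real line number
    for n, line in enumerate(islice(fp, 1, None), 2):
        yield n, line.rstrip("\n").split(sep)

sample = io.StringIO("city|state|zipcode\nAustin|TX|78701\n")
for n, fields in rows_from_line_2(sample):
    print(n, fields)
# → 2 ['Austin', 'TX', '78701']
```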



Answer 3:

If you want a very crude option for skipping the first value: create a boolean flag initialised to True, and add an if statement at the start of your for loop that tests whether the flag is True. Inside this if statement, set the flag to False and then continue, so only the first iteration is skipped.

Something like:

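b = True  # flag: True until the first line has been skipped
for n, line in enumerate(fp, 1):  # fp is the open file from the question
    if b:
        b = False  # only the first iteration reaches this branch
        continue
    # process the remaining lines here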
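A self-contained version of the flag approach, assuming the data arrives as any iterable of lines (io.StringIO stands in for an open file; the name skip_first is hypothetical). Calling next(iterator, None) once before the loop would do the same job with less bookkeeping.

```python
import io

def skip_first(lines, sep="|"):
    """Return split fields for every line except the first, using a boolean flag."""
    rows = []
    first = True                  # True until the first line has been skipped
    for line in lines:
        if first:
            first = False         # flip the flag so only one line is skipped
            continue
        rows.append(line.rstrip("\n").split(sep))
    return rows

print(skip_first(io.StringIO("city|state|zipcode\nAustin|TX|78701\n")))
# → [['Austin', 'TX', '78701']]
```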


Answer 4:

To achieve what you ask, the function itself is essentially fine; what matters is calling it with the correct arguments instead of relying on the defaults.

From the code, the default behavior is to use ',' as the separator and not to skip the first line of the file. To actually split on | and skip the first line (i.e. a header), set seperator='|' and header=True when calling it.

# Function is essentially fine, leave as-is
# (one fix: rstrip('\n') removes the newline; '/n' would strip '/' and 'n' characters)
def file_reader(path, num_fields, seperator = ',', header = False):
    try:
        fp = open(path, "r", encoding="utf-8")
    except FileNotFoundError:
        raise FileNotFoundError("Unable to open file.")
    else:
        with fp:
            for n, line in enumerate(fp, 1):
                fields = line.rstrip('\n').split(seperator)
                if len(fields) != num_fields:
                    raise ValueError("Unable to read file.")
                elif n == 1 and header:
                    continue
                else:
                    yield tuple([f.strip() for f in fields])

# Example file afile.txt contains these lines:
# alfa|beta|gamma|delta
# 1|2|3|4
# a|b|c|d

# here we call the function:

filename = 'afile.txt'
for x in file_reader(filename, 4, '|', True):  # note the separator and header arguments
    print(x)
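A runnable check of the call above, assuming a writable temporary directory. The function is restated compactly (without the try/except wrapper, and with rstrip('\n') so that a trailing 'n' in the last field is not stripped) and called with keyword arguments to make the separator and header choices explicit. The temporary file stands in for afile.txt.

```python
import os
import tempfile

def file_reader(path, num_fields, seperator=',', header=False):
    """Yield one tuple of stripped fields per line, optionally skipping the header."""
    with open(path, "r", encoding="utf-8") as fp:
        for n, line in enumerate(fp, 1):
            fields = line.rstrip('\n').split(seperator)
            if len(fields) != num_fields:
                raise ValueError("Unable to read file.")
            if n == 1 and header:
                continue                      # skip the header line
            yield tuple(f.strip() for f in fields)

# Write the example contents to a temporary file standing in for afile.txt
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("alfa|beta|gamma|delta\n1|2|3|4\na|b|c|d\n")
    path = tmp.name

try:
    rows = list(file_reader(path, 4, seperator='|', header=True))
finally:
    os.remove(path)                           # clean up the temporary file

print(rows)
# → [('1', '2', '3', '4'), ('a', 'b', 'c', 'd')]
```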


Answer 5:

We will divide the work into 3 steps: reading the file, storing each line of the file in a list, and splitting each line.

Reading the file: in Python you can easily open a file with the 'open' function as follows:

fp = open("file.txt", "r")

Reading each line separately: to read the file as a list of lines, you can use the 'readlines' method as follows:

lines = fp.readlines()

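This returns the content of the file as a list in which each element is one line. To get a specific line, index the list, e.g. lines[4] for the fifth line. (Note that fp.readline(5) does not read line 5: its argument is the maximum number of characters to read from the current line.)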

For more info, check reading files in Python.

Separating the content: to split the strings on '|', use the 'split' method:

for item in lines:
    res = item.rstrip('\n').split('|')  # drop the newline, then split on '|'
    # do what you want with res
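Putting the three steps together, a minimal sketch that closes the file automatically with a with block and skips the header line, as the question asks. The file name data.psv is hypothetical; the file is written into a temporary directory first so the example runs on its own.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.psv")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as out:
        out.write("city|state|zipcode\nAustin|TX|78701\nBoston|MA|02108\n")

    # Step 1: open the file (the with block closes it again)
    with open(path, "r", encoding="utf-8") as fp:
        # Step 2: read every line into a list
        lines = fp.readlines()

rows = []
for item in lines[1:]:  # skip the header line
    # Step 3: drop the newline, then separate the fields on '|'
    rows.append(item.rstrip("\n").split("|"))

print(rows)
# → [['Austin', 'TX', '78701'], ['Boston', 'MA', '02108']]
```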


Answer 6:

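If you don't mind using an existing library, you can use pandas. Change the separator with sep='|'. pandas already treats the first line as the column header by default; to discard that line instead, use skiprows=1 together with header=None (with skiprows=1 alone, the second line would become the header).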

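# load pandas
import pandas as pd

# read the file as a pandas DataFrame: sep='|' splits on pipes,
# skiprows=1 drops the first line, and header=None stops pandas from
# using the next data line as column names
dataframe = pd.read_csv('file.psv', sep='|', skiprows=1, header=None)
print(dataframe)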

To install pandas

pip install pandas

Pandas documentation for read_csv

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

Another option is to use csv.reader from the standard library to read your PSV (pipe-separated values) file:

import csv

with open('file.psv', newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter='|')
    next(csv_reader, None)  # advance the reader once to skip the header row

    for row in csv_reader:
        print(row)
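If you would rather keep the header than throw it away, csv.DictReader (also in the standard library) reads the first line as field names and yields one dictionary per remaining row. A sketch, with io.StringIO and the sample data standing in for file.psv:

```python
import csv
import io

sample = io.StringIO("city|state|zipcode\nAustin|TX|78701\nBoston|MA|02108\n")
reader = csv.DictReader(sample, delimiter="|")   # first line becomes the keys
records = [dict(row) for row in reader]          # dict() normalises older OrderedDict rows
for row in records:
    print(row["city"], row["zipcode"])
# prints:
# Austin 78701
# Boston 02108
```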