read file into array separated by paragraph Python

2019-03-20 20:52发布

I have a text file, I want to read this text file into 3 different arrays, array1 array2 and array3. the first paragraph gets put in array1, the second paragraph gets put in array2 and so on. the 4th paragraph will then be put in array1 element2 and so forth, paragraphs are separated by a blank line. any ideas?

6条回答
姐就是有狂的资本
2楼-- · 2019-03-20 21:01
import itertools as it


def paragraphs(fileobj, separator='\n'):
    """Iterate a fileobject by paragraph"""
    ## Makes no assumptions about the encoding used in the file
    lines = []
    for line in fileobj:
        if line == separator and lines:
            yield ''.join(lines)
            lines = []
        else:
            lines.append(line)
    yield ''.join(lines)

paragraph_lists = [[], [], []]
with open('/Users/robdev/Desktop/test.txt') as f:
    paras = paragraphs(f)
    for para, group in it.izip(paras, it.cycle(paragraph_lists)):
        group.append(para)

print paragraph_lists
查看更多
Bombasti
3楼-- · 2019-03-20 21:01

Because I feel like showing off:

with open('data.txt') as f:
    f = list(f)
    a, b, c = (list(__import__('itertools').islice(f, i, None, 3)) for i in range(3))
查看更多
仙女界的扛把子
4楼-- · 2019-03-20 21:03

Using slices would also work.

par_separator = "\n\n"
paragraphs = "1\n\n2\n\n3\n\n4\n\n5\n\n6".split(par_separator)
a,b,c = paragraphs[0:len(paragraphs):3], paragraphs[1:len(paragraphs):3],\
        paragraphs[2:len(paragraphs):3] 

Within slice: [start index, end index,step]

查看更多
仙女界的扛把子
5楼-- · 2019-03-20 21:20

This is the basic code I would try:

f = open('data.txt', 'r')

data = f.read()
array1 = []
array2 = []
array3 = []
splat = data.split("\n\n")
for number, paragraph in enumerate(splat, 1):
    if number % 3 == 1:
        array1 += [paragraph]
    elif number % 3 == 2:
        array2 += [paragraph]
    elif number % 3 == 0:
        array3 += [paragraph]

This should be enough to get you started. If the paragraphs in the file are split by two new lines then "\n\n" should do the trick for splitting them.

查看更多
在下西门庆
6楼-- · 2019-03-20 21:26

More elegant way to bypass slices:

def grouper(n, iterable, fillvalue=None):
    args = [iter(iterable)] * n
    return itertools.izip_longest(fillvalue=fillvalue, *args)

for p in grouper(5,[sent.strip() for sent in text.split('\n') if sent !='']):
    print p

Just make sure you deal with None in final text

查看更多
疯言疯语
7楼-- · 2019-03-20 21:27

I know this question was asked long before but just putting my inputs so that it will be useful to somebody else at some point of time. I got to know much easier way to split the input file into paragraphs based on the Paragraph Separator(it can be a \n or a blank space or anything else) and the code snippet for your question is given below :

with open("input.txt", "r") as input:
    input_ = input.read().split("\n\n")   #\n\n denotes there is a blank line in between paragraphs.

And after executing this command, if you try to print input_[0] it will show the first paragraph, input_[1] will show the second paragraph and so on. So it is putting all the paragraphs present in the input file into an List with each List element contains a paragraph from the input file.

查看更多
登录 后发表回答