Parsing a text file and segregating the data into a dictionary

Posted 2019-10-29 08:30

I have a somewhat complex problem parsing a text file.

What I need:

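  1. Read through the text file.

  2. If a line matches a specific condition, create a key named after that condition (condition1).

  3. Copy the lines that follow into a list associated with that key (condition1).

  4. When the condition is encountered again, create a new key, copy the lines that follow, and repeat step 3 until the end of the file.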

Problem: I can't append new items to the list for a given key.

Sample input text file:

A1 letters characters jgjgjg
A2 letters numbers fgdhdhd
D1 letters numbers haksjshs
condition1, dhdjfjf
K2 letters characters jgjgjg
J1 alphas numbers fgdhdhd
L1 letters numbers haksjshs
condition2, dhdjfjf
J1 alphas numbers fgdhdhd
D1 letters numbers haksjshs
J1 alphas numbers fgdhdhd
D1 letters numbers haksjshs

Expected dictionary:

dictone = {'condition1':['K2 letters characters jgjgjg','J1 alphas numbers fgdhdhd','L1 letters numbers haksjshs'], 'condition2':['J1 alphas numbers fgdhdhd','D1 letters numbers haksjshs','J1 alphas numbers fgdhdhd','D1 letters numbers haksjshs'..........}

Here is what I have done so far:

flagInitial = False  # flag to start copying after encountering a condition

with open(inputFilePath, "r") as tfile:
    for item in tfile:
        gcmatch = gcpattern.match(item)

        if gcmatch:
            extr = re.split(' ', item)
            laynum = extr[2]

            newKey = item[2:7] + laynum[:-1]
            flagInitial = True
            gcdict[newKey] = item
            continue

        if flagInitial == True:
            gcdict[newKey].append(item)  # stuck here
            # print(gcdict[newKey])
            # print(newKey)

Am I missing some syntax or something?
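The immediate cause of the error in the code above is that `gcdict[newKey] = item` stores the condition line itself, a `str`, under the new key; a string has no `.append` method, so the next line raises `AttributeError`. Storing an empty list instead fixes that. Below is a minimal sketch of the same loop with that one change. Because the real `gcpattern` and key-building logic aren't shown in full, it assumes a hypothetical pattern `^condition\d+`, uses the matched text as the key, and wraps the loop in a hypothetical `parse()` function so it can run on sample lines.

```python
import re

# Hypothetical pattern: the question's real gcpattern is not shown.
gcpattern = re.compile(r'^condition\d+')

def parse(lines):
    gcdict = {}
    newKey = None  # stays None until the first condition line is seen
    for item in lines:
        gcmatch = gcpattern.match(item)
        if gcmatch:
            newKey = gcmatch.group()
            gcdict[newKey] = []  # start an empty list, not the line string
            continue
        if newKey is not None:
            gcdict[newKey].append(item.rstrip('\n'))
    return gcdict

sample = """A1 letters characters jgjgjg
condition1, dhdjfjf
K2 letters characters jgjgjg
J1 alphas numbers fgdhdhd
condition2, dhdjfjf
D1 letters numbers haksjshs
""".splitlines(keepends=True)

print(parse(sample))
```

Lines before the first condition (here `A1 ...`) are skipped. If the same condition can appear more than once in a file, `gcdict.setdefault(newKey, [])` would keep the earlier lines instead of resetting the list.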

Answer 1:

Try this:

In [46]: from collections import defaultdict

In [47]: d = defaultdict(list)

In [48]: cond = None
    ...: for i in mystring.splitlines():
    ...:     if 'condition' in i.split()[0]:
    ...:         cond = i.split()[0][:-1]
    ...:     elif cond:
    ...:         d[cond].append(i)


In [49]: d
Out[49]: 
defaultdict(list,
            {'condition1': ['K2 letters characters jgjgjg',
              'J1 alphas numbers fgdhdhd',
              'L1 letters numbers haksjshs'],
             'condition2': ['J1 alphas numbers fgdhdhd',
              'D1 letters numbers haksjshs',
              'J1 alphas numbers fgdhdhd',
              'D1 letters numbers haksjshs']})
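One caveat with the loop above: `i.split()[0]` raises `IndexError` on an empty line (for example a trailing blank line in the file). The sketch below guards against that; it also uses `str.startswith` instead of `in`, and `rstrip(',')` instead of `[:-1]` so a header without a trailing comma keeps its last character. `mystring` is given a hypothetical value built from the question's sample input plus one blank line.

```python
from collections import defaultdict

# Hypothetical input: part of the question's sample, with a blank line added.
mystring = """A1 letters characters jgjgjg
condition1, dhdjfjf
K2 letters characters jgjgjg

condition2, dhdjfjf
J1 alphas numbers fgdhdhd
"""

d = defaultdict(list)
cond = None
for i in mystring.splitlines():
    tokens = i.split()
    if not tokens:  # blank line: skip it instead of indexing an empty list
        continue
    if tokens[0].startswith('condition'):
        cond = tokens[0].rstrip(',')  # 'condition1,' -> 'condition1'
    elif cond:
        d[cond].append(i)

print(dict(d))
```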


Answer 2:

Using the re.search function and a collections.defaultdict object:

import re
import collections

with open('input.txt', 'rt') as f:
    pat = re.compile(r'^condition\d+')
    d = collections.defaultdict(list)
    curr_key = None

    for line in f:               
        m = pat.search(line)
        if m:
            curr_key = m.group()
            continue
        if curr_key:
            d[curr_key].append(line.strip())         

print(dict(d))        

Output:

{'condition1': ['K2 letters characters jgjgjg', 'J1 alphas numbers fgdhdhd', 'L1 letters numbers haksjshs'], 'condition2': ['J1 alphas numbers fgdhdhd', 'D1 letters numbers haksjshs', 'J1 alphas numbers fgdhdhd', 'D1 letters numbers haksjshs']}
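A small variation on the code above: as written, a blank line after a condition is stored as an empty string `''`, because `line.strip()` returns `''` and `curr_key` is still set. The sketch below wraps the same logic in a hypothetical `segregate()` function that takes any iterable of lines and skips blank ones; `io.StringIO` stands in for the real `input.txt` so it can be tried without a file.

```python
import collections
import io
import re

PAT = re.compile(r'^condition\d+')

def segregate(lines):
    """Group lines under the most recent 'conditionN' header (hypothetical helper)."""
    d = collections.defaultdict(list)
    curr_key = None
    for line in lines:
        m = PAT.search(line)
        if m:
            curr_key = m.group()
            continue
        stripped = line.strip()
        if curr_key and stripped:  # skip blank lines instead of storing ''
            d[curr_key].append(stripped)
    return dict(d)

# io.StringIO behaves like an open text file: iterating it yields lines.
fake_file = io.StringIO(
    "condition1, x\nK2 letters characters jgjgjg\n\ncondition2, y\nJ1 alphas numbers fgdhdhd\n"
)
print(segregate(fake_file))
```

With a real file the call is simply `segregate(f)` inside the existing `with open('input.txt', 'rt') as f:` block.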


Source: Parsing Text file and segregating the data in a Dictionary