Matching three variables from textfile to csv and

2020-08-01 15:35发布

I'm looking for some help cycling through each group in my textfile and matching three variables with my csv and on a successfull match it will write a few new variables to the csv file:

In the textfile line 1 matches to csv element 1 In the textfile line 2 matches to csv element 0

Each Student will be split into three parts: 3 Tommy 144512/23332 and part 1 and part 3 will be written to element 12 and 13 respectivley. Part 2 will be used for the third match, matching to csv elelment 8 this is to find out which row to write to.

"data" will be written to element 14 (column 15) "misc3" will be written to element 15 (column 16) "bla3" will be written to element 16 (column 17)

Commented Textfile:

     Textfile Item 1 (Will loop/cycle/run 4 times, because there are 4 students)
           |
           v

MData (N/A)                <-- Match Line 1 (matches to csv element 1)
DMATCH1                    <-- Match Line 2 (matches to csv element 0)
3 Tommy 144512/23332       <-- Match Line 3 (matches to csv element 8) (Loop 1)                 
1 Jim 90000/222311     <-- Match Line 3 (matches to csv element 8) (Loop 2)
1 Elz M 90000/222311       <-- Match Line 3 (matches to csv element 8) (Loop 3)
1 Ben 90000/222311         <-- Match Line 3 (matches to csv element 8) (Loop 4)
Data $50.90                <-- If "Data" Exists then filewrite to csv element 14 (Loop 1)   
misc2 $10.40               <-- If "misc2" Exists then filewrite to csv element 15 (Loop 1)
bla3 $20.20               <-- If "bla3" Exists then filewrite to csv element 16 (Loop 1)


     Textfile Item 2 (Will loop/cycle/run 2 times, because there are 3 students)
           |
           v

MData (B/B)                <-- Match Line 1 (matches to csv element 1)
DMATCH2                    <-- Match Line 2 (matches to csv element 0)
4 James Smith 2333/114441  <-- Match Line 3 (matches to csv element 8) (Loop 1)
4 Mike 90000/222311        <-- Match Line 3 (matches to csv element 8) (Loop 2)
4 Jessica Long 2333/114441 <-- Match Line 3 (matches to csv element 8) (Loop 3)
Data $50.90                <-- If "Data" Exists then filewrite to csv element 14 (Loop 1)   
bla3 $5.44                <-- If "bla3" Exists then filewrite to csv element 16 (Loop 1)


     Textfile Item 3 (Will loop/cycle/run 2 times, because there are 2 students)
           |
           v

Mdata                      <-- Match Line 1 (matches to csv element 1)
DMATCH3                    <-- Match Line 2 (matches to csv element 0)
5 Joe Reane 0/0            <-- Match Line 3 (matches to csv element 8) (Loop 1)
5 Peter Jones 90000/222311 <-- Match Line 3 (matches to csv element 8) (Loop 2)
misc2 $420.00              <-- If "misc2" Exists then filewrite to csv element 15 (Loop 1)
bla3 $210.00               <-- If "bla3" Exists then filewrite to csv element 16 (Loop 1)

Uncommented Real Textfile:

MData (N/A)
DMATCH1
3 Tommy 144512/23332
1 Jim 90000/222311
1 Elz M 90000/222311
1 Ben 90000/222311
Data $50.90
misc2 $10.40
bla3 $20.20


MData (B/B) 
DMATCH2
4 James Smith 2333/114441
4 Mike 90000/222311
4 Jessica Long 2333/114441
Data $50.90
bla3 $5.44


Mdata
DMATCH3
5 Joe Reane 0/0
5 Peter Jones 90000/222311
Data $10.91
misc2 $420.00
bla3 $210.00

CSV Before:

MATCH1,MATCH2,TITLE,TITLE,TITLE,TITLE,TITLE,TITLE,MATCH3,DATA,TITLE,TITLE
DMATCH1,MData (N/A),data,data,data,data,data,data,Tommy,55,data,data
DMATCH1,MData (N/A),data,data,data,data,data,data,Ben,54,data,data
DMATCH1,MData (N/A),data,data,data,data,data,data,Jim,52,data,data
DMATCH1,MData (N/A),data,data,data,data,data,data,Elz M,22,data,data
DMATCH2,MData (B/B),data,data,data,data,data,data,James Smith,15,data,data
DMATCH2,MData (B/B),data,data,data,data,data,data,Jessica Long,224,data,data
DMATCH2,MData (B/B),data,data,data,data,data,data,Mike,62,data,data
DMATCH3,Mdata,data,data,data,data,data,data,Joe Reane,66,data,data
DMATCH3,Mdata,data,data,data,data,data,data,Peter Jones,256,data,data
DMATCH3,Mdata,data,data,data,data,data,data,Lesley Lope,5226,data,data

CSV After:

MATCH1,MATCH2,TITLE,TITLE,TITLE,TITLE,TITLE,TITLE,MATCH3,DATA,TITLE,TITLE,,,,,
DMATCH1,MData (N/A),data,data,data,data,data,data,Tommy,55,data,data,3,144512/23332,Data $50.90,misc2 $10.40,bla3 $20.20
DMATCH1,MData (N/A),data,data,data,data,data,data,Ben,54,data,data,1,90000/222311,,,
DMATCH1,MData (N/A),data,data,data,data,data,data,Jim,52,data,data,1,90000/222311,,,
DMATCH1,MData (N/A),data,data,data,data,data,data,Elz M,22,data,data,1,90000/222311,,,
DMATCH2,MData (B/B),data,data,data,data,data,data,James Smith,15,data,data,4,2333/114441,Data $50.90,,bla3 $5.44
DMATCH2,MData (B/B),data,data,data,data,data,data,Jessica Long,224,data,data,4,2333/114441,,,
DMATCH2,MData (B/B),data,data,data,data,data,data,Mike,62,data,data,4,90000/222311,,,
DMATCH3,Mdata,data,data,data,data,data,data,Joe Reane,66,data,data,5,0/0,,misc2 $420.00,bla3 $210.00
DMATCH3,Mdata,data,data,data,data,data,data,Peter Jones,256,data,data,5,90000/222311,,,
DMATCH3,Mdata,data,data,data,data,data,data,Lesley Lope,5226,data,data,,,,,

Anyone know how to acheive this?

Any help would be much much appreciated!

标签: python csv
1条回答
We Are One
2楼-- · 2020-08-01 16:15

There are actually several sub-problems in this problem. First we have to read the interestingly-formatted text file:

Read matchers text file

# each block in the text file will be one element of this list
matchers = [[]]
i = 0 
with open('test.txt') as infile:
    for line in infile:
        line = line.strip()
        # Blocks are seperated by blank lines
        if len(line) == 0:
            i += 1
            matchers.append([])
            # assume there are always two blank lines between items 
            # and just skip to the lext line
            infile.next()
            continue
        matchers[i].append(line)

At this point we have a list of lists, one element for each block, containing one element for each line. Then we have to convert to something more table-like

Convert to table-like format

import re

# This regular expression matches the variable number of students in each block
studentlike = re.compile('(\d+) (.+) (\d+/\d+)')
# We will build a table containing a list of elements for each student
table = []
for matcher in matchers:
    # We use an iterator over the block lines to make indexing simpler
    it = iter(matcher)
    # The first two elements are match values
    m1, m2 = it.next(), it.next()
    # then there are a number of students
    students = []
    for possiblestudent in it:
        m = studentlike.match(possiblestudent)
        if m:
            students.append(list(m.groups()))
        else:
            break
    # After the students come the data elements, which we read into a dictionary
    # We also add in the last possible student line as that didn't match the student re
    dataitems = dict(item.split() for item in [possiblestudent] + list(it))
    datanames = dataitems.keys()
    # Finally we construct the table
    for student in students:
        # We use the dictionary .get() method to return blanks for the missing fields
        table.append([m1, m2] + student + [dataitems.get(d, '') for d in datanames])
print table

Join with pandas

Now, we can join up the data. I've used Pandas here, as it is just great for this kind of joining:

import pandas
csvdata = pandas.read_csv('test.csv')
textdata = pandas.DataFrame(table, columns=['MATCH2', 'MATCH1', 'TITLE01', 'MATCH3', 'TITLE02', 'Data', 'misc2', 'bla3'])
mergeddata = pandas.merge(csvdata, textdata, how='left', on=['MATCH1', 'MATCH2', 'MATCH3'], sort=False)
mergeddata.to_csv('output.csv', index=False)
查看更多
登录 后发表回答