Remove quotes holding 2 words and remove comma bet

Posted 2019-07-26 22:41

Question:

Following up on Python to replace a symbol between 2 words in a quote

Extended input and expected output:

I am trying to replace the comma between the two words Durango and PC in the second line with & and then remove the quotes " as well. The same applies to the third line, with Orbis and PC. The fourth line has two quoted groups that I would like to process: "AAA - Character Tech, SOF - UPIs" and "Durango, Orbis, PC".

The remaining lines should be kept as they are. I would like to do this in Python.

INPUT

2,SIN-Rendering,Core Tech - Rendering,PC,147,Reopened
2,Kenny Chong,Core Tech - Rendering,"Durango, PC",55,Reopened
3,SIN-Audio,AAA - Audio,"Orbis, PC",13,Open
LTY-168499,[PC][PS4][XB1] Missing textures from Fort Capture NPC face,3,CTU-CharacterTechBacklog,"AAA - Character Tech, SOF - UPIs","Durango, Orbis, PC",29,Waiting For
...
... 
...

There can be 100 lines like these in my sample. The expected output is:

2,SIN-Rendering,Core Tech - Rendering,PC,147,Reopened
2,Kenny Chong,Core Tech - Rendering, Durango & PC,55,Reopened
3,SIN-Audio,AAA - Audio, Orbis & PC,13,Open
LTY-168499,[PC][PS4][XB1] Missing textures from Fort Capture NPC face,3,CTU-CharacterTechBacklog,AAA - Character Tech & SOF - UPIs,Durango, Orbis & PC,29,Waiting For
...
...
...

So far, my idea is to read the file line by line and, if a line contains a quote, replace the quote with nothing. Replacing the comma inside the quotes is where I am stuck.

Here is what I have right now:

for line in lines:
    expr2 = re.findall('"(.*?)"', line)
    if len(expr2) != 0:
        expr3 = re.split('"', line)
        expr4 = expr3[0] + expr3[1].replace(",", " &") + expr3[2]
        print >>k, expr4
    else:
        print >>k, line

But it does not handle the case in the fourth line. There can be more than 3 combos as well. For example:

3,SIN-Audio,"AAA - Audio, xxxx, yyyy","Orbis, PC","13, 22",Open 

which I would like to turn into:

3,SIN-Audio,AAA - Audio & xxxx & yyyy, Orbis & PC, 13 & 22,Open

How can I achieve this? Any suggestions? I am learning Python.

Answer 1:

By treating the input file as a CSV, we can turn each line into a list of fields that is easy to work with.

For example,

2,Kenny Chong,Core Tech - Rendering,"Durango, PC",55,Reopened

is read as:

['2', 'Kenny Chong', 'Core Tech - Rendering', 'Durango, PC', '55', 'Reopened']

Then, by replacing every , inside a field with  & (a space followed by an ampersand), we would have the line:

['2', 'Kenny Chong', 'Core Tech - Rendering', 'Durango & PC', '55', 'Reopened']

This also handles multiple commas within a field, and because the fields are written back out with a plain join, the original double quotes are gone.
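As a quick check of the parsing and replacement steps, here is a minimal sketch that feeds one sample line to csv.reader directly. It relies on csv.reader accepting any iterable of strings, not only file objects; the variable names sample, row and cleaned are just illustrative.

```python
import csv

# One line from the sample input, wrapped in a list so that
# csv.reader can iterate over it like the lines of a file.
sample = ['2,Kenny Chong,Core Tech - Rendering,"Durango, PC",55,Reopened']

row = next(csv.reader(sample))
# The quoted field arrives as a single item with its quotes stripped.
print(row)

# Replace the commas that remain inside each field, then rejoin the fields.
cleaned = [field.replace(',', ' &') for field in row]
print(','.join(cleaned))
```

The first print shows the field list with 'Durango, PC' as one item, and the second shows the rejoined line without quotes.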

Here is the code, given that in.txt is your input file and it will write to out.txt.

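import csv

with open('in.txt') as infile:
    # csv.reader splits on commas but keeps each quoted field as one item
    reader = csv.reader(infile)

    with open('out.txt', 'w') as outfile:
        for line in reader:
            # Swap the commas left inside each field for ' &'
            line = [s.replace(',', ' &') for s in line]
            # Join without quoting, so the original double quotes are dropped
            outfile.write(','.join(line) + '\n')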

The fourth line is written out as:

LTY-168499,[PC][PS4][XB1] Missing textures from Fort Capture NPC face,3,CTU-CharacterTechBacklog,AAA - Character Tech & SOF - UPIs,Durango & Orbis & PC,29,Waiting For



Answer 2:

Please check this. I could not find a single expression that does this, so I did it in a slightly more elaborate way. I will update if I find a better way (Python 3).

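import re

st = '3,SIN-Audio,"AAA - Audio, xxxx, yyyy","Orbis, PC","13, 22",Open'
# Greedy match from the first quote to the last, then split into the quoted fields
found = re.findall(r'"(.*)"', st)[0].split('","')
final = ""
for word in found:
    # Replace each comma inside a field with " &" and re-add the field separator
    final = final + " &".join(word.split(",")) + ","
# Put the cleaned fields (minus the trailing comma) in place of the quoted span
result = re.sub(r'"(.*)"', final[:-1], st)
print(result)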
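
Note that the greedy pattern above spans from the first quote to the last, so it relies on all quoted fields sitting next to each other, and indexing with [0] raises an IndexError on lines that contain no quotes. One possible alternative, offered as a sketch rather than a definitive fix, is to let re.sub process each quoted field separately with a replacement function. The name unquote_fields is hypothetical, and the pattern assumes that fields never contain escaped quotes.

```python
import re

def unquote_fields(line):
    """Replace commas inside each "quoted" field with ' &' and drop the quotes."""
    # [^"]* cannot cross a quote, so every match stays inside one quoted field.
    return re.sub(r'"([^"]*)"', lambda m: m.group(1).replace(',', ' &'), line)

# Line with three quoted fields
print(unquote_fields('3,SIN-Audio,"AAA - Audio, xxxx, yyyy","Orbis, PC","13, 22",Open'))
# Line without quotes is returned unchanged
print(unquote_fields('2,SIN-Rendering,Core Tech - Rendering,PC,147,Reopened'))
```

Because each match is handled on its own, this also works when unquoted fields sit between the quoted ones.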