Replace commas except those in quotation marks

Posted 2019-08-13 14:57

Date,Time,Ref,Sen,ATN,Flow,PCB temp,Status,Battery,BC
2015/04/23,12:30:00,779581,908043,"-15,254",49,31,0,100,
2015/04/23,12:35:00,778715,907084,"-15,259",49,31,0,100,-127
2015/04/23,12:40:00,778299,906419,"-15,239",49,32,0,100,461
(...)

Hi, I have an ASCII file like the one above, in which I am trying to replace the commas with semicolons. This is the code I am using:

filein = open('Prueba1.txt')
fileout = open('Fin.txt', 'wt')
for line in filein:
    if line.startswith('20'):
        fileout.write( line.replace(',', ';') )
filein.close()
fileout.close()

The problem is that I now want to keep the commas in the 5th column and also remove the quotation marks. Any ideas?

3 answers
混吃等死
Reply #2 · 2019-08-13 15:26

Have you considered saving this as a CSV file?

Once it is saved as a CSV file, pandas can read it directly:

import pandas as pd
v = pd.read_csv("check_delete.csv")

Output:

         Date      Time     Ref     Sen      ATN  Flow  PCB temp  Status  Battery      BC
0  2015/04/23  12:30:00  779581  908043  -15,254    49        31       0      100     NaN
1  2015/04/23  12:35:00  778715  907084  -15,259    49        31       0      100  -127.0
2  2015/04/23  12:40:00  778299  906419  -15,239    49        32       0      100   461.0

You can read this as a dataframe.

干净又极端
Reply #3 · 2019-08-13 15:33

This is an answer making use of the split() method if you do not want to use .csv files.

Let's consider the line to be:

line = '2015/04/23,12:35:00,778715,907084,"-15,259",49,31,0,100,-127'

First, split the line into three parts: the part before the quotes, the quoted part, and the part after the quotes. This can be done with line.split('"').

This will give us a list consisting of these three parts:

alist = ['2015/04/23,12:35:00,778715,907084,', '-15,259', ',49,31,0,100,-127']

Now split the first and last elements of the list on the comma, dropping the empty string that the comma next to the quotes leaves behind. Then join all the elements with semicolons and write the resulting string to your new file.

Like this:

left_part = alist[0].split(',')[:-1]   # drop the '' after the trailing comma
right_part = alist[2].split(',')[1:]   # drop the '' before the leading comma
middle_part = alist[1]
final_list = left_part + [middle_part] + right_part
# Joining avoids a trailing ';' and keeps empty fields (such as a missing BC value).
new_line = ';'.join(final_list)

The final result should look like this:

'2015/04/23;12:35:00;778715;907084;-15,259;49;31;0;100;-127'
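
The same split('"') idea can be stretched to lines with any number of quoted fields: after the split, the even-indexed pieces lie outside the quotes and the odd-indexed pieces inside them. What follows is only a rough sketch. The function name convert_line is made up for illustration, and it assumes a quoted field never contains an escaped quotation mark.

```python
def convert_line(line, old=',', new=';'):
    """Replace `old` with `new` outside double quotes and drop the quotes."""
    pieces = line.split('"')
    # Even-indexed pieces lie outside quotes, odd-indexed pieces inside them.
    converted = [
        piece.replace(old, new) if index % 2 == 0 else piece
        for index, piece in enumerate(pieces)
    ]
    return ''.join(converted)


line = '2015/04/23,12:35:00,778715,907084,"-15,259",49,31,0,100,-127'
print(convert_line(line))
```

Because the pieces are glued back together unchanged apart from the replacement, empty fields and the trailing newline of each line read from a file are preserved.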

虎瘦雄心在
Reply #4 · 2019-08-13 15:38

A solution using the csv Python standard library:

import csv

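with open('example.csv', newline='') as inputfile, \
     open('parsedcsv.csv', 'w', newline='') as outputfile:

    # Read with ',' as the delimiter; quoted fields such as "-15,254" stay intact.
    datareader = csv.reader(inputfile, delimiter=',', quotechar='"')

    # Write with ';' as the delimiter; only fields containing special
    # characters get quoted, so the quotes around the ATN values disappear.
    csvwriter = csv.writer(outputfile, delimiter=';',
            quotechar="'", quoting=csv.QUOTE_MINIMAL)

    csvwriter.writerows(datareader)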

Quote from the documentation about the QUOTE_MINIMAL option:

csv.QUOTE_MINIMAL: Instructs writer objects to only quote those fields which contain special characters such as delimiter, quotechar or any of the characters in lineterminator.

The output file is:

Date;Time;Ref;Sen;ATN;Flow;PCB temp;Status;Battery;BC
2015/04/23;12:30:00;779581;908043;-15,254;49;31;0;100;
2015/04/23;12:35:00;778715;907084;-15,259;49;31;0;100;-127
2015/04/23;12:40:00;778299;906419;-15,239;49;32;0;100;461
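
To try this on the sample data without touching the disk, the same reader/writer pair can be run over in-memory buffers. This is a small sketch only: the helper name commas_to_semicolons is hypothetical, and lineterminator='\n' is set here just to make the result easy to compare (the csv writer defaults to '\r\n').

```python
import csv
import io


def commas_to_semicolons(text):
    """Re-delimit CSV text from ',' to ';' while keeping quoted commas."""
    reader = csv.reader(io.StringIO(text), delimiter=',', quotechar='"')
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=';', quoting=csv.QUOTE_MINIMAL,
                        lineterminator='\n')
    writer.writerows(reader)
    return buffer.getvalue()


sample = (
    'Date,Time,Ref,Sen,ATN,Flow,PCB temp,Status,Battery,BC\n'
    '2015/04/23,12:30:00,779581,908043,"-15,254",49,31,0,100,\n'
    '2015/04/23,12:35:00,778715,907084,"-15,259",49,31,0,100,-127\n'
)
print(commas_to_semicolons(sample), end='')
```

Unlike the code in the question, this keeps the header row, since nothing filters on lines starting with '20'.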