Summing up the total based on the random number of

Posted 2019-09-21 09:42

Question:

I need to sum the "value" column for each distinct value of col1 in File1 and export the result to an output file. I'm new to Python and need to do this for thousands of records.

File1

col1 col2              value
559 1   91987224    2400000000
559 0   91987224    100000000
558 0   91987224    100000000
557 2   87978332    500000000
557 1   59966218    2400000000
557 0   64064811    100000000

Desired Output:

col1      Sum
559     2500000000
558     100000000
557     3000000000

Thanks in advance.

P.S.: I can't use the pandas library due to permission issues. I tried the following code; it is shared here together with its traceback:

import csv 
fin = open("File1.txt","r")
list_txid = {}
num_tx = {}
amount_tx = {}

for line in fin:
    line = line.rstrip()
    f = line.split("\t")
    txid = f[0]
    amount = int(f[3])

fin.close()
for txid in list_txid:
    num_tx[txid] += 1
    amount_tx[txid] += amount
    print("{0}\t{1:d}\t{2:d}".format(txid, amount_tx[txid]))

Traceback:

Traceback (most recent call last):
  File "C:\Users....\sum.py", line 14, in <module>
    amount = int(f[3])
IndexError: list index out of range

Answer 1:

You can use pandas for this:

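import pandas as pd

# sep=r'\s+' splits on runs of whitespace (same effect as delim_whitespace=True,
# which newer pandas versions deprecate)
df = pd.read_csv('in.csv', sep=r'\s+')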

#      col1      col2       value
# 559     1  91987224  2400000000
# 559     0  91987224   100000000
# 558     0  91987224   100000000
# 557     2  87978332   500000000
# 557     1  59966218  2400000000
# 557     0  64064811   100000000

result = df.groupby(df.index)['value'].sum().reset_index()

#    index       value
# 0    557  3000000000
# 1    558   100000000
# 2    559  2500000000

result.to_csv('out.csv', index=False)


Answer 2:

Use read_csv to create a DataFrame, then group by the index with groupby(level=0) and aggregate with sum. Finally, export the result with to_csv:

import pandas as pd
df = pd.read_csv(file1, sep=r'\s+')  # file1, file2: input and output paths
df.groupby(level=0)['value'].sum().to_csv(file2)
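
Since the question rules out pandas, the same grouping can also be sketched with the standard library alone. The IndexError in the question is consistent with a line yielding fewer than four fields after split("\t"), for example the three-column header, a blank line, or rows separated by spaces rather than tabs. That is a guess from the sample shown, not something the traceback confirms. The sketch below therefore splits on any whitespace and skips lines whose last field is not a number. The function names sum_by_key and write_totals are made up for this example, and the sample rows are copied from the question.

```python
from collections import defaultdict


def sum_by_key(lines):
    """Sum the last field of each data row, grouped by the first field."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()  # splits on any run of whitespace, not only tabs
        # Skip blank lines and the header, whose last field is not a number.
        if not fields or not fields[-1].isdigit():
            continue
        totals[fields[0]] += int(fields[-1])
    return dict(totals)  # keys stay in first-seen order


def write_totals(totals, path):
    """Write a header plus one tab-separated 'key, sum' line per key."""
    with open(path, "w") as out:
        out.write("col1\tSum\n")
        for key, total in totals.items():
            out.write("{0}\t{1:d}\n".format(key, total))


# Sample rows from the question, used here instead of reading File1.txt.
sample = """col1 col2              value
559 1   91987224    2400000000
559 0   91987224    100000000
558 0   91987224    100000000
557 2   87978332    500000000
557 1   59966218    2400000000
557 0   64064811    100000000
""".splitlines()

totals = sum_by_key(sample)
for key, total in totals.items():
    print("{0}\t{1:d}".format(key, total))
```

To run this on the real data, pass an open file object to sum_by_key in place of sample (a file iterates line by line) and hand the result to write_totals along with an output path of your choice. Only one running total per key is held in memory, so thousands of records should not be a problem.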