Convert a Pandas Series in Accounting Format to a

Posted 2020-07-27 16:13

Question:

An accounting format for numeric values usually uses a currency symbol and often uses parentheses to represent negative values. Zero may also be represented as - or $-. When such a series is imported into a Pandas DataFrame, it has the object dtype. I need to convert it to a numeric type and parse the negative values correctly.

Here's an example:

import numpy as np
import pandas as pd
from pandas import Series, DataFrame
df = pd.DataFrame({'A':['123.4', '234.5', '345.5', '456.7'],
                   'B':['$123.4', '$234.5', '$345.5', '$456.7'],
                   'C':['($123.4)', '$234.5', '($345.5)', '$456.7'],
                   'D':['$123.4', '($234.5)', '$-', '$456.7']})

Series A is easy to convert, e.g.

df['A'] = df['A'].astype(float) 

Series B requires removing the $ sign, after which the conversion is straightforward.
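A minimal sketch of the series B case, assuming the only non-numeric character is the dollar sign. The names b and b_numeric are illustrative only. Passing regex=False matters here, because as a regular expression $ means "end of string" rather than a literal dollar sign.

```python
import pandas as pd

b = pd.Series(['$123.4', '$234.5', '$345.5', '$456.7'])

# regex=False treats "$" as a literal character, not as the end-of-string anchor.
b_numeric = b.str.replace('$', '', regex=False).astype(float)

print(b_numeric.tolist())
```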

Then come series C and D. They contain values in parentheses (i.e. negative values), and D contains $- for zero. How can I correctly parse these series into a numeric series / dataframe?

Answer 1:

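I'd use the Pandas replace function (with regex=True, so that it matches substrings rather than whole cell values) to replace $ and ) with nothing, replace - with 0, and then finally replace ( with -. The order matters: - has to become 0 before ( is turned into a minus sign. Then you can do df = df.astype(float) and it should work.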
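One way this replace chain could look, as a sketch rather than the answerer's exact code. It reuses columns C and D from the question; also stripping commas is an added assumption (thousands separators), and the zero step is anchored to cells that contain nothing but a dash.

```python
import pandas as pd

df = pd.DataFrame({'C': ['($123.4)', '$234.5', '($345.5)', '$456.7'],
                   'D': ['$123.4', '($234.5)', '$-', '$456.7']})

cleaned = (
    # Drop "$", ")" and (assumed) "," wherever they occur in a cell.
    df.replace(r'[$,)]', '', regex=True)
      # A cell that is now only "-" stands for zero.
      .replace(r'^\s*-\s*$', '0', regex=True)
      # A remaining "(" marks a negative value.
      .replace(r'\(', '-', regex=True)
      .astype(float)
)

print(cleaned)
```

Because "(" is converted last, the zero step never sees a genuine minus sign, so "$-" becomes 0 while "($234.5)" becomes -234.5.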



Answer 2:

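import pandas as pd


def pd_columntonumbeR(df, colname):
    # Convert each column named in colname to a numeric dtype, in place.
    for c in colname:
        # Series.map avoids np.vectorize, which fixes the output string width
        # from the first result and can truncate later, longer values.
        df[c] = pd.to_numeric(df[c].map(replacetonumbeR)).fillna(0)


def replacetonumbeR(s):
    # Turn one accounting-format string into text that pd.to_numeric can parse.
    if isinstance(s, str):
        s = s.strip().replace(",", "").replace("$", "")
        if s == "-":
            # A lone dash (what remains of "$-") stands for zero.
            s = "0"
        elif s.startswith("(") and s.endswith(")"):
            # Parentheses mark a negative value.
            s = "-" + s[1:-1]
    return s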