An accounting format for numeric values usually uses a currency character, and often uses parentheses to represent negative values. Zero may also be represented as - or $-. When such a series is imported into a Pandas DataFrame it is an object type. I need to convert it to a numeric type and parse the negative values correctly.
Here's an example:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
df = pd.DataFrame({'A': ['123.4', '234.5', '345.5', '456.7'],
                   'B': ['$123.4', '$234.5', '$345.5', '$456.7'],
                   'C': ['($123.4)', '$234.5', '($345.5)', '$456.7'],
                   'D': ['$123.4', '($234.5)', '$-', '$456.7']})
Series A is easy to convert, e.g.:
df['A'] = df['A'].astype(float)
Series B requires removing the $ sign, after which the conversion is straightforward.
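For series B, a minimal sketch of that step might look like the following. It assumes every value is a string with at most a leading $ and no missing entries; `regex=False` is used so that $ is treated as a literal character rather than a regex end-of-string anchor.

```python
import pandas as pd

df = pd.DataFrame({'B': ['$123.4', '$234.5', '$345.5', '$456.7']})

# Strip the literal '$' (regex=False avoids treating it as an anchor),
# then cast the remaining digits to float.
df['B'] = df['B'].str.replace('$', '', regex=False).astype(float)
```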
Then come series C and D. They contain parenthesized (i.e. negative) values, and D contains $- for zero. How can I correctly parse these series into a numeric series / dataframe?
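To make the intended parsing rule concrete, here is a rough sketch of one possible approach, not necessarily the best one. It assumes the only formats present are the ones in the example (an optional $, wrapping parentheses for negatives, and - or $- for zero) and that there are no missing values. The helper name `parse_accounting` is hypothetical.

```python
import pandas as pd

def parse_accounting(s: pd.Series) -> pd.Series:
    """Convert accounting-formatted strings to floats (hypothetical helper)."""
    s = s.str.strip()
    # Values wrapped in parentheses are negative.
    negative = s.str.startswith('(') & s.str.endswith(')')
    # Drop parentheses and the currency symbol; inside [...] the '$' is literal.
    cleaned = s.str.replace(r'[()$]', '', regex=True)
    # A lone dash (originally '-' or '$-') stands for zero.
    cleaned = cleaned.replace('-', '0')
    values = cleaned.astype(float)
    # Keep the value where it is not negative, otherwise flip its sign.
    return values.where(~negative, -values)

df = pd.DataFrame({'C': ['($123.4)', '$234.5', '($345.5)', '$456.7'],
                   'D': ['$123.4', '($234.5)', '$-', '$456.7']})

# Apply the helper column by column.
result = df[['C', 'D']].apply(parse_accounting)
```

With this sketch, `result` holds float columns in which the parenthesized entries are negative and the $- entry in D is 0.0. Real data with thousands separators or other currency symbols would need the character class extended.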