I have a pandas dataframe column (Data Type) which I want to split into three columns
target_table_df = LoadS_A [['Attribute Name',
'Data Type',
'Primary Key Indicator']]
Example input (target_table_df)
Attribute Name Data Type Primary Key Indicator
0 ACC_LIM DECIMAL(18,4) False
1 ACC_NO NUMBER(11,0) False
2 ACC_OPEN_DT DATE False
3 ACCB DECIMAL(18,4) False
4 ACDB DECIMAL(18,4) False
5 AGRMNT_ID NUMBER(11,0) True
6 BRNCH_NUM NUMBER(11,0) False
7 CLRD_BAL DECIMAL(18,4) False
8 CR_INT_ACRD_GRSS DECIMAL(18,4) False
9 CR_INT_ACRD_NET DECIMAL(18,4) False
I aim to:
- Reassign 'Data Type' to the text preceding the parenthesis
[..if parenthesis exists in 'Data Type']:
- Create new column 'Precision' and assign to first comma separated value
- Create new column 'Scale' and assign to second comma separated value
Intended output would therefore become:
Data Type Precision Scale
0 decimal 18 4
1 number 11 0
2 date
3 decimal 18 4
4 decimal 18 4
5 number 4 0
I have tried in anger to achieve this but i'm new to dataframes....can't work out if I am to iterate over all rows or if there is a way to apply to all values in the dataframe?
Any help much appreciated
Use
target_table_df['Data Type'].str.extract(pattern)
You'll need to assign
pattern
to be a regular expression that captures each of the components you're looking for.([^\(]+)
says grab as many non-open parenthesis characters you can up to the first open parenthesis.\(([^,]*,
says to grab the first set of non-comma characters after an open parenthesis and stop at the comma.,(.*)\)
says to grab the rest of the characters between the comma and the close parenthesis.(\(([^,]*),(.*)\))?
says the whole parenthesis thing may not even happen, grab it if you can.Solution
everything together looks like this:
I put a
.iloc[:, [0, 2, 3]]
at the end because the pattern I used grabs the whole parenthesis in column1
and I wanted to skip it. Leave it off and see.