I have a .csv
file named fileOne.csv
that contains many unnecessary strings and records. I want to delete unnecessary records / rows and strings based on multiple condition / criteria using a Python or R script and save the records into a new .csv
file named resultFile.csv
.
What I want to do is as follows:
Delete the first column.
Split column BB into two column named as
a_id
, andc_id
. Separate the value by _ (underscore) and left side will go toa_id
, and right side will go toc_id
.Keep only records that have the .csv file extension in the files column, but do not contain
No Bi
in cut column.Assign new name to each of the columns.
Delete the records that contain strings like
less
in the CC column.Trim all other unnecessary string from the records.
Delete the reamining filds of each rows after I find the "Mi" in each rows.
My fileOne.csv
is as follows:
AA BB CC DD EE FF GG
1 1_1.csv (=0 =10" 27" =57 "Mi"
0.97 0.9 0.8 NaN 0.9 od 0.2
2 1_3.csv (=0 =10" 27" "Mi" 0.5
0.97 0.5 0.8 NaN 0.9 od 0.4
3 1_6.csv (=0 =10" "Mi" =53 cnt
0.97 0.9 0.8 NaN 0.9 od 0.6
4 2_6.csv No Bi 000 000 000 000
5 2_8.csv No Bi 000 000 000 000
6 6_9.csv less 000 000 000 000
7 7_9.csv s(=0 =26" =46" "Mi" 121
My 1st expected results files would be as follows:
a_id b_id CC DD EE FF GG
1 1 0 10 27 57 Mi
1 3 0 10 27 Mi 0.5
1 6 0 10 Mi 53 cnt
7 9 0 26 46 Mi 121
My final expected results files would be as follows:
a_id b_id CC DD EE FF GG
1 1 0 10 27 57
1 3 0 10 27
1 6 0 10
7 9 0 26 46
This can be achieved with the following Python script:
To simply remove
Mi
columns from an existing file the following can be used:Tested using Python 2.7.9