Need to clean up a csv import, which gives me a range of times (in string form). Code is at bottom; I currently use regular expressions and replace()
on the df to convert other chars. Just not sure how to:
- select the current 24 hour format numbers and add :00
- how to select the 12 hour format numbers and make them 24 hour.
Input (from csv import):
break_notes
0 15-18
1 18.30-19.00
2 4PM-5PM
3 3-4
4 4-4.10PM
5 15 - 17
6 11 - 13
So far I have got it to look like (remove spaces, AM/PM, replace dot with colon):
break_notes
0 15-18
1 18:30-19:00
2 4-5
3 3-4
4 4-4:10
5 15-17
6 11-13
However, I would like it to look like this ('HH:MM-HH:MM' format):
break_notes
0 15:00-18:00
1 18:30-19:00
2 16:00-17:00
3 15:00-16:00
4 16:00-16:10
5 15:00-17:00
6 11:00-13:00
My code is:
data = pd.read_csv('test.csv')
data.break_notes = data.break_notes.str.replace(r'([P].|[ ])', '').str.strip()
data.break_notes = data.break_notes.str.replace(r'([.])', ':').str.strip()
Here is the converter function that you need based on your requested input data.
convert_entry
takes complete value entry, splits it on a dash, and passes its result toconvert_single
, since both halfs of one entry can be converted individually. After each conversion, it merges them with a dash.convert_single
uses regex to search for important parts in the time string. It starts with a some numbers\d+
(representing the hours), then optionally a dot or a colon and some more number[.:]?(\d+)?
(representing the minutes). And after that optionally AM or PM(AM|PM)?
(only PM is relevant in this case)