Toy code in coliru I am using for testing: http://coliru.stacked-crooked.com/a/4039865d8d4dad52
I am getting used to C++ again after a long hiatus from it. I am writing code that parses a CSV that may have several columns with dates or nulls. My assumption is that every date column has exactly one kind of valid date format though different columns may have different formats.
For each date column, I look for the first value that parses successfully as a date, trying each entry in a std::vector of candidate locales, each of which carries a boost date_input_facet object. The first successful parse gives me the index of the locale that worked. Once I have the appropriate format for the first parsable date, I want to fix that format for the rest of the column so that I no longer waste CPU time detecting the format.
Here is my array of locales:
const std::vector<std::locale> Date::date_formats = {
    std::locale(std::locale::classic(), new date_input_facet("%Y-%m-%d")),
    std::locale(std::locale::classic(), new date_input_facet("%Y/%m/%d")),
    std::locale(std::locale::classic(), new date_input_facet("%m-%d-%Y")),
    std::locale(std::locale::classic(), new date_input_facet("%m/%d/%Y")),
    std::locale(std::locale::classic(), new date_input_facet("%d-%b-%Y")),
    std::locale(std::locale::classic(), new date_input_facet("%Y%m%d")),
};
I use an array of date strings from 20170101 to 20170131 to test this out. I then print out the original date strings, the date that was parsed, along with the index of the date_formats vector that worked for parsing.
For 20170101 to 20170129, it says that index 0 worked, which is supposed to be the "%Y-%m-%d" format with the dashes?! Moreover, where the dashes should go, I have digits, so it reads 20170101 as 2017-10-, then drops the last dash and interprets it as October 2017, which without a day becomes Oct 1, 2017. Why would it do that when that is not the format it was supposed to use?
Some results that one could see from my coliru (pY is parsed year, etc):
YYYYMMDD pY pM pD format_index
20170101 2017 Oct 1 0
20170102 2017 Oct 1 0
20170103 2017 Oct 1 0
20170104 2017 Oct 1 0
20170105 2017 Oct 1 0
For 20170130 and 20170131, the correct format index (5, for "%Y%m%d") is reported.
Any ideas? I only want the precise format string I passed to be used.
Using Howard Hinnant's free, open-source C++11/14/17 date/time library, this:
Outputs:
I've made a multi-format capable date-time parser myself. I, too, found it hard/impossible to get the parsing strict using the facilities in the standard library and boost.
I ended up using strptime, mostly¹.
adaptive_parser
Intended to be seeded with a list of supported formats, in order of preference. By default, parser is not adaptive (mode is fixed).
In adaptive modes the format can be required to be:
sticky (consistently reuse the first matched format)
ban_failed (remove failed patterns from the list; banning only occurs on successful parse to avoid banning all patterns on invalid input)
mru (preserves the list but re-orders for performance)
Demo
I tried the parser on your test data:
Prints:
¹ just the timezone stuff needs tweaks, mostly