It seems like we have two options here:
new SimpleDateFormat("yyyy-MM-dd").parse(value)
(see: JavaStringDateObjectInspector.parse)

If we're preserving the original OpenX SerDe semantics, that means values like:
"2025-1-1"
would also need to be accepted, since that's what SimpleDateFormat did.

I would support matching the permissive, SimpleDateFormat-equivalent OpenX behavior, but only for the OpenX SerDe (as is the approach here, which avoids using the shared HiveFormatUtils.parseHiveDate method).
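To make the compatibility question concrete, here is a minimal sketch of what the quoted `new SimpleDateFormat("yyyy-MM-dd").parse(value)` call does with the inputs discussed in this PR. The class and method names are hypothetical, and pinning the formatter to UTC is an assumption so that the parse/format round trip does not depend on the JVM's default time zone.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class OpenXLenientDateSketch {
    // Hypothetical helper reproducing the call quoted above:
    // new SimpleDateFormat("yyyy-MM-dd").parse(value)
    public static String parseLikeOpenX(String value) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        // Assumption: pin the zone to UTC so parse and format agree on the calendar day.
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            // parse(String) starts at index 0 and ignores any unparsed trailing text,
            // so a space, 'T', or "AA" after the date makes no difference.
            // Pattern-letter counts are ignored when parsing numbers, so "1" matches "MM".
            // The formatter is lenient by default, so out-of-range fields roll over.
            Date date = format.parse(value);
            return format.format(date);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + value, e);
        }
    }

    public static void main(String[] args) {
        String[] inputs = {
            "2025-01-04 00:00:00.000Z",
            "2025-01-04T00:00:00.000Z",
            "2025-01-04AA00:00:00.000Z",
            "2025-1-1",
            "2025-13-01"
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + parseLikeOpenX(input));
        }
    }
}
```

Running `main` should print `2025-01-04` for the first three inputs, `2025-01-01` for "2025-1-1", and `2026-01-01` for "2025-13-01", which shows how permissive the original OpenX path was, well beyond just ignoring the delimiter.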
cc: @electrum, @dain - any thoughts on what the compatibility goal should be here and whether special casing this logic in OpenX is the right approach to take?
Description
The parseHiveDate method in HiveFormatUtils.java, which the native OpenX reader was using, only recognized a space
as the delimiter for stripping any characters after 'yyyy-mm-dd'. As a result, while '2025-01-04 00:00:00.000Z'
was correctly parsed as '2025-01-04', strings like '2025-01-04T00:00:00.000Z' or '2025-01-04AA00:00:00.000Z'
threw exceptions and were read as null.
The new parseHiveDate method uses a regex to remove any characters after 'yyyy-mm-dd', regardless of the delimiter.
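A minimal sketch of the regex-prefix idea, assuming a pattern that requires exactly 'yyyy-mm-dd' at the start of the string. RegexDatePrefixSketch and parseDatePrefix are hypothetical names, not the code in this PR, and returning Optional.empty() is an assumption standing in for the null the reader produces on failure.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDatePrefixSketch {
    // Hypothetical pattern: four digits, dash, two digits, dash, two digits.
    private static final Pattern DATE_PREFIX = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    // Hypothetical helper, not the actual HiveFormatUtils or OpenX signature.
    // Returns empty instead of throwing so a caller can map failures to null.
    public static Optional<LocalDate> parseDatePrefix(String value) {
        Matcher matcher = DATE_PREFIX.matcher(value);
        // lookingAt() anchors the match at the start of the input but, unlike matches(),
        // does not require the whole string to match, so any suffix is ignored.
        if (!matcher.lookingAt()) {
            return Optional.empty();
        }
        try {
            // LocalDate.parse uses ISO_LOCAL_DATE, which rejects impossible dates such as 2025-02-30.
            return Optional.of(LocalDate.parse(matcher.group()));
        } catch (DateTimeException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String[] inputs = {
            "2025-01-04 00:00:00.000Z",
            "2025-01-04T00:00:00.000Z",
            "2025-01-04AA00:00:00.000Z",
            "2025-1-1"
        };
        for (String input : inputs) {
            String result = parseDatePrefix(input).map(LocalDate::toString).orElse("null");
            System.out.println(input + " -> " + result);
        }
    }
}
```

Because the pattern requires two-digit months and days, this sketch rejects "2025-1-1", which the SimpleDateFormat-based OpenX path accepted; that gap is the compatibility question raised in the review comment above.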
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(O) Release notes are required, with the following suggested text: