Avro, mail # user - Feature for Date/Time Data Types in Avro?


Re: Feature for Date/Time Data Types in Avro?
Jeff Hammerbacher 2011-01-20, 01:31
https://issues.apache.org/jira/browse/AVRO-739

On Tue, Jan 18, 2011 at 11:49 AM, Scott Carey <[EMAIL PROTECTED]> wrote:

> We should get this discussion into JIRA soon.
>
> On 1/18/11 10:38 AM, "Ron Bodkin" <[EMAIL PROTECTED]> wrote:
>
> >Overall, yes. A couple of points worth addressing in a design:
> >
> >1) Do we want to allow encoding time zone data in the records? Storing a
> >raw timestamp is sometimes not ideal. It's worth looking at how SQL allows
> >timestamps with and without time zones. Is that simpler, or is it actually
> >more complex?
>
> It is generally 100000x simpler to serialize only in UTC and let libraries
> support what they support w.r.t. time zones.  Painful memories of past
> design mistakes.
> SQL does a lot of TZ work because it supports user input and output
> formatting.  On the back end, most databases store timestamps in only a
> limited way.
>
> >2) Do we want to allow dates (for storing a day, without a timestamp)?
>
> Days introduce timezone complexity if you want to find out what day a
> timestamp is in.
> So if we support day, or hour, then that is a significant increase in
> complexity.  Furthermore, the timezone may not even be the same per row.
> We could leave that up to the user and support a day type that is merely
> the number of days since some origin point and leaves the timezone
> interpretation (and thus conversion to 'day' from 'datetime') in the
> user's hands, perhaps with metadata support.
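
[Editorial sketch, not part of the original message.] The point that a "day" depends on the time zone applied to a timestamp can be illustrated with a few lines of Python. Everything here is an assumption made for illustration: the origin is taken to be 1970-01-01 UTC, the timestamp unit is milliseconds, and day_number is a hypothetical helper, not an Avro API. The same instant falls on two different day numbers depending on the offset the user chooses.

```python
from datetime import datetime, timedelta, timezone

# Assumed origin point for the day count (not specified in the thread).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_number(millis_utc: int, utc_offset_hours: float) -> int:
    """Days since the origin for a UTC timestamp, viewed at a fixed UTC offset.

    Hypothetical helper: the offset is supplied by the caller, which is the
    'leave the interpretation in the user's hands' option described above.
    """
    shifted = timedelta(milliseconds=millis_utc) + timedelta(hours=utc_offset_hours)
    return shifted.days  # timedelta.days floors, so negative values work too


# 2011-01-18 02:00:00 UTC, expressed as milliseconds since the origin.
instant = datetime(2011, 1, 18, 2, 0, tzinfo=timezone.utc)
millis = (instant - EPOCH) // timedelta(milliseconds=1)

day_in_utc = day_number(millis, 0)       # still 18 January in UTC
day_in_pst = day_number(millis, -8)      # already 17 January at UTC-8
```

A real implementation would need proper time zone rules (daylight saving, historical changes) rather than a fixed offset, which is part of the complexity the message warns about.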
>
>
> >3) It would be nice to allow some flexibility in the implementation
> >classes for dates, e.g., letting Java users use Joda time classes as well
> >as java.util.Date
>
> Absolutely.  This is a per-language feature though, so it may not require
> much of the spec.  For example, in Java it could simply be a configuration
> parameter passed to the DatumReader/Writers.  It doesn't make a lot of
> sense to store metadata on the data that says "this is a Joda object, not
> java.util.Date" -- that is a user choice and not intrinsic to describing
> the data.
>
> There are other questions too -- what are the timestamp units
> (milliseconds? configurable?), what is the origin (1970? 2010?
> configurable?) -- these decisions affect the serialization size.
> I have a manual serialization of timestamps that is a long, in tenths of a
> second since 2008, for example.  I have another that is a duration
> measured in tenths of a millisecond.  Both were done to reduce the number
> of bytes per value for a specific problem domain.
> Although I could use such flexibility, I'm not sure that is enough of a
> motivator to put that into Avro.  I'm not very bothered with converting
> from long to a human readable datetime myself.
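
[Editorial sketch, not part of the original message.] The effect of unit and origin on serialized size can be estimated directly, since Avro writes a long as a variable-length zig-zag integer. The helper name zigzag_varint_len and the sample date are assumptions chosen for illustration; the two encodings compared are milliseconds since 1970 and tenths of a second since 2008, the latter matching the manual format described above.

```python
from datetime import datetime, timedelta, timezone


def zigzag_varint_len(n: int) -> int:
    """Number of bytes a 64-bit long occupies as a zig-zag varint."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: small magnitudes map to small unsigned values
    length = 1
    while z >= 0x80:          # each byte carries 7 payload bits
        z >>= 7
        length += 1
    return length


# Sample instant, chosen arbitrarily for the comparison.
instant = datetime(2011, 1, 18, tzinfo=timezone.utc)

origin_1970 = datetime(1970, 1, 1, tzinfo=timezone.utc)
origin_2008 = datetime(2008, 1, 1, tzinfo=timezone.utc)

millis_since_1970 = (instant - origin_1970) // timedelta(milliseconds=1)
tenths_since_2008 = (instant - origin_2008) // timedelta(milliseconds=100)

size_millis = zigzag_varint_len(millis_since_1970)
size_tenths = zigzag_varint_len(tenths_since_2008)
print(size_millis, size_tenths)
```

For this sample instant the coarser unit and later origin save one byte per value, which only matters when timestamps dominate the record size.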
>
> >
> >Ron
> >
> >
> >Ron Bodkin
> >CEO
> >Think Big Analytics
> >m: +1 (415) 509-2895
> >
> >On 1/18/11 8:42 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:
> >
> >>The way that I have imagined doing this is to specify a standard schema
> >>for dates, then implementations can optionally map this to a native date
> >>type.
> >>
> >>The schema could be a record containing a long, e.g.:
> >>
> >>{"type": "record", "name":"org.apache.avro.lib.Date", "fields" : [
> >>   {"name": "time", "type": "long"}
> >>  ]
> >>}
> >>
> >>Java could read this into a java.util.Date, Python to a datetime, etc.
> >>Such conventions could be added to the Avro specification.
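
[Editorial sketch, not part of the original message.] The mapping from the proposed record to a native date type might look like the following in Python. It works on a plain dict standing in for a decoded record and does not use any Avro library. The unit and origin of the "time" field (milliseconds since 1970-01-01 UTC) are assumptions, since the proposal above does not fix them, and to_native and from_native are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Full name of the record schema proposed in the message above.
DATE_SCHEMA_NAME = "org.apache.avro.lib.Date"

# Assumed origin and unit for the "time" field: milliseconds since 1970 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MILLI = timedelta(milliseconds=1)


def to_native(record: dict) -> datetime:
    """Convert a decoded Date record (a dict with a 'time' long) to a datetime."""
    return EPOCH + record["time"] * ONE_MILLI


def from_native(value: datetime) -> dict:
    """Convert a timezone-aware datetime back to the record form."""
    return {"time": (value - EPOCH) // ONE_MILLI}


record = {"time": 1295308800000}
as_datetime = to_native(record)        # 2011-01-18 00:00:00+00:00
round_tripped = from_native(as_datetime)
```

An implementation that did not recognize the schema name could still read the value as an ordinary record with a long field, which is what makes the mapping optional.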
> >>
> >>Does this sound like a reasonable approach?
> >>
> >>Doug
> >>
> >>On 01/17/2011 05:54 PM, Ron Bodkin wrote:
> >>> Has anyone discussed the possibility of having built-in support for a
> >>> date/time stamp data type in Avro? I think it'd be helpful, since dates
> >>> and timestamps are often used as keys in processing map/reduce data
> >>> (and in RPC systems). It's unpleasant to have to write code that converts
> >>> longs or strings into dates or timestamps. Minimally, it would be useful
> >>> to allow generating date/time stamps from long timestamps in the client
> >>> APIs various language code and to have support for working with Dates