NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Avro >> mail # user >> Feature for Date/Time Data Types in Avro?


Ron Bodkin 2011-01-18, 01:54
Jeff Hammerbacher 2011-01-18, 05:05
Doug Cutting 2011-01-18, 16:42
Jeremy Custenborder 2011-01-18, 17:19
Doug Cutting 2011-01-18, 17:30
Scott Carey 2011-01-18, 18:20
Ron Bodkin 2011-01-18, 18:38
Scott Carey 2011-01-18, 19:49
Re: Feature for Date/Time Data Types in Avro?
https://issues.apache.org/jira/browse/AVRO-739

On Tue, Jan 18, 2011 at 11:49 AM, Scott Carey <[EMAIL PROTECTED]> wrote:

> We should get this discussion into JIRA soon.
>
> On 1/18/11 10:38 AM, "Ron Bodkin" <[EMAIL PROTECTED]> wrote:
>
> >Overall, yes. A couple of points worth addressing in a design:
> >
> >1) Do we want to allow encoding time zone data in the records? Storing a
> >raw timestamp is sometimes not ideal. It's worth looking at how SQL allows
> >timestamps with and without time zones. Is that simpler, or is it actually
> >more complex?
>
> It is generally 100000x simpler to serialize only in UTC and let libraries
> support whatever they support w.r.t. time zones. Painful memories of design
> mistakes past.
> SQL does a lot of time zone work because it supports user input and output
> formatting. In the back end, most databases store time zone information in
> only a limited way.
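A minimal Python sketch of the UTC-only approach described above, assuming timestamps are stored as a long counting milliseconds since 1970-01-01T00:00:00Z (the thread has not settled units or origin at this point). The helpers `to_utc_millis` and `from_utc_millis` are hypothetical, not part of any Avro library. Two readings of the same instant in different offsets encode to the same long, and the original offset is not recoverable on decode, which is exactly the trade-off being argued for.

```python
from datetime import datetime, timedelta, timezone

# Assumed origin: the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_utc_millis(dt: datetime) -> int:
    """Encode an aware datetime as milliseconds since the Unix epoch (UTC)."""
    if dt.tzinfo is None:
        # Refuse to guess: a naive datetime has no defined instant.
        raise ValueError("naive datetime: time zone unknown")
    # Subtracting aware datetimes gives an exact timedelta regardless of offset.
    return (dt - EPOCH) // timedelta(milliseconds=1)

def from_utc_millis(ms: int) -> datetime:
    """Decode to an aware UTC datetime; the writer's offset is gone."""
    return EPOCH + timedelta(milliseconds=ms)

pst = timezone(timedelta(hours=-8))
a = datetime(2011, 1, 18, 11, 49, tzinfo=pst)
b = datetime(2011, 1, 18, 19, 49, tzinfo=timezone.utc)
print(to_utc_millis(a), to_utc_millis(b))  # 1295380140000 1295380140000
print(from_utc_millis(to_utc_millis(a)))   # 2011-01-18 19:49:00+00:00
```

Rejecting naive datetimes, rather than silently interpreting them in the local zone, keeps any time zone guess in the caller's hands.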
>
> >2) Do we want to allow dates (for storing a day, without a timestamp)?
> Days introduce time zone complexity if you want to find out what day a
> timestamp is in.
> So if we support day, or hour, then that is a significant increase in
> complexity. Furthermore, the time zone may not even be the same for every
> row. We could leave that up to the user and support a day type that is
> merely the number of days since some origin point, leaving the time zone
> interpretation (and thus conversion to 'day' from 'datetime') in the
> user's hands, perhaps with metadata support.
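The day problem can be made concrete with a short Python sketch, assuming the same UTC-millisecond long and a day type counted as whole days since 1970-01-01 (one possible origin; the thread leaves it open). `day_number` is a hypothetical helper: it needs a time zone argument because one instant can fall on two different calendar days depending on where it is observed.

```python
from datetime import date, datetime, timedelta, timezone, tzinfo

UTC_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
DAY_ZERO = date(1970, 1, 1)  # assumed origin for the day count

def day_number(ms: int, tz: tzinfo) -> int:
    """Days since 1970-01-01 for the calendar date an instant falls on in tz."""
    local = (UTC_EPOCH + timedelta(milliseconds=ms)).astimezone(tz)
    return (local.date() - DAY_ZERO).days

ms = 1295406000000  # 2011-01-19T03:00:00Z
print(day_number(ms, timezone.utc))                   # 14993
print(day_number(ms, timezone(timedelta(hours=-8))))  # 14992
```

Storing the resulting day number is trivial; choosing which zone to pass is the part the message proposes leaving to the user, perhaps via metadata.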
>
>
> >3) It would be nice to allow some flexibility in the implementation
> >classes for dates, e.g., letting Java users use Joda-Time classes as well
> >as java.util.Date.
>
> Absolutely.  This is a per-language feature though, so it may not require
> much of the spec.  For example, in Java it could simply be a configuration
> parameter passed to the DatumReader/Writers.  It doesn't make a lot of
> sense to store metadata on the data that says "this is a Joda object, not
> java.util.Date" -- that is a user choice and not intrinsic to describing
> the data.
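The point that the in-memory type is a reader-side choice can be sketched in Python as follows. `read_timestamps` is a hypothetical stand-in for a configurable DatumReader, not an Avro API, and the stored values are assumed to be UTC milliseconds. The same longs decode to a `datetime` or to a `date` depending only on the converter the caller passes; nothing in the data records that choice.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
UTC_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def as_datetime(ms: int) -> datetime:
    """Converter producing an aware UTC datetime."""
    return UTC_EPOCH + timedelta(milliseconds=ms)

def as_utc_date(ms: int) -> date:
    """Converter producing only the UTC calendar date."""
    return as_datetime(ms).date()

def read_timestamps(raw: Iterable[int], convert: Callable[[int], T]) -> List[T]:
    """Hypothetical reader: the caller's configuration picks the output type."""
    return [convert(ms) for ms in raw]

stored = [0, 1295380140000]  # what is actually serialized: plain longs
print(read_timestamps(stored, as_utc_date))
# [datetime.date(1970, 1, 1), datetime.date(2011, 1, 18)]
```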
>
> There are other questions too -- what are the timestamp units
> (milliseconds? configurable?), what is the origin (1970? 2010?
> configurable?) -- these decisions affect the serialization size.
> I have a manual serialization of timestamps that is a long, in tenths of a
> second since 2008, for example.  I have another that is a duration
> measured in tenths of a millisecond.  Both were done to reduce the number
> of bytes per value for a specific problem domain.
> Although I could use such flexibility, I'm not sure that is enough of a
> motivator to put that into Avro. I'm not very bothered by converting
> from a long to a human-readable datetime myself.
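To see why units and origin affect serialization size, here is a Python sketch of Avro's binary encoding for `long` (zig-zag mapping followed by a variable-length encoding of 7 bits per byte, as described in the Avro specification), used to compare a millisecond count since 1970 with a count of tenths of a second since 2008. The 2008 origin is taken here as 2008-01-01T00:00:00Z, which is an assumption; the message does not say which instant in 2008 is used.

```python
from datetime import datetime, timedelta, timezone

def zigzag(n: int) -> int:
    """Avro's zig-zag mapping for 64-bit longs: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return (n << 1) ^ (n >> 63)

def encode_long(n: int) -> bytes:
    """Varint-encode a zig-zagged long, 7 bits per byte, low-order group first."""
    z = zigzag(n)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
EPOCH_2008 = datetime(2008, 1, 1, tzinfo=timezone.utc)  # assumed origin

def millis_since_1970(dt: datetime) -> int:
    return (dt - UNIX_EPOCH) // timedelta(milliseconds=1)

def tenths_since_2008(dt: datetime) -> int:
    return (dt - EPOCH_2008) // timedelta(milliseconds=100)

t = datetime(2011, 1, 18, 19, 49, tzinfo=timezone.utc)
print(len(encode_long(millis_since_1970(t))))  # 6
print(len(encode_long(tenths_since_2008(t))))  # 5
```

For this instant the coarser unit and later origin save one byte per value (5 bytes instead of 6). A saving of that size matters mostly when timestamps make up a large share of each record, which fits the problem-specific motivation given above.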
>
> >
> >Ron
> >
> >
> >Ron Bodkin
> >CEO
> >Think Big Analytics
> >m: +1 (415) 509-2895
> >
> >On 1/18/11 8:42 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:
> >
> >>The way that I have imagined doing this is to specify a standard schema
> >>for dates, then implementations can optionally map this to a native date
> >>type.
> >>
> >>The schema could be a record containing a long, e.g.:
> >>
> >>{"type": "record", "name":"org.apache.avro.lib.Date", "fields" : [
> >>   {"name": "time", "type": "long"}
> >>  ]
> >>}
> >>
> >>Java could read this into a java.util.Date, Python to a datetime, etc.
> >>Such conventions could be added to the Avro specification.
> >>
> >>Does this sound like a reasonable approach?
> >>
> >>Doug
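Doug's proposal can be sketched in Python as a reader-side hook that recognizes the standard record by name and maps it to a native `datetime`. This assumes `time` holds milliseconds since 1970-01-01T00:00:00Z (the proposed schema does not fix the unit) and that a generic reader has already decoded the record into a dict; `is_date_schema`, `to_native` and `from_native` are hypothetical names, not part of any Avro release.

```python
import json
from datetime import datetime, timedelta, timezone

# The schema exactly as proposed in the message above.
DATE_SCHEMA = json.loads("""
{"type": "record", "name": "org.apache.avro.lib.Date", "fields": [
  {"name": "time", "type": "long"}
]}
""")

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # assumed unit and origin

def is_date_schema(schema: dict) -> bool:
    """Recognize the convention by full name and its single long field."""
    return (schema.get("type") == "record"
            and schema.get("name") == "org.apache.avro.lib.Date"
            and [(f["name"], f["type"]) for f in schema.get("fields", [])]
            == [("time", "long")])

def to_native(schema: dict, datum: dict):
    """Map a decoded record to a datetime if it matches; else pass it through."""
    if is_date_schema(schema):
        return EPOCH + timedelta(milliseconds=datum["time"])
    return datum

def from_native(value: datetime) -> dict:
    """Turn an aware datetime back into the record form for writing."""
    return {"time": (value - EPOCH) // timedelta(milliseconds=1)}
```

Readers that do not know the convention still see an ordinary record with a long field, so data written this way remains readable everywhere; that fallback is the main appeal of defining dates as a schema convention rather than a new primitive type.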
> >>
> >>On 01/17/2011 05:54 PM, Ron Bodkin wrote:
> >>> Has anyone discussed the possibility of having built-in support for a
> >>> date/time stamp data type in Avro? I think it'd be helpful, since dates
> >>> and timestamps are often used as keys in processing map/reduce data (and
> >>> in RPC systems). It's unpleasant to have to write code that converts
> >>> longs or strings into dates or timestamps. Minimally, it would be useful
> >>> to allow generating date/time stamps from long timestamps in the client
> >>> APIs for the various languages and to have support for working with Dates