Avro >> mail # user >> lazy deserialization?


Re: lazy deserialization?
Not with any of today's APIs.  "SELECT col1, col3 FROM t" is handled
easily: you construct a schema that only has those columns, and col2
is skipped at read time.

Does Hive have a use case for this that you're interested in?  If you
don't mind paying for the buffer copy, you could probably write a
"DeferredFoo" class that doesn't deserialize certain structures...

-- Philip
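
The "DeferredFoo" idea above can be sketched roughly as follows. This is
a hand-rolled illustration, not the Avro library API: it assumes a record
of three strings written in Avro's binary string encoding (a zigzag
varint length followed by UTF-8 bytes), and the names DeferredString,
read_deferred, and select_col2_where_col3 are hypothetical. col1 is
skipped, the raw bytes of col2 are copied without being decoded, col3 is
decoded and tested, and col2 is decoded only if the predicate matches.

```python
import io


def write_long(out, n):
    """Write n as a zigzag varint (the encoding Avro uses for longs)."""
    n = (n << 1) ^ (n >> 63)
    while n & ~0x7F:
        out.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    out.write(bytes([n]))


def read_long(buf, pos):
    """Read a zigzag varint at pos; return (value, next position)."""
    shift = 0
    acc = 0
    while True:
        b = buf[pos]
        pos += 1
        acc |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1), pos


def encode_row(col1, col2, col3):
    """Serialize three strings back to back, each length-prefixed."""
    out = io.BytesIO()
    for value in (col1, col2, col3):
        data = value.encode("utf-8")
        write_long(out, len(data))
        out.write(data)
    return out.getvalue()


class DeferredString:
    """Hypothetical holder: copies the raw bytes, decodes on first use."""

    def __init__(self, raw):
        self._raw = bytes(raw)  # this is the buffer copy being paid for
        self._value = None
        self.decoded = False

    def get(self):
        if not self.decoded:
            self._value = self._raw.decode("utf-8")
            self.decoded = True
        return self._value


def skip_string(buf, pos):
    """Step over a string without touching its bytes."""
    length, pos = read_long(buf, pos)
    return pos + length


def read_string(buf, pos):
    """Decode a string eagerly; return (value, next position)."""
    length, pos = read_long(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length


def read_deferred(buf, pos):
    """Capture a string's bytes for later; return (holder, next position)."""
    length, pos = read_long(buf, pos)
    return DeferredString(buf[pos:pos + length]), pos + length


def select_col2_where_col3(buf, wanted):
    """SELECT col2 FROM t WHERE col3 = wanted, for a single row."""
    pos = skip_string(buf, 0)            # col1 is never needed
    col2, pos = read_deferred(buf, pos)  # col2 captured, not yet decoded
    col3, pos = read_string(buf, pos)    # col3 decoded for the predicate
    return col2.get() if col3 == wanted else None
```

For rows that fail the predicate, the only cost for col2 is the byte
copy; the UTF-8 decode never runs. A real implementation would need to
handle nested and variable-length types the same way, which is where
the decoder's skip functions would come in.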

On Fri, Jan 22, 2010 at 6:20 PM, Zheng Shao <[EMAIL PROTECTED]> wrote:
> I noticed that Avro has the "skip" functions which can help skip a
> field when deserializing data.
> This is good for column pruning in most cases, but we might be able to
> do better in the following case.
>
>
> Let's say we have a query like this:
>
> CREATE TABLE t (col1 STRING, col2 STRING, col3 STRING);
> SELECT col2 FROM t WHERE col3 = 'abcde';
>
> We want to get field col3 first, if that matches what we want, then we
> want to get to field col2.
>
>
> Is there any way to "remember" the current location of deserialization,
> so that we can "resume" from that point?
>
>
> --
> Yours,
> Zheng
>