Hive >> mail # user >> 2 questions about SerDe


Re: 2 questions about SerDe
Have a look at the code for the LazySerDes. When you deserialize in the
SerDe, you don't actually have to deserialize all the columns. deserialize()
can return an object that has not actually been deserialized, and you can
write an ObjectInspector that deserializes a field from that structure only
when it's needed (that is, when the ObjectInspector is called).
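
To make the idea concrete, here is a minimal sketch in plain Java. It
deliberately does not use the Hive classes: LazyRow, LazyRowInspector,
getField() and parseCount are hypothetical names made up for illustration,
standing in for the object returned by deserialize() and for the
ObjectInspector. It also assumes a comma-delimited record of integers. The
point is only that deserialize() hands back the raw record and the inspector
decodes a field the first time someone asks for it.

```java
import java.util.HashMap;
import java.util.Map;

public class LazyRowSketch {

    /** Hypothetical stand-in for what deserialize() returns: the raw record, untouched. */
    static final class LazyRow {
        final String raw;
        final Map<Integer, Long> parsed = new HashMap<>(); // fields decoded so far
        int parseCount = 0;                                // demo instrumentation only

        LazyRow(String raw) {
            this.raw = raw;
        }
    }

    /** Hypothetical stand-in for a struct ObjectInspector: decodes one field on request. */
    static final class LazyRowInspector {
        private final char delimiter;

        LazyRowInspector(char delimiter) {
            this.delimiter = delimiter;
        }

        long getField(LazyRow row, int index) {
            Long cached = row.parsed.get(index);
            if (cached != null) {
                return cached; // already decoded by an earlier call
            }
            // Skip over the preceding fields without converting them.
            int start = 0;
            for (int i = 0; i < index; i++) {
                start = row.raw.indexOf(delimiter, start) + 1;
                if (start == 0) {
                    throw new IndexOutOfBoundsException("no field " + index);
                }
            }
            int end = row.raw.indexOf(delimiter, start);
            if (end < 0) {
                end = row.raw.length();
            }
            long value = Long.parseLong(row.raw.substring(start, end));
            row.parseCount++;
            row.parsed.put(index, value);
            return value;
        }
    }

    /** The "deserialize" step does no parsing at all; it only wraps the record. */
    static LazyRow deserialize(String record) {
        return new LazyRow(record);
    }

    public static void main(String[] args) {
        LazyRowInspector inspector = new LazyRowInspector(',');
        LazyRow row = deserialize("10,20,30,40");
        long a = inspector.getField(row, 0);
        long c = inspector.getField(row, 2);
        // Only the two requested fields were converted.
        System.out.println(a + " " + c + " fields parsed: " + row.parseCount);
    }
}
```

With this shape, a query that only touches some columns only pays for
converting those columns, without the SerDe having to know up front which
ones they are.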

R.

On Tue, Feb 21, 2012 at 7:37 AM, Koert Kuipers <[EMAIL PROTECTED]> wrote:

> 1) Is there a way in initialize() of a SerDe to know if it is being used
> as a Serializer or a Deserializer? If not, can I define the Serializer and
> Deserializer separately instead of defining a SerDe (so I have two
> initialize methods)?
>
> 2) Is there a way to find out which columns are being used? Say someone
> does select a,b,c from test, and my SerDe gets initialized for usage in
> that query: how can I know that only a,b,c are needed? I would like to
> take advantage of this information so I don't deserialize unnecessary
> information, without having to resort to more complex lazy deserialization
> tactics.
>