Hive >> mail # user >> ColumnarSerDe and LazyBinaryColumnarSerDe


Re: ColumnarSerDe and LazyBinaryColumnarSerDe
Thanks.

I forgot to consider the DOUBLE data type in the table. In the case of
lineitem, ColumnarSerDe can store a double in fewer bytes than the
8 bytes that LazyBinaryColumnarSerDe uses.
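
To illustrate the point about DOUBLE, here is a minimal sketch in Python (not Hive code). It assumes that a text-based encoding stores the decimal string of the value and that a binary encoding stores a fixed-width IEEE 754 double; the actual on-disk layout of either SerDe may differ in detail. The sample values are made up, chosen only to resemble short decimal values such as discounts or taxes.

```python
import struct


def text_size(value: float) -> int:
    """Bytes needed for the decimal text form (assumed text-based encoding)."""
    return len(repr(value).encode("utf-8"))


def binary_size(value: float) -> int:
    """Bytes needed for a fixed-width IEEE 754 double (big-endian)."""
    return len(struct.pack(">d", value))


# Hypothetical sample values, not taken from the real lineitem table.
samples = [0.04, 0.02, 17.0, 21168.23]

for v in samples:
    print(f"{v!r}: text={text_size(v)} bytes, binary={binary_size(v)} bytes")
```

Under these assumptions, a short decimal such as 0.04 needs fewer bytes as text than the fixed 8 bytes of a binary double, which is consistent with the binary table not coming out smaller.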

Yin

On Tue, Mar 6, 2012 at 2:42 PM, yongqiang he <[EMAIL PROTECTED]> wrote:

> I guess LazyBinaryColumnarSerDe does not save space, but it is CPU efficient.
> Your tests align with our internal tests from a long time ago.
>
> On Tue, Mar 6, 2012 at 8:58 AM, Yin Huai <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > Is LazyBinaryColumnarSerDe more space efficient than ColumnarSerDe in
> > general?
> >
> > Let me make my question more specific.
> >
> > I generated two tables from the table lineitem of TPC-H
> > using ColumnarSerDe and LazyBinaryColumnarSerDe as follows...
> > CREATE TABLE lineitem_rcfile_lazybinary
> > ROW FORMAT SERDE
> > "org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe"
> > STORED AS RCFile AS
> > SELECT * from lineitem;
> >
> > CREATE TABLE lineitem_rcfile_lazy
> > ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
> > STORED AS RCFile AS
> > SELECT * from lineitem;
> >
> > Since the serialization of LazyBinaryColumnarSerDe is binary-based and
> > that of ColumnarSerDe is text-based, I expected the
> > table lineitem_rcfile_lazybinary to be smaller than lineitem_rcfile_lazy.
> > However, whether or not compression is
> > enabled, lineitem_rcfile_lazybinary is a little bit larger
> > than lineitem_rcfile_lazy. Did I use LazyBinaryColumnarSerDe in the wrong
> > way?
> >
> > btw, the row group size of RCFile is 32MB.
> >
> > Thanks,
> >
> > Yin
>