Hive >> mail # user >> ColumnarSerDe and LazyBinaryColumnarSerDe


Re: ColumnarSerDe and LazyBinaryColumnarSerDe
Thanks.

I forgot to consider the DOUBLE data type in the table. In the case of
lineitem, the text encoding of ColumnarSerDe can store a double in fewer
bytes than the fixed 8 bytes that LazyBinaryColumnarSerDe uses.
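
One rough way to check this against the data is sketched below. The query
measures the average length of the text form of each DOUBLE column. It
assumes the standard TPC-H column names (l_quantity, l_extendedprice,
l_discount, l_tax) and that these four columns are the ones declared as
DOUBLE; neither detail is stated in this thread. CAST(... AS STRING) is
only an approximation of what ColumnarSerDe writes.

```sql
-- Sketch only: column names and DOUBLE types are assumed from the usual
-- TPC-H lineitem schema, not taken from this thread.
-- LENGTH counts characters; for digits, '.', '-' and 'E' that equals bytes.
SELECT
  AVG(LENGTH(CAST(l_quantity      AS STRING))) AS avg_text_len_quantity,
  AVG(LENGTH(CAST(l_extendedprice AS STRING))) AS avg_text_len_extendedprice,
  AVG(LENGTH(CAST(l_discount      AS STRING))) AS avg_text_len_discount,
  AVG(LENGTH(CAST(l_tax           AS STRING))) AS avg_text_len_tax
FROM lineitem;
```

An average below 8 would suggest that the text form of that column is
shorter than a fixed 8-byte binary double. The query ignores any per-value
overhead in the file format, so it is an estimate rather than an exact
size comparison.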

Yin

On Tue, Mar 6, 2012 at 2:42 PM, yongqiang he <[EMAIL PROTECTED]> wrote:

> I guess LazyBinaryColumnarSerDe does not save space, but it is CPU efficient.
> Your tests align with internal tests we ran a long time ago.
>
> On Tue, Mar 6, 2012 at 8:58 AM, Yin Huai <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > Is LazyBinaryColumnarSerDe more space efficient than ColumnarSerDe in
> > general?
> >
> > Let me make my question more specific.
> >
> > I generated two tables from the table lineitem of TPC-H
> > using ColumnarSerDe and LazyBinaryColumnarSerDe as follows...
> > CREATE TABLE lineitem_rcfile_lazybinary
> > ROW FORMAT SERDE
> > "org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe"
> > STORED AS RCFile AS
> > SELECT * from lineitem;
> >
> > CREATE TABLE lineitem_rcfile_lazy
> > ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
> > STORED AS RCFile AS
> > SELECT * from lineitem;
> >
> > Since LazyBinaryColumnarSerDe serializes to a binary format and
> > ColumnarSerDe serializes to text, I expected the table
> > lineitem_rcfile_lazybinary to be smaller than lineitem_rcfile_lazy.
> > However, whether or not compression is enabled,
> > lineitem_rcfile_lazybinary is a little larger than
> > lineitem_rcfile_lazy. Did I use LazyBinaryColumnarSerDe the wrong way?
> >
> > btw, the row group size of RCFile is 32MB.
> >
> > Thanks,
> >
> > Yin
>