Hive >> mail # user >> ColumnarSerDe and LazyBinaryColumnarSerDe


ColumnarSerDe and LazyBinaryColumnarSerDe
Hi,

In general, is LazyBinaryColumnarSerDe more space-efficient than
ColumnarSerDe?

Let me make my question more specific.

I generated two tables from the TPC-H lineitem table, one with
LazyBinaryColumnarSerDe and one with ColumnarSerDe, as follows:
CREATE TABLE lineitem_rcfile_lazybinary
ROW FORMAT SERDE
"org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe"
STORED AS RCFile AS
SELECT * from lineitem;

CREATE TABLE lineitem_rcfile_lazy
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
STORED AS RCFile AS
SELECT * from lineitem;

Since LazyBinaryColumnarSerDe serializes data in a binary format while
ColumnarSerDe uses a text format, I expected table
lineitem_rcfile_lazybinary to be smaller than lineitem_rcfile_lazy.
However, regardless of whether compression is enabled,
lineitem_rcfile_lazybinary is slightly larger than lineitem_rcfile_lazy.
Am I using LazyBinaryColumnarSerDe incorrectly?
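For anyone reproducing the comparison, the following is a minimal sketch of one way to turn on compressed output before running the two CTAS statements and then check on-disk sizes from the Hive CLI. It assumes the Hadoop 1.x-era property names and the default warehouse location /user/hive/warehouse. GzipCodec is only an illustrative choice, since the message does not say which codec was used.

```sql
-- Enable compressed output for the CTAS statements (codec choice is an assumption).
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

-- Compare on-disk sizes (assumes the default warehouse directory).
dfs -du /user/hive/warehouse/lineitem_rcfile_lazy;
dfs -du /user/hive/warehouse/lineitem_rcfile_lazybinary;
```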

By the way, the RCFile row group size is 32MB.
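A 32MB row group is presumably configured through the RCFile record buffer size. The sketch below assumes the `hive.io.rcfile.record.buffer.size` property, which takes a value in bytes (32 * 1024 * 1024 = 33554432). The message does not say how the row group size was actually set.

```sql
-- Assumed property for the RCFile row group (record buffer) size, in bytes: 32MB.
SET hive.io.rcfile.record.buffer.size=33554432;
```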

Thanks,

Yin