Hive >> mail # user >> ColumnarSerDe and LazyBinaryColumnarSerDe
ColumnarSerDe and LazyBinaryColumnarSerDe
Hi,

Is LazyBinaryColumnarSerDe more space efficient than ColumnarSerDe in
general?

Let me make my question more specific.

I generated two tables from the TPC-H lineitem table, one using
LazyBinaryColumnarSerDe and one using ColumnarSerDe, as follows:
CREATE TABLE lineitem_rcfile_lazybinary
ROW FORMAT SERDE
"org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe"
STORED AS RCFile AS
SELECT * from lineitem;

CREATE TABLE lineitem_rcfile_lazy
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
STORED AS RCFile AS
SELECT * from lineitem;

Since LazyBinaryColumnarSerDe serializes data in a binary format and
ColumnarSerDe serializes it as text, I expected
lineitem_rcfile_lazybinary to be smaller than lineitem_rcfile_lazy.
However, regardless of whether compression is enabled,
lineitem_rcfile_lazybinary is a little bit larger
than lineitem_rcfile_lazy. Am I using LazyBinaryColumnarSerDe incorrectly?

By the way, the RCFile row group size is 32 MB.
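
For anyone reproducing this comparison, here is a minimal sketch of the session settings and size check it involves. The property names are standard Hive/Hadoop ones, but the post does not say which settings were actually used: the Gzip codec and the warehouse paths (Hive's default /user/hive/warehouse) are assumptions.

```sql
-- RCFile row group size in bytes: 32 MB = 32 * 1024 * 1024
SET hive.io.rcfile.record.buffer.size=33554432;

-- Output compression (assumed settings; run the CREATE TABLE ... AS SELECT
-- statements once with true and once with false to cover both cases)
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

-- After creating both tables, compare their on-disk sizes from the Hive CLI.
-- Paths assume the default warehouse directory.
dfs -du /user/hive/warehouse/lineitem_rcfile_lazybinary;
dfs -du /user/hive/warehouse/lineitem_rcfile_lazy;
```

The settings need to be issued before the tables are created, since they only affect files written afterwards.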

Thanks,

Yin