Hive user mailing list


Sequence file compression in Hive
Hi,

I have a table, facts520_normal_seq, stored as SEQUENCEFILE in hive-0.10.

Now I want to create another table, also stored as a SEQUENCEFILE, but
compressed with the Gzip codec.

So I set the compression codec to Gzip and the compression type to BLOCK,
and then executed the following query:

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
SET mapred.output.compression.type=BLOCK;

create table test1facts520_gzip_seq as select * from facts520_normal_seq;

The table was created, and its files were compressed as well.

[root@aana1 comp_data]# sudo -u hdfs hadoop fs -ls /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq
Found 5 items
-rw-r--r--   3 admin supergroup   38099145 2013-06-10 17:56 /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq/000000_0.gz
-rw-r--r--   3 admin supergroup   31450189 2013-06-10 17:56 /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq/000001_0.gz
-rw-r--r--   3 admin supergroup   20764259 2013-06-10 17:56 /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq/000002_0.gz
-rw-r--r--   3 admin supergroup   21107597 2013-06-10 17:56 /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq/000003_0.gz
-rw-r--r--   3 admin supergroup   12202692 2013-06-10 17:56 /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq/000004_0.gz
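
A .gz suffix on its own does not say whether a file is a plain gzip stream or a SequenceFile. One way to check what was actually written is to look at the first bytes of a file: every gzip stream starts with the bytes 0x1f 0x8b, while a Hadoop SequenceFile starts with the ASCII bytes SEQ, even when it is BLOCK- or RECORD-compressed, because that compression is applied inside the SequenceFile container. The Python sketch below is a hypothetical helper (`sniff_format` is not part of Hadoop or Hive); it assumes you have already fetched a few leading bytes of a file, for example with `hadoop fs -cat <file> | head -c 16`.

```python
GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of every gzip stream (RFC 1952)
SEQFILE_MAGIC = b"SEQ"     # first three bytes of every Hadoop SequenceFile


def sniff_format(prefix: bytes) -> str:
    """Classify a file from its leading bytes (hypothetical helper).

    A compressed SequenceFile still starts with b"SEQ": its compression
    happens inside the container, so it never starts with the gzip magic.
    """
    if prefix.startswith(GZIP_MAGIC):
        return "gzip"
    if prefix.startswith(SEQFILE_MAGIC):
        return "sequencefile"
    return "unknown"
```

A result of "gzip" for one of the files above would mean the file is a gzip-compressed stream rather than a SequenceFile.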

However, when I checked the table properties, I was surprised to see that
the table had been stored as a textfile!

hive> show create table test1facts520_gzip_seq;
OK
CREATE  TABLE test1facts520_gzip_seq(
  fact_key bigint,
  products_key int,
  retailers_key int,
  suppliers_key int,
  time_key int,
  units int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://aana1.ird.com/user/hive/warehouse/facts_520.db/test1facts520_gzip_seq'
TBLPROPERTIES (
  'numPartitions'='0',
  'numFiles'='5',
  'transient_lastDdlTime'='1370867198',
  'numRows'='0',
  'totalSize'='123623882',
  'rawDataSize'='0')
Time taken: 0.15 seconds
So I added a STORED AS clause to my earlier CREATE TABLE statement and
created a new table:

create table test3facts520_gzip_seq STORED AS SEQUENCEFILE as select * from facts520_normal_seq;
This time, the output table was stored as a SEQUENCEFILE:

hive> show create table test3facts520_gzip_seq;
OK
CREATE  TABLE test3facts520_gzip_seq(
  fact_key bigint,
  products_key int,
  retailers_key int,
  suppliers_key int,
  time_key int,
  units int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION
  'hdfs://aana1.ird.com/user/hive/warehouse/facts_520.db/test3facts520_gzip_seq'
TBLPROPERTIES (
  'numPartitions'='0',
  'numFiles'='5',
  'transient_lastDdlTime'='1370867777',
  'numRows'='0',
  'totalSize'='129811519',
  'rawDataSize'='0')
Time taken: 0.135 seconds

But the compression itself did not happen!

[root@aana1 comp_data]# sudo -u hdfs hadoop fs -ls /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq
Found 5 items
-rw-r--r--   3 admin supergroup   40006368 2013-06-10 18:06 /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/000000_0
-rw-r--r--   3 admin supergroup   33026961 2013-06-10 18:06 /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/000001_0
-rw-r--r--   3 admin supergroup   21797242 2013-06-10 18:05 /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/000002_0
-rw-r--r--   3 admin supergroup   22171637 2013-06-10 18:05 /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/000003_0
-rw-r--r--   3 admin supergroup   12809311 2013-06-10 18:05 /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/000004_0

Is there anything I have done wrong, or something I have missed?

Any help would be greatly appreciated!

Thank you,
Sachin
Stephen Sprague 2013-06-10, 18:15
Alexander Pivovarov 2013-06-10, 18:37
tofunmibabatunde@... 2013-06-10, 18:45