Sequence file compression in Hive
Hi,

I have a table, *facts520_normal_seq*, stored as a SEQUENCEFILE in Hive 0.10.

Now I want to create another table, also stored as a SEQUENCEFILE, but
compressed using the Gzip codec.

So I enabled output compression, set the codec to Gzip and the compression
type to BLOCK, and then executed the following query:

*SET hive.exec.compress.output=true;*
*SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;*
*SET mapred.output.compression.type=BLOCK;*

*create table test1facts520_gzip_seq as select * from facts520_normal_seq;*

The table was created, and its output files were compressed:

*[root@aana1 comp_data]# sudo -u hdfs hadoop fs -ls /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq*
*Found 5 items*
*-rw-r--r--   3 admin supergroup   38099145 2013-06-10 17:56 /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq/000000_0.gz*
*-rw-r--r--   3 admin supergroup   31450189 2013-06-10 17:56 /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq/000001_0.gz*
*-rw-r--r--   3 admin supergroup   20764259 2013-06-10 17:56 /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq/000002_0.gz*
*-rw-r--r--   3 admin supergroup   21107597 2013-06-10 17:56 /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq/000003_0.gz*
*-rw-r--r--   3 admin supergroup   12202692 2013-06-10 17:56 /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq/000004_0.gz*

However, when I checked the table properties, I was surprised to see that
the table had been stored as a text file:

*hive> show create table test1facts520_gzip_seq;*
*OK*
*CREATE  TABLE test1facts520_gzip_seq(*
*  fact_key bigint,*
*  products_key int,*
*  retailers_key int,*
*  suppliers_key int,*
*  time_key int,*
*  units int)*
*ROW FORMAT SERDE*
*  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'*
*STORED AS INPUTFORMAT*
*  'org.apache.hadoop.mapred.TextInputFormat'*
*OUTPUTFORMAT*
*  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'*
*LOCATION*
*  'hdfs://aana1.ird.com/user/hive/warehouse/facts_520.db/test1facts520_gzip_seq'*
*TBLPROPERTIES (*
*  'numPartitions'='0',*
*  'numFiles'='5',*
*  'transient_lastDdlTime'='1370867198',*
*  'numRows'='0',*
*  'totalSize'='123623882',*
*  'rawDataSize'='0')*
*Time taken: 0.15 seconds*

So I added a STORED AS clause to my earlier CREATE TABLE statement and
created a new table:

*create table test3facts520_gzip_seq STORED AS SEQUENCEFILE as select * from facts520_normal_seq;*

This time, the output table was stored as a SEQUENCEFILE:

*hive> show create table test3facts520_gzip_seq;*
*OK*
*CREATE  TABLE test3facts520_gzip_seq(*
*  fact_key bigint,*
*  products_key int,*
*  retailers_key int,*
*  suppliers_key int,*
*  time_key int,*
*  units int)*
*ROW FORMAT SERDE*
*  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'*
*STORED AS INPUTFORMAT*
*  'org.apache.hadoop.mapred.SequenceFileInputFormat'*
*OUTPUTFORMAT*
*  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'*
*LOCATION*
*  'hdfs://aana1.ird.com/user/hive/warehouse/facts_520.db/test3facts520_gzip_seq'*
*TBLPROPERTIES (*
*  'numPartitions'='0',*
*  'numFiles'='5',*
*  'transient_lastDdlTime'='1370867777',*
*  'numRows'='0',*
*  'totalSize'='129811519',*
*  'rawDataSize'='0')*
*Time taken: 0.135 seconds*

But this time the compression itself did not happen!

*[root@aana1 comp_data]# sudo -u hdfs hadoop fs -ls /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq*
*Found 5 items*
*-rw-r--r--   3 admin supergroup   40006368 2013-06-10 18:06 /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/000000_0*
*-rw-r--r--   3 admin supergroup   33026961 2013-06-10 18:06 /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/000001_0*
*-rw-r--r--   3 admin supergroup   21797242 2013-06-10 18:05 /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/000002_0*
*-rw-r--r--   3 admin supergroup   22171637 2013-06-10 18:05 /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/000003_0*
*-rw-r--r--   3 admin supergroup   12809311 2013-06-10 18:05 /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/000004_0*
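
The file names alone may not settle whether these files are compressed. SequenceFile compression is applied inside the file, so no .gz suffix is added, and a compressed SequenceFile records its codec class name in the header after the leading "SEQ" magic bytes. The sketch below is one possible check, assuming the first bytes of a data file have already been saved to a local file. The helper name `seqfile_codec` and the 512-byte window are illustrative choices, not part of Hive or Hadoop.

```shell
#!/bin/sh
# Sketch: report the compression codec named in a SequenceFile header.
# seqfile_codec is a hypothetical helper; it reads a LOCAL copy of the header.
seqfile_codec() {
  header_file=$1

  # Every SequenceFile starts with the three magic bytes "SEQ".
  if [ "$(head -c 3 "$header_file")" != "SEQ" ]; then
    echo "not-a-sequencefile"
    return 1
  fi

  # A compressed SequenceFile stores its codec class name in the header.
  # LC_ALL=C and -a make grep treat the binary header as plain bytes.
  codec=$(head -c 512 "$header_file" \
    | LC_ALL=C grep -a -o 'org\.apache\.hadoop\.io\.compress\.[A-Za-z0-9]*Codec' \
    | head -n 1)

  if [ -n "$codec" ]; then
    echo "$codec"
  else
    echo "no-codec-found"
  fi
}
```

A header could be fetched with, for example, `sudo -u hdfs hadoop fs -cat /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/000000_0 | head -c 512 > header.bin`, and then inspected with `seqfile_codec header.bin`. If the Gzip codec class is printed, the file was written with compression even though its name has no .gz suffix.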

Have I done something wrong, or have I missed something?

Any help would be greatly appreciated!

Thank you,
Sachin
Stephen Sprague 2013-06-10, 18:15
Alexander Pivovarov 2013-06-10, 18:37
tofunmibabatunde@... 2013-06-10, 18:45