Hive, mail # user - Hive and Lzo Compression


w00t w00t 2013-08-08, 09:02
Edward Capriolo 2013-08-08, 18:06
Sanjay Subramanian 2013-08-08, 19:30
Lefty Leverenz 2013-08-10, 17:06
w00t w00t 2013-08-13, 07:13
Sanjay Subramanian 2013-08-14, 01:44
w00t w00t 2013-08-14, 08:15
Sanjay Subramanian 2013-08-14, 17:41
Nitin Pawar 2013-08-14, 17:54
Sanjay Subramanian 2013-08-14, 22:50
Nitin Pawar 2013-08-17, 14:40
w00t w00t 2013-08-19, 08:06
Nitin Pawar 2013-08-19, 08:27

Re: Hive and Lzo Compression
Sanjay Subramanian 2013-08-10, 20:00
Thanks Lefty.


On Aug 10, 2013, at 10:08 AM, "Lefty Leverenz" <[EMAIL PROTECTED]> wrote:

I'm not seeing any documentation link in Sanjay's message, so here it is again (in the Hive wiki's language manual):  https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO.
On Thu, Aug 8, 2013 at 3:30 PM, Sanjay Subramanian <[EMAIL PROTECTED]> wrote:
Please refer to the documentation here.
Let me know if you need more clarification so that we can make this document better and more complete.

Thanks

sanjay

From: w00t w00t <[EMAIL PROTECTED]>
Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>, w00t w00t <[EMAIL PROTECTED]>
Date: Thursday, August 8, 2013 2:02 AM
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Subject: Hive and Lzo Compression

Hello,

I have started running Hive with LZO compression on Hortonworks 1.2.

I have managed to install and configure LZO, and hive -e "set io.compression.codecs" shows me the LZO codecs:
io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.DefaultCodec,
com.hadoop.compression.lzo.LzoCodec,
com.hadoop.compression.lzo.LzopCodec,
org.apache.hadoop.io.compress.BZip2Codec

However, I have some questions that I would appreciate your help with.

(1) CREATE TABLE statement

I read in various postings that the CREATE TABLE statement has to use the following storage clause:

CREATE EXTERNAL TABLE txt_table_lzo (
   txt_line STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '||||'
STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/user/myuser/data/in/lzo_compressed';

SELECT statements on this table with the LZO data now run without any problems.

However, I also created a table on the same data without this storage clause:

CREATE EXTERNAL TABLE txt_table_lzo_tst (
   txt_line STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '||||'
LOCATION '/user/myuser/data/in/lzo_compressed';

Interestingly, SELECT statements on this table work as well.

Can you explain why the second CREATE TABLE statement works as well?
What should I use in DDLs?
Is it best practice to use the STORED AS clause with DeprecatedLzoTextInputFormat, or should I remove it?
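
One way to see how the two tables differ is to compare the storage formats Hive recorded for each. This is only a sketch using the table names from the DDL above; the expectation that the second table reports Hive's default text input format reflects general Hive behaviour and has not been verified on this cluster.

```sql
-- Show the InputFormat/OutputFormat that Hive stored for each table.
-- First table: should list com.hadoop.mapred.DeprecatedLzoTextInputFormat.
DESCRIBE FORMATTED txt_table_lzo;
-- Second table: should list org.apache.hadoop.mapred.TextInputFormat,
-- the default when no STORED AS clause is given.
DESCRIBE FORMATTED txt_table_lzo_tst;
```

The default text input format can typically still read .lzo files because it picks a codec from io.compression.codecs by file extension. It does not use LZO index files, though, so each compressed file would be processed as a single split.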

(2) Output and Intermediate Compression Settings

I want to use output compression.

In "Programming Hive" by Capriolo, Wampler, and Rutherglen, the following commands are recommended:
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;

However, in some forum posts I found the following recommended settings:
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress=true;
SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;

Am I right that the first settings are for Hadoop versions prior to 0.23?
Or is there any other reason why the settings are different?

I am using Hadoop 1.1.2 with Hive 0.10.0.
Which settings would you recommend to use?
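
For the versions mentioned above (Hadoop 1.1.2 with Hive 0.10.0), a minimal sketch would stay with the old-style mapred.* names, since the mapreduce.* names belong to the property renaming in the newer Hadoop line (0.23/2.x). Treat this as an assumption to check against the cluster's own Hadoop documentation, not as a confirmed recommendation.

```sql
-- Output compression using the old property names (Hadoop 1.x).
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;

-- Name of the same codec setting on Hadoop 0.23/2.x, shown for comparison only.
-- The old name above is kept there as a deprecated alias.
-- SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;
```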

--------------
I also want to compress intermediate results.

Again, "Programming Hive" recommends the following settings:
SET hive.exec.compress.intermediate=true;
SET mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;

Is this the right setting?

Or should I again use these settings (which look more appropriate for Hadoop 0.23 and later)?
SET hive.exec.compress.intermediate=true;
SET mapreduce.map.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
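
As a sketch of how the old and new names for map-output compression line up: in the Hadoop 2.x list of deprecated properties, mapred.map.output.compression.codec corresponds to mapreduce.map.output.compress.codec (with "compress", not "compression"), so the second variant quoted above may not be recognized as written. Which name applies on a given cluster is an assumption to verify there.

```sql
-- Intermediate (map output) compression using the old property names (Hadoop 1.x).
SET hive.exec.compress.intermediate=true;
SET mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;

-- Name of the same codec setting on Hadoop 0.23/2.x, shown for comparison only.
-- SET mapreduce.map.output.compress.codec=com.hadoop.compression.lzo.LzopCodec;
```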

Thanks
CONFIDENTIALITY NOTICE
=====================
This email message and any attachments are for the exclusive use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message along with any attachments, from your computer system. If you are the intended recipient, please be advised that the content of this message is subject to access, review and disclosure by the sender's Email System Administrator.