Hive >> mail # user >> Hive and Lzo Compression


w00t w00t 2013-08-08, 09:02
Edward Capriolo 2013-08-08, 18:06
Sanjay Subramanian 2013-08-08, 19:30
Lefty Leverenz 2013-08-10, 17:06
w00t w00t 2013-08-13, 07:13
Sanjay Subramanian 2013-08-14, 01:44
w00t w00t 2013-08-14, 08:15
Sanjay Subramanian 2013-08-14, 17:41
Nitin Pawar 2013-08-14, 17:54

Re: Hive and Lzo Compression
I am not sure whether, in this case, the data was loaded into the table
or a partition was added with a location specified (pointing to some location in HDFS).

Yes, you are stating the question correctly.

sanjay

From: Nitin Pawar <[EMAIL PROTECTED]>
Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Date: Wednesday, August 14, 2013 10:54 AM
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Subject: Re: Hive and Lzo Compression

Please correct me if I have misunderstood the question.

You created a table definition without a STORED AS clause,
then loaded data into the table from a compressed file,
then ran a SELECT query, and it still worked.
But how did Hive figure out which compression codec to use?

Am I stating it correctly?
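
For context on the codec question above: as far as is known from Hadoop's behaviour, the plain text input format picks a decompression codec from the file name suffix, using the codecs registered in io.compression.codecs. That is a likely reason a table without a STORED AS clause can still read .lzo files. The following is a minimal Python sketch of that lookup, not Hadoop's actual code. The function name codec_for and the example file name are hypothetical, and the extension table is an assumption based on each codec's default extension.

```python
# Illustrative model of suffix-based codec lookup (an assumption-laden sketch,
# not the real Hadoop implementation).
import posixpath

# Assumed default file extension for each codec class from the thread.
DEFAULT_EXTENSIONS = {
    "org.apache.hadoop.io.compress.GzipCodec": ".gz",
    "org.apache.hadoop.io.compress.DefaultCodec": ".deflate",
    "com.hadoop.compression.lzo.LzoCodec": ".lzo_deflate",
    "com.hadoop.compression.lzo.LzopCodec": ".lzo",
    "org.apache.hadoop.io.compress.BZip2Codec": ".bz2",
}


def codec_for(path, configured_codecs):
    """Return the configured codec whose extension matches the file name, or None."""
    name = posixpath.basename(path)
    for codec in configured_codecs:
        ext = DEFAULT_EXTENSIONS.get(codec)
        if ext and name.endswith(ext):
            return codec
    # No suffix match: the file would be read as uncompressed text.
    return None


if __name__ == "__main__":
    codecs = list(DEFAULT_EXTENSIONS)
    # Hypothetical file under the table location used in the thread.
    print(codec_for("/user/myuser/data/in/lzo_compressed/part-00000.lzo", codecs))
```

Under these assumptions, the lookup depends only on the file suffix and the registered codecs, not on the table's STORED AS clause.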

On Wed, Aug 14, 2013 at 11:11 PM, Sanjay Subramanian <[EMAIL PROTECTED]> wrote:
That is really interesting... let me try to think of a reason. Meanwhile, are there any other LZO Hive samurais out there? Please help with some guidance.

sanjay

From: w00t w00t <[EMAIL PROTECTED]>
Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>, w00t w00t <[EMAIL PROTECTED]>
Date: Wednesday, August 14, 2013 1:15 AM
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Subject: Re: Hive and Lzo Compression

Thanks for your reply.

The interesting thing I see is that the SELECT query still works even when I do not specify the STORED AS clause. That puzzles me a bit.

________________________________
From: Sanjay Subramanian <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; w00t w00t <[EMAIL PROTECTED]>
Sent: Wednesday, 14 August 2013, 3:44
Subject: Re: Hive and Lzo Compression

Hi

I think a CREATE TABLE statement without the STORED AS clause will not give any errors while creating the table.
However, when you query that table, you would get errors, since the table contains .lzo files.
With external tables, you are separating the table creation (definition) from the data, so Hive might report errors only when the table is queried.

LZO compression rocks! I am so glad I used it in our projects here.

Regards

sanjay

From: w00t w00t <[EMAIL PROTECTED]>
Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>, w00t w00t <[EMAIL PROTECTED]>
Date: Tuesday, August 13, 2013 12:13 AM
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Subject: Re: Hive and Lzo Compression

Thanks for your replies and the link.

I got it working, but I wondered why the CREATE TABLE statement also worked without the STORED AS clause. That is what puzzles me a bit.

But I will use the STORED AS clause to be on the safe side.
________________________________
From: Lefty Leverenz <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
CC: w00t w00t <[EMAIL PROTECTED]>
Sent: Saturday, 10 August 2013, 19:06
Subject: Re: Hive and Lzo Compression

I'm not seeing any documentation link in Sanjay's message, so here it is again (in the Hive wiki's language manual): https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO

On Thu, Aug 8, 2013 at 3:30 PM, Sanjay Subramanian <[EMAIL PROTECTED]> wrote:
Please refer to the documentation here.
Let me know if you need more clarification so that we can make this document better and more complete.

Thanks

sanjay

From: w00t w00t <[EMAIL PROTECTED]>
Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>, w00t w00t <[EMAIL PROTECTED]>
Date: Thursday, August 8, 2013 2:02 AM
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Subject: Hive and Lzo Compression

Hello,

I have started running Hive with Lzo compression on Hortonworks 1.2.

I have managed to install and configure Lzo, and hive -e "set io.compression.codecs" shows me the Lzo codecs:
io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.DefaultCodec,
com.hadoop.compression.lzo.LzoCodec,
com.hadoop.compression.lzo.LzopCodec,
org.apache.hadoop.io.compress.BZip2Codec
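
A quick way to confirm that both Lzo codecs are registered is to parse that output. The Python sketch below assumes the output has the form key=value with a comma-separated class list, possibly wrapped over several lines; the helper names parse_codecs and missing_lzo_codecs are hypothetical and not part of Hive or Hadoop.

```python
# Sketch: check that the Lzo codec classes appear in the value of
# io.compression.codecs (assumes "key=value" output from the set command).

REQUIRED_LZO_CODECS = [
    "com.hadoop.compression.lzo.LzoCodec",
    "com.hadoop.compression.lzo.LzopCodec",
]


def parse_codecs(set_output):
    """Split 'io.compression.codecs=a,b,...' into a list of class names."""
    _, _, value = set_output.partition("=")
    # Stripping each entry also removes line breaks from wrapped output.
    return [c.strip() for c in value.split(",") if c.strip()]


def missing_lzo_codecs(codecs):
    """Return the required Lzo codec classes that are not registered."""
    return [c for c in REQUIRED_LZO_CODECS if c not in codecs]


if __name__ == "__main__":
    sample = (
        "io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec,\n"
        "org.apache.hadoop.io.compress.DefaultCodec,\n"
        "com.hadoop.compression.lzo.LzoCodec,\n"
        "com.hadoop.compression.lzo.LzopCodec,\n"
        "org.apache.hadoop.io.compress.BZip2Codec\n"
    )
    print(missing_lzo_codecs(parse_codecs(sample)))
```

An empty result means both classes are present in the list.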

However, I have some questions and would be grateful for your help.

(1) CREATE TABLE statement

I read in several postings that the CREATE TABLE statement needs the following STORED AS clause:

CREATE EXTERNAL TABLE txt_table_lzo (
   txt_line STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '||||'
STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/user/myuser/data/in/lzo_compressed';

SELECT statements on this table with Lzo data now run without any problems.

However, I also created a table on the same data without this STORED AS clause:

CREATE EXTERNAL TABLE txt_table_lzo_tst (
   txt_line STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '||||'
LOCATION '/user/myuser/data/in/lzo_compressed';

The interesting thing is that a SELECT statement on this table works as well.

Can you explain why the second CREATE TABLE statement works as well?
What should I use in DDLs?
Is it best practice to use the STORED AS clause with DeprecatedLzoTextInputFormat, or should I remove it?
(2) Output and Intermediate Compress
Nitin Pawar 2013-08-17, 14:40
w00t w00t 2013-08-19, 08:06
Nitin Pawar 2013-08-19, 08:27
Sanjay Subramanian 2013-08-10, 20:00