Hive, mail # user - Hive and Lzo Compression


Re: Hive and Lzo Compression
Edward Capriolo 2013-08-08, 18:06
You should add STORED AS SEQUENCEFILE to your CREATE TABLE statement. You
probably do not want LZO files; you probably want sequence files with LZO
block compression.
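
For illustration, a minimal sketch of what sequence files with LZO block
compression could look like on Hadoop 1.x with Hive 0.10. The table name
txt_table_seq is hypothetical, the source table is txt_table_lzo from the
question below, and the use of LzoCodec (rather than LzopCodec) together with
the mapred.* property names is an assumption for this Hadoop line, not a
tested configuration:

```sql
-- Assumed Hadoop 1.x (mapred.*) property names.
SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;

-- Hypothetical target table stored as a sequence file.
CREATE TABLE txt_table_seq (
   txt_line STRING
)
STORED AS SEQUENCEFILE;

-- Rewrite the existing LZO text data into block-compressed sequence files.
INSERT OVERWRITE TABLE txt_table_seq
SELECT txt_line FROM txt_table_lzo;
```

Block-compressed sequence files remain splittable without a separate LZO
indexing step, which is the usual reason to prefer them over plain .lzo text
files.
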
On Thu, Aug 8, 2013 at 5:02 AM, w00t w00t <[EMAIL PROTECTED]> wrote:

>
> Hello,
>
> I have started running Hive with LZO compression on Hortonworks 1.2.
>
> I have managed to install and configure LZO, and hive -e "set
> io.compression.codecs" shows the LZO codecs:
> io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec,
> org.apache.hadoop.io.compress.DefaultCodec,
> com.hadoop.compression.lzo.LzoCodec,
> com.hadoop.compression.lzo.LzopCodec,
> org.apache.hadoop.io.compress.BZip2Codec
>
> However, I have some questions that I would appreciate your help with.
>
> (1) CREATE TABLE statement
>
> I read in several postings that the CREATE TABLE statement has to use the
> following STORAGE clause:
>
> CREATE EXTERNAL TABLE txt_table_lzo (
>    txt_line STRING
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '||||'
> STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION '/user/myuser/data/in/lzo_compressed';
>
> SELECT statements on this table with LZO data now run without any
> problems.
>
> However, I also created a table on the same data without this STORAGE
> clause:
>
> CREATE EXTERNAL TABLE txt_table_lzo_tst (
>    txt_line STRING
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '||||'
> LOCATION '/user/myuser/data/in/lzo_compressed';
>
> Interestingly, a SELECT statement on this table works as well.
>
> Can you explain why the second CREATE TABLE statement also works?
> What should I use in DDLs?
> Is it best practice to use the STORED AS clause with
> "DeprecatedLzoTextInputFormat"? Or should I remove it?
>
>
> (2) Output and Intermediate Compression Settings
>
> I want to use output compression.
>
> In "Programming Hive" by Capriolo, Wampler, and Rutherglen, the following
> commands are recommended:
> SET hive.exec.compress.output=true;
> SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
>
> However, elsewhere in forums I found the following recommended settings:
> SET hive.exec.compress.output=true;
> SET mapreduce.output.fileoutputformat.compress=true;
> SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;
>
> Am I right that the first settings are for Hadoop versions prior to 0.23?
> Or is there any other reason why the settings are different?
>
> I am using Hadoop 1.1.2 with Hive 0.10.0.
> Which settings would you recommend to use?
>
> --------------
> I also want to compress intermediate results.
>
> Again, "Programming Hive" recommends the following settings:
> SET hive.exec.compress.intermediate=true;
> SET mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
>
> Is this the right setting?
>
> Or should I again use the following settings, which look more appropriate
> for Hadoop 0.23 and later?
> SET hive.exec.compress.intermediate=true;
> SET mapreduce.map.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
>
> Thanks