RE: Compressed data storage in HDFS - Error
There is something you gain and something you lose.
Compression reduces I/O at the cost of more CPU work. The effect also differs from stage to stage, i.e. HDFS read, HDFS write, and shuffle and sort. So whether to use compression depends on your usage.
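
As a rough sketch of how those stages map to settings, the snippet below compresses the final output (HDFS write, and later HDFS reads) separately from the intermediate map output (shuffle and sort). It assumes Hadoop 1.x-era property names and uses codecs that ship with Hadoop instead of LZO, so no extra library has to be installed; the table name rc_gzip is hypothetical, and sample is the table from the original question.

```sql
-- Compress the final query output written to HDFS.
SET hive.exec.compress.output=true;
-- GzipCodec is bundled with Hadoop (assumption: old-style mapred.* property names apply).
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

-- Compress intermediate data passed between map and reduce stages (shuffle and sort).
SET hive.exec.compress.intermediate=true;
SET mapred.map.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec;

-- rc_gzip is a hypothetical target table; sample comes from the question.
CREATE TABLE rc_gzip LIKE sample;
INSERT OVERWRITE TABLE rc_gzip SELECT * FROM sample;
```

Keep in mind that gzip-compressed text files are not splittable, so one large output file would be read by a single mapper. That is one of the trade-offs to weigh for your workload.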
Sent from my N8

-----Original Message-----
From: Sreenath Menon
Sent: 6/6/2012 8:50:23 AM
Subject: Compressed data storage in HDFS - Error
I would like to compress my data in HDFS using Hive commands.
Steps followed (data already resides in table sample):

create table rc_lzo like sample;
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
insert overwrite table rc_lzo select * from sample;

Compression codec com.hadoop.compression.lzo.LzoCodec was not found

1) What do I need to do to use LZO as well as other compression methods?

2) I heard somewhere that using compressed data will produce better results than uncompressed data in some cases. How can this be, given that compression methods always add compression and decompression time? Is there any truth in this, and if so, how? I can understand how compression gives better results between mappers and reducers and in between map-reduce jobs.

Thanks and Regards
Sreenath Mullassery