


RE: Compressed data storage in HDFS - Error
There is something you gain and something you lose.
Compression reduces I/O at the cost of extra CPU work. You will also see different effects for different tasks, i.e. HDFS read, HDFS write, and shuffle and sort. So whether to use compression or not depends on your usage.
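To make the tradeoff concrete, here is a minimal back-of-envelope sketch, assuming a hypothetical codec whose compression ratio and CPU throughputs are invented for illustration (they are not measured LZO or Hadoop figures). It also assumes I/O and (de)compression run one after the other rather than overlapping, and it ignores other factors such as whether the compressed files are splittable. The same model applies to any stage that moves bytes over a limited link: HDFS reads and writes (disk) or shuffle (network).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Codec:
    """Hypothetical codec profile; every number is an illustrative assumption."""
    name: str
    ratio: float            # compressed size / uncompressed size
    compress_mb_s: float    # CPU throughput when compressing (MB/s of raw data)
    decompress_mb_s: float  # CPU throughput when decompressing (MB/s of raw data)


def read_time_s(data_mb: float, io_mb_s: float, codec: Optional[Codec] = None) -> float:
    """Seconds to read data_mb of raw data over a link of io_mb_s (disk or network).

    Assumes I/O and decompression do not overlap (a pessimistic simplification).
    """
    if codec is None:
        return data_mb / io_mb_s
    io_time = data_mb * codec.ratio / io_mb_s   # fewer bytes cross the link
    cpu_time = data_mb / codec.decompress_mb_s  # extra CPU work to decompress
    return io_time + cpu_time


def write_time_s(data_mb: float, io_mb_s: float, codec: Optional[Codec] = None) -> float:
    """Seconds to write data_mb of raw data, compressing first if a codec is given."""
    if codec is None:
        return data_mb / io_mb_s
    cpu_time = data_mb / codec.compress_mb_s    # extra CPU work to compress
    io_time = data_mb * codec.ratio / io_mb_s   # fewer bytes written
    return cpu_time + io_time


if __name__ == "__main__":
    # Hypothetical codec: 40% output size, 200 MB/s compress, 500 MB/s decompress.
    codec = Codec("hypothetical-fast-codec", ratio=0.4,
                  compress_mb_s=200.0, decompress_mb_s=500.0)
    for label, io_mb_s in [("slow link (50 MB/s)", 50.0), ("fast disk (500 MB/s)", 500.0)]:
        print(label)
        print(f"  read : raw {read_time_s(1000, io_mb_s):.1f}s, "
              f"compressed {read_time_s(1000, io_mb_s, codec):.1f}s")
        print(f"  write: raw {write_time_s(1000, io_mb_s):.1f}s, "
              f"compressed {write_time_s(1000, io_mb_s, codec):.1f}s")
```

With these assumed numbers, compression halves the time to read 1000 MB over the 50 MB/s link (10 s vs 20 s) because the saved I/O outweighs the decompression cost, but it makes reads slower on the 500 MB/s disk (about 2.8 s vs 2 s), where CPU becomes the bottleneck. That is the "something you gain and something you lose" in numbers.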
Sent from my N8

-----Original Message-----
From: Sreenath Menon
Sent: 6/6/2012 8:50:23 AM
To: [EMAIL PROTECTED]
Subject: Compressed data storage in HDFS - Error
I would like to compress my data in HDFS using Hive commands.
Steps followed (the data already resides in table sample):

create table rc_lzo like sample;
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
insert overwrite table rc_lzo select * from sample;

Error:
Compression codec com.hadoop.compression.lzo.LzoCodec was not found

1) What do I need to do to use LZO, as well as other compression methods?

2) I heard somewhere that using compressed data can produce better results than uncompressed data in some cases. How can this be, given that compression methods always add compression and decompression time? Is there any truth in this, and if so, how? I can understand how there are better results when using compression between mappers and reducers, and between MapReduce jobs.

Thanks and Regards
Sreenath Mullassery