1. We use LZO compression in our MR jobs, which write LZO files (these are NOT sequence files) that serve as the feeder files for Hive
2. Then we use the Hive data (the LZO files) to run aggregation reports
Hope this helps
From: "Ravi Mummulla (BIG DATA)" <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Date: Monday, June 10, 2013 6:14 AM
To: [EMAIL PROTECTED]
Subject: RE: Compression in Hive
Documentation is here: https://cwiki.apache.org/confluence/display/Hive/CompressedStorage
The performance overhead of compression is trivial for larger amounts of data, but it may be magnified as the data size gets smaller. Typically the gain comes from smaller data transfers between nodes and fewer disk reads/writes. Again, the larger the data size, the greater the gain.
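To make the size argument concrete, here is a rough sketch in Python. It uses zlib from the standard library as a stand-in codec (not LZO, and not anything Hive-specific), and the sample record is made up for illustration. It only compares compressed size to original size for one row versus many similar rows; it does not measure CPU time.

```python
import zlib


def compression_ratio(payload: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(payload)) / len(payload)


# Hypothetical tab-separated log row, invented for this example.
record = b"2013-06-10\tuser42\tpage_view\t/home\n"

small = record             # a single row
large = record * 100_000   # many similar rows

small_ratio = compression_ratio(small)
large_ratio = compression_ratio(large)

print(f"small: {len(small)} bytes, ratio {small_ratio:.3f}")
print(f"large: {len(large)} bytes, ratio {large_ratio:.3f}")
```

The large input here is one row repeated, so it compresses far better than real data would; treat the numbers as an illustration of the trend (fixed overhead dominates tiny inputs, redundancy pays off on big ones), not as a benchmark.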
From: Sachin Sudarshana [mailto:[EMAIL PROTECTED]]
Sent: Sunday, June 9, 2013 11:04 PM
To: [EMAIL PROTECTED]
Subject: Compression in Hive
I have been testing the usefulness of compression in Hive, and I have a general question.
I would like to know whether there are any particular cases where compression in Hive actually proves useful when running MR jobs.
Any pointers/examples would be really useful!