HDFS, mail # user - Understanding compression in hdfs


Re: Understanding compression in hdfs
Yaron Gonen 2012-07-29, 17:35
Thanks!
I'll dig into those classes to figure out my next step.

Anyway, I just realized that the blocks produced by block-level compression
have nothing to do with HDFS blocks. A single HDFS block can contain an
arbitrary number of compressed blocks, which makes my efforts kind of worthless.

Thanks again!
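
For anyone reading this thread later: the wrapping-stream idea quoted below (a hash per <n> bytes) could look roughly like the following. This is only a minimal sketch using plain java.io and java.security, not Hadoop's CompressionOutputStream or Codec API. The class name ChunkHashingOutputStream, the chunkSize parameter, and keeping the hashes in an in-memory list are all assumptions made for illustration; where the hashes should really be stored is still an open question.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical wrapper: passes bytes through unchanged to the wrapped stream
 * and records one SHA-1 digest per chunkSize bytes written.
 */
public class ChunkHashingOutputStream extends FilterOutputStream {
    private final int chunkSize;
    private final MessageDigest digest;
    // Assumption: hashes are simply kept in memory, in write order.
    private final List<byte[]> hashes = new ArrayList<>();
    private int bytesInChunk = 0;

    public ChunkHashingOutputStream(OutputStream out, int chunkSize)
            throws NoSuchAlgorithmException {
        super(out);
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        this.digest = MessageDigest.getInstance("SHA-1");
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        digest.update((byte) b);
        if (++bytesInChunk == chunkSize) {
            finishChunk();
        }
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        // Split the write at chunk boundaries so each digest covers exactly one chunk.
        while (len > 0) {
            int n = Math.min(len, chunkSize - bytesInChunk);
            out.write(buf, off, n);
            digest.update(buf, off, n);
            bytesInChunk += n;
            off += n;
            len -= n;
            if (bytesInChunk == chunkSize) {
                finishChunk();
            }
        }
    }

    @Override
    public void close() throws IOException {
        // Hash the final, possibly shorter, chunk before closing.
        if (bytesInChunk > 0) {
            finishChunk();
        }
        super.close();
    }

    private void finishChunk() {
        hashes.add(digest.digest()); // digest() also resets the MessageDigest
        bytesInChunk = 0;
    }

    /** Returns the SHA-1 digests recorded so far, one 20-byte array per chunk. */
    public List<byte[]> getHashes() {
        return hashes;
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        ChunkHashingOutputStream hashed = new ChunkHashingOutputStream(sink, 4);
        hashed.write(new byte[10], 0, 10); // chunks of 4, 4 and 2 bytes
        hashed.close();
        System.out.println("chunks hashed: " + hashed.getHashes().size());
    }
}
```

Note that the chunks here are fixed-size byte ranges of whatever stream is wrapped, so they will not line up with the compressed blocks of a SequenceFile (or with HDFS blocks) unless something else enforces that.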

On Sun, Jul 29, 2012 at 6:40 PM, Tim Broberg <[EMAIL PROTECTED]> wrote:

>  What if you wrote a CompressionOutputStream class that wraps around the
> existing ones and outputs a hash per <n> bytes and a CompressionInputStream
> that checks them? ...and a Codec that wraps your compressors around
> arbitrary existing codecs.
>
>  Sounds like a bunch of work, and I'm not sure where you would store the
> hashes, but it would get the data into your clutches the instant it's
> available.
>
>     - Tim.
>
> On Jul 29, 2012, at 7:41 AM, "Yaron Gonen" <[EMAIL PROTECTED]> wrote:
>
>   Hi,
> I've created a SequenceFile.Writer with block-level compression.
> I'd like to create a SHA1 hash for each block written. How do I do that? I
> didn't see any way to hook into the compression so that I can tell when a
> block ends.
>
>  Thanks,
> Yaron