RE: splittable vs seekable compressed formats
More specifically, seeking to a known location in the uncompressed data.  So not just seeking to “the nearest record boundary”, but seeking to “position 100000000 in the uncompressed data”.  I can see that if the writer kept track of this information on the side it would be available; my question is more about the standard formats (e.g. LZO compression in SequenceFile) supporting this without additional work.

From: Rahul Bhattacharjee [mailto:[EMAIL PROTECTED]]
Sent: Friday, May 24, 2013 1:00 AM
Subject: Re: splittable vs seekable compressed formats

Yeah, I think John meant seeking to record boundaries.

On Fri, May 24, 2013 at 12:22 PM, Harsh J <[EMAIL PROTECTED]> wrote:
SequenceFiles should be seekable, I think, provided you know/manage their sync
points during writes. With LZO this may be non-trivial.

On Thu, May 23, 2013 at 11:01 PM, John Lilley <[EMAIL PROTECTED]> wrote:
> I’ve read about splittable compressed formats in Hadoop.  Are any of these
> formats also “seekable” (in other words, able to seek to an absolute
> location in the uncompressed data)?
> John
Harsh J
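
The idea raised in the thread, that a writer could keep track of uncompressed positions "on the side", can be illustrated with a small sketch. This is not the SequenceFile on-disk format and not a Hadoop API: the function names (write_blocks, read_at), the index layout, and the block size are all hypothetical, and zlib stands in for LZO only so that the example runs with the Python standard library. The data is compressed in independent blocks, and a side index maps each block's uncompressed start offset to its location in the compressed stream, so a reader can jump to an absolute uncompressed position by decompressing only the blocks that cover it.

```python
import bisect
import io
import zlib

BLOCK_SIZE = 64 * 1024  # uncompressed bytes per block (illustrative choice)


def write_blocks(data, out, block_size=BLOCK_SIZE):
    """Compress `data` in independent blocks and return the side index.

    Each index entry is (uncompressed_start, compressed_start, compressed_length).
    """
    index = []
    for start in range(0, len(data), block_size):
        comp = zlib.compress(data[start:start + block_size])
        index.append((start, out.tell(), len(comp)))
        out.write(comp)
    return index


def read_at(src, index, position, length):
    """Return up to `length` uncompressed bytes starting at absolute `position`."""
    if not index or position < 0 or length <= 0:
        return b""
    # Find the last block whose uncompressed start is <= position.
    starts = [entry[0] for entry in index]
    i = bisect.bisect_right(starts, position) - 1
    skip = position - index[i][0]  # bytes to discard inside the first block
    result = bytearray()
    while i < len(index) and len(result) < length:
        _, comp_start, comp_len = index[i]
        src.seek(comp_start)
        block = zlib.decompress(src.read(comp_len))
        result += block[skip:skip + length - len(result)]
        skip = 0  # later blocks are read from their beginning
        i += 1
    return bytes(result)


if __name__ == "__main__":
    payload = bytes(range(256)) * 1000
    stream = io.BytesIO()
    side_index = write_blocks(payload, stream, block_size=1000)
    chunk = read_at(stream, side_index, 100000, 50)
    print(chunk == payload[100000:100050])
```

The trade-off in a scheme like this is the block size: smaller blocks mean less data to decompress per seek but a larger index and usually a worse compression ratio. Without such an index (or an equivalent kept by the format itself), a reader generally has to decompress from a known boundary and count bytes forward to reach a given uncompressed position.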