HDFS >> mail # user >> Re: splittable vs seekable compressed formats


RE: splittable vs seekable compressed formats
More specifically, I mean seeking to a known location in the uncompressed data: not just seeking to “the nearest record boundary”, but seeking to “position 100000000 in the uncompressed data”.  I can see that this would be possible if the writer kept track of the offsets on the side; my question is whether the standard formats (e.g. LZO compression in SequenceFile) support this without additional work.
John
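
For illustration, here is a minimal sketch of the "writer keeps track of this on the side" idea mentioned above. It is not SequenceFile or LZO: it assumes fixed-size uncompressed blocks, uses java.util.zip's Deflater/Inflater as a stand-in codec, and keeps everything in memory. The class name IndexedBlockStore and its methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical block-compressed store with a side index of compressed offsets. */
final class IndexedBlockStore {
    private final int blockSize;             // uncompressed bytes per block (last block may be shorter)
    private final byte[] compressed;         // all compressed blocks, concatenated
    private final long[] compressedOffsets;  // side index: start of block i; last entry = total length
    private final long uncompressedLength;

    private IndexedBlockStore(int blockSize, byte[] compressed,
                              long[] compressedOffsets, long uncompressedLength) {
        this.blockSize = blockSize;
        this.compressed = compressed;
        this.compressedOffsets = compressedOffsets;
        this.uncompressedLength = uncompressedLength;
    }

    /** Compresses each block independently and records where it starts. */
    static IndexedBlockStore write(byte[] data, int blockSize) {
        if (blockSize <= 0) throw new IllegalArgumentException("blockSize must be positive");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int blocks = (data.length + blockSize - 1) / blockSize;
        long[] offsets = new long[blocks + 1];
        byte[] buf = new byte[4096];
        for (int i = 0; i < blocks; i++) {
            offsets[i] = out.size();
            int start = i * blockSize;
            int len = Math.min(blockSize, data.length - start);
            Deflater deflater = new Deflater();
            deflater.setInput(data, start, len);
            deflater.finish();
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            deflater.end();
        }
        offsets[blocks] = out.size();
        return new IndexedBlockStore(blockSize, out.toByteArray(), offsets, data.length);
    }

    /** Reads `length` bytes starting at an absolute position in the uncompressed data. */
    byte[] read(long position, int length) {
        if (position < 0 || length < 0 || position + length > uncompressedLength) {
            throw new IndexOutOfBoundsException("range outside uncompressed data");
        }
        byte[] result = new byte[length];
        int copied = 0;
        while (copied < length) {
            long pos = position + copied;
            int block = (int) (pos / blockSize);   // which block holds this position
            int within = (int) (pos % blockSize);  // offset inside that block
            byte[] plain = inflateBlock(block);
            int n = Math.min(length - copied, plain.length - within);
            System.arraycopy(plain, within, result, copied, n);
            copied += n;
        }
        return result;
    }

    /** Decompresses only the one block the index points at. */
    private byte[] inflateBlock(int block) {
        int start = (int) compressedOffsets[block];
        int end = (int) compressedOffsets[block + 1];
        long remaining = uncompressedLength - (long) block * blockSize;
        byte[] plain = new byte[(int) Math.min(blockSize, remaining)];
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed, start, end - start);
            int filled = 0;
            while (filled < plain.length) {
                int n = inflater.inflate(plain, filled, plain.length - filled);
                if (n == 0 && (inflater.finished() || inflater.needsInput())) break;
                filled += n;
            }
            if (filled != plain.length) throw new IllegalStateException("truncated block " + block);
            return plain;
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block " + block, e);
        } finally {
            inflater.end();
        }
    }
}
```

With this layout, store.read(100000000L, n) would decompress only the blocks covering that range, at the cost of a slightly worse compression ratio (each block is compressed independently) and one long of index per block. Whether any standard Hadoop format offers the same thing out of the box is exactly the open question in this thread.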

From: Rahul Bhattacharjee [mailto:[EMAIL PROTECTED]]
Sent: Friday, May 24, 2013 1:00 AM
To: [EMAIL PROTECTED]
Subject: Re: splittable vs seekable compressed formats

Yeah, I think John meant seeking to record boundaries.
Thanks,
Rahul

On Fri, May 24, 2013 at 12:22 PM, Harsh J <[EMAIL PROTECTED]> wrote:
SequenceFiles should be seekable, I think, provided you know/manage their
sync points during writes. With LZO this may be non-trivial.

On Thu, May 23, 2013 at 11:01 PM, John Lilley <[EMAIL PROTECTED]> wrote:
> I’ve read about splittable compressed formats in Hadoop.  Are any of these
> formats also “seekable” (in other words, able to seek to an absolute
> location in the uncompressed data)?
>
> John
>
>
--
Harsh J
