Uncompressed size of Sequence files
Is there an easy way to get the uncompressed size of a sequence file that
is block compressed?  I am using the Snappy compressor.

I realize I can obviously just decompress them to temporary files to get
the size, but I would assume there is an easier way.  Perhaps an existing
tool that my search did not turn up?

If not, I will have to run an MR job that loads each compressed block and
reads the Snappy header to get the size.  I need to do this for a large
number of files, so I'd prefer a simple CLI tool (something like
'hadoop fs -du').
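
For reference, a rough sketch of the "read the Snappy header" step. It relies on the raw Snappy format, where a compressed buffer begins with its uncompressed length stored as a little-endian varint. The function name is made up for illustration, and it assumes the caller has already located the start of a raw Snappy buffer; the sequence file container and the Hadoop codec add their own framing around each buffer, which is not handled here.

```python
def snappy_uncompressed_length(buf: bytes) -> int:
    """Decode the varint preamble of a raw Snappy buffer.

    Hypothetical helper: `buf` must start at the first byte of a raw
    Snappy buffer, with any container or codec framing already skipped.
    """
    result = 0
    shift = 0
    for i, byte in enumerate(buf):
        if i >= 5:
            # A 32-bit length never needs more than 5 varint bytes.
            raise ValueError("varint preamble longer than 5 bytes")
        # The low 7 bits carry data, least significant group first.
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            # High bit clear marks the last byte of the varint.
            return result
        shift += 7
    raise ValueError("buffer ended inside the varint preamble")
```

Only the first few bytes of each buffer are needed, so nothing has to be decompressed; summing the result over every buffer in a file would give the uncompressed payload size.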

- Robert