Uncompressed size of Sequence files
Robert Dyer 2013-11-23, 21:14
Is there an easy way to get the uncompressed size of a sequence file that
is block compressed? I am using the Snappy compressor.
I realize I can obviously just decompress them to temporary files to get
the size, but I would assume there is an easier way. Perhaps an existing
tool that my search did not turn up?
If not, I will have to run an MR job to load each compressed block and read the
Snappy header to get the size. I need to do this for a large number of
files so I'd prefer a simple CLI tool (sort of like 'hadoop fs -du').
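
For reference, a minimal sketch (in Python, purely illustrative) of the header read I have in mind. It relies on the raw Snappy format starting with the uncompressed length stored as a little-endian varint. The function name is hypothetical, and it assumes the start of the raw Snappy data has already been located; any SequenceFile or codec framing that Hadoop puts around each block is not handled here.

```python
def snappy_uncompressed_length(buf: bytes) -> int:
    """Decode the varint preamble of a raw Snappy buffer.

    Hypothetical helper: assumes buf starts at the first byte of raw
    Snappy data, with no container or codec framing in front of it.
    """
    result = 0
    shift = 0
    for i, byte in enumerate(buf):
        if i >= 5:
            # A 32-bit length never needs more than five varint bytes.
            raise ValueError("varint preamble too long")
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if byte & 0x80 == 0:              # high bit clear marks the last byte
            return result
        shift += 7
    raise ValueError("truncated varint preamble")


if __name__ == "__main__":
    # 0xFE 0xFF 0x7F is the varint encoding of 2097150.
    print(snappy_uncompressed_length(bytes([0xFE, 0xFF, 0x7F])))
```

Summing this value over every compressed buffer would give the total without decompressing anything, but finding each buffer's offset inside the file is the part that still needs a real reader.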