There is a facility in Hadoop to compress intermediate map output and job
output. Is your question about reading compressed files themselves into
Hadoop? If so, refer to SequenceFileInputFormat:
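As a sketch of the compression facility mentioned above, the properties below enable map-output and job-output compression. These names are from the 0.20-era mapred API, so double-check them against the version you evaluate:

```xml
<!-- Compress intermediate map output (shuffled between map and reduce). -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<!-- Compress final job output, block-compressed for better ratios. -->
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>
```

These can go in mapred-site.xml or be set per-job.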
the *SequenceFileInputFormat* reads special binary files that are specific
to Hadoop. These files include many features designed to allow data to be
rapidly read into Hadoop mappers. Sequence files are block-compressed and
provide direct serialization and deserialization of several arbitrary data
types (not just text). Sequence files can be generated as the output of
other MapReduce tasks and are an efficient intermediate representation for
data that is passing from one MapReduce job to another.
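A minimal job-driver sketch of the above, using the old (0.20-era) mapred API; MyDriver is a hypothetical class name, and this assumes the Hadoop jars are on your classpath:

```java
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;

public class MyDriver {
    public static JobConf configure() {
        JobConf conf = new JobConf(MyDriver.class);

        // Read sequence files produced by an upstream job...
        conf.setInputFormat(SequenceFileInputFormat.class);
        // ...and write sequence files for a downstream job.
        conf.setOutputFormat(SequenceFileOutputFormat.class);

        // BLOCK compression groups many records before compressing,
        // which usually compresses better than per-record compression.
        SequenceFileOutputFormat.setOutputCompressionType(
                conf, SequenceFile.CompressionType.BLOCK);

        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);
        return conf;
    }
}
```

Because sequence files carry typed Writable keys and values, the next job in the chain can read them back without any text parsing.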
On Sat, Apr 3, 2010 at 11:15 PM, u235sentinel <[EMAIL PROTECTED]> wrote:
> I'm starting to evaluate Hadoop. We are currently running Sensage and
> store a lot of log files in our current environment. I've been looking at
> the Hadoop forums and googling (of course) but haven't learned whether
> Hadoop's HDFS applies any compression to the files we store.
> On average we're storing about 600 gigs a week in log files (more or
> less). Generally we need to store about 1 1/2 - 2 years of logs. With
> Sensage compression we can store about 200+ Tb of logs in our current
> environment.
> As I said, we're starting to evaluate if Hadoop would be a good replacement
> to our Sensage environment (or at least augment it).
> Thanks a bunch!!