After some digging into the code it looks like this bug also affects bulk
load when using LoadIncrementalHFiles (bulk loading programmatically).
We  fixed the code in Compression.class (in Algorithm):

GZ("gz") {
            private transient GzipCodec codec;

            @Override
            DefaultCodec getCodec(Configuration conf) {
                if (codec == null) {
                    synchronized (this) {
                        if (codec == null) {
                            codec = new ReusableStreamGzipCodec(new
Configuration(conf));
                        }
                    }
                }
                return codec;
            }
        }

That way there is always configuration.

In addition, since we pre-create regions before bulk loading, we wanted the
MR job to relate only to these regions so by inheriting HFileOutputFormat
you can set only the split points that are relevant to this job and save a
lot of reduce time (especially if you have hundreds or thousands of
regions).
This works for us since each bulk load we do is relevant for a specific
timestamp. Hope it helps anyone...

Thanks.

On Wed, Nov 7, 2012 at 9:44 AM, Amit Sela <[EMAIL PROTECTED]> wrote:
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB