HBase >> mail # user >> Bulk load fails with NullPointerException


Re: Bulk load fails with NullPointerException
After some digging into the code, it looks like this bug also affects bulk
load when using LoadIncrementalHFiles (bulk loading programmatically).
We fixed the code in the Compression class (in the Algorithm enum):

GZ("gz") {
    private transient GzipCodec codec;

    @Override
    DefaultCodec getCodec(Configuration conf) {
        if (codec == null) {
            synchronized (this) {
                if (codec == null) {
                    // Wrap conf in a new Configuration so the codec is
                    // always constructed with a configuration set
                    codec = new ReusableStreamGzipCodec(new Configuration(conf));
                }
            }
        }
        return codec;
    }
}

That way the codec is always constructed with a configuration set.

In addition, since we pre-create regions before bulk loading, we wanted the
MR job to relate only to those regions. By extending HFileOutputFormat you
can set only the split points that are relevant to the job and save a lot
of reduce time (especially if you have hundreds or thousands of regions).
This works for us because each bulk load we do targets a specific
timestamp. Hope it helps anyone...
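The selection step above can be sketched in plain Java (no HBase types; the
class and method names here are illustrative, not from our actual job). The
idea is simply to filter the table's region start keys down to the ones that
fall inside the key range a given load writes to; in the real job the
surviving keys would then be configured as the reducer's split points
instead of every region boundary:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitPointFilter {

    /**
     * Keep only the region start keys that fall strictly inside the key
     * range this particular bulk load writes to. Fewer split points means
     * fewer reduce partitions and less reduce time.
     */
    static List<String> relevantSplits(List<String> allStartKeys,
                                       String rangeStart, String rangeEnd) {
        List<String> relevant = new ArrayList<>();
        for (String key : allStartKeys) {
            if (key.compareTo(rangeStart) > 0 && key.compareTo(rangeEnd) < 0) {
                relevant.add(key);
            }
        }
        return relevant;
    }

    public static void main(String[] args) {
        // All region start keys in the (pre-created) table; this load only
        // writes rows prefixed with the 2012-11-06 timestamp.
        List<String> all = List.of("20121101", "20121102",
                "20121106a", "20121106b", "20121107");
        System.out.println(relevantSplits(all, "20121106", "20121107"));
        // prints [20121106a, 20121106b]
    }
}
```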

Thanks.

On Wed, Nov 7, 2012 at 9:44 AM, Amit Sela <[EMAIL PROTECTED]> wrote:

> Does this bug affect Snappy as well? Maybe I'll just use it instead of GZ
> (it's also recommended in the book).
>
>
> On Tue, Nov 6, 2012 at 10:27 PM, Jean-Marc Spaggiari <
> [EMAIL PROTECTED]> wrote:
>
>> I'm not talking about the major compaction, but about the CF compression.
>>
>> What's your table definition? Do you have the compression (GZ) defined
>> there?
>>
>> It seems there is some failure with this based on the stack trace.
>>
>> So if you disable it while you are doing your load, you should not
>> face this again. Then you can alter your CF to re-activate it?
>>
>> 2012/11/6, Amit Sela <[EMAIL PROTECTED]>:
>> > Do you mean setting hbase.hregion.majorcompaction to 0?
>> > Because it's already set this way. We pre-create new regions before
>> writing
>> > to HBase and initiate a major compaction once a day.
>> >
>> > On Tue, Nov 6, 2012 at 8:51 PM, Jean-Marc Spaggiari
>> > <[EMAIL PROTECTED]> wrote:
>> >
>> >> Maybe one option will be to disable the compaction, load the data,
>> >> re-activate the compaction, major-compact the data?
>> >>
>> >> 2012/11/6, Amit Sela <[EMAIL PROTECTED]>:
>> >> > Seems like that's the one alright... Any ideas how to avoid it?
>> >> > Maybe a patch?
>> >> >
>> >> > On Tue, Nov 6, 2012 at 8:05 PM, Jean-Daniel Cryans
>> >> > <[EMAIL PROTECTED]> wrote:
>> >> >
>> >> >> This sounds a lot like
>> >> >> https://issues.apache.org/jira/browse/HBASE-5458
>> >> >>
>> >> >> On Tue, Nov 6, 2012 at 2:28 AM, Amit Sela <[EMAIL PROTECTED]> wrote:
>> >> >> > Hi all,
>> >> >> >
>> >> >> > I'm trying to bulk load using LoadIncrementalHFiles and I get a
>> >> >> > NullPointerException at
>> >> >> > org.apache.hadoop.io.compress.zlib.ZlibFactory.isNativeZlibLoaded(ZlibFactory.java:63).
>> >> >> >
>> >> >> > It looks like DefaultCodec has no set configuration...
>> >> >> >
>> >> >> > Anyone encounter this before ?
>> >> >> >
>> >> >> > Thanks.
>> >> >> >
>> >> >> > Full exception thrown:
>> >> >> >
>> >> >> > java.util.concurrent.ExecutionException: java.lang.NullPointerException
>> >> >> >   at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>> >> >> >   at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>> >> >> >   at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplitPhase(LoadIncrementalHFiles.java:333)
>> >> >> >   at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:232)
>> >> >> >   at com.infolinks.hadoop.jobrunner.UrlsHadoopJobExecutor.executeURLJob(UrlsHadoopJobExecutor.java:204)
>> >> >> >   at com.infolinks.hadoop.jobrunner.UrlsHadoopJobExecutor.runJobIgnoreSystemJournal(UrlsHadoopJobExecutor.java:86)