HBase user mailing list: Bulk load moving HFiles to the wrong region


Re: Bulk load moving HFiles to the wrong region
Indeed there are more than 2 split points: there are 4 split points for the 5
new regions added each day.
The new data bulk loaded each day belongs to the new regions.
It seems like the partitions being read are from the previous insertion, and if
that is the case, the comparator will surely indicate that the data loaded
belongs in the previous (pre-split) last region. Where does the
RegionServer save the partitions file written to DistributedCache?
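To make the stale-partitions hypothesis concrete, here is a small self-contained Java model of a total-order partitioner. It is not Hadoop's or HBase's code, only a sketch of the same idea that HFileOutputFormat.configureIncrementalLoad() is generally understood to rely on: a reducer index is found by binary search over sorted split points. The row-key layout (`yyyyMMdd-N`) and the split points are hypothetical. If the job reads yesterday's split points, every row of today's load sorts after all of them and lands in the last partition. Requires Java 9+ for `Arrays.compareUnsigned`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

public class StalePartitionsDemo {

    // Unsigned lexicographic byte order, the order HBase uses for row keys.
    static final Comparator<byte[]> LEX = Arrays::compareUnsigned;

    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Reducer index for a row key = number of split points <= key.
    // A key equal to a split point goes to the partition that starts at it.
    static int partition(byte[][] sortedSplitPoints, byte[] key) {
        int pos = Arrays.binarySearch(sortedSplitPoints, key, LEX);
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        // Hypothetical split points: 4 split points -> 5 partitions per day.
        byte[][] yesterday = { b("20131216-2"), b("20131216-4"), b("20131216-6"), b("20131216-8") };
        byte[][] today = { b("20131217-2"), b("20131217-4"), b("20131217-6"), b("20131217-8") };

        // Hypothetical rows of today's load.
        String[] rows = { "20131217-1", "20131217-3", "20131217-5", "20131217-7", "20131217-9" };
        for (String row : rows) {
            // Stale split points put every row in partition 4 (the last one);
            // current split points spread the rows over partitions 0..4.
            System.out.printf("%s -> stale partition %d, current partition %d%n",
                    row, partition(yesterday, b(row)), partition(today, b(row)));
        }
    }
}
```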
On Tue, Dec 17, 2013 at 1:18 PM, Bijieshan <[EMAIL PROTECTED]> wrote:

> >> >>>> The previous last region is not supposed to be deleted; I'm just
> >> >>>> adding new regions (always following lexicographically), so that
> >> >>>> the last region before the pre-split is no longer the last.
>
> Do you mean you added the new regions into META? Sorry if I misunderstood
> you here. But can you tell me how you ran the split for each new day? It
> seems there may be more than 2 split points.
>
> Thanks.
> Jieshan
>
> -----Original Message-----
> From: Amit Sela [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, December 17, 2013 6:10 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Bulk load moving HFiles to the wrong region
>
> The logs of the RegionServers that were supposed to get the loaded data
> show that they receive a request to open the (correct) region, and they
> open it.
> But only the RegionServer where the data actually ends up logs the move,
> for all files.
> The log actually shows it copies the files to the wrong directory.
> Could it have something to do with the fact that the RegionServer that
> hosts the "wrong" region also hosts some of the regions being loaded?
>
>
> On Tue, Dec 17, 2013 at 11:39 AM, Amit Sela <[EMAIL PROTECTED]> wrote:
>
> > As I mentioned before, running with all reducers works fine. Running
> > with the extension of HFileOutputFormat fails sometimes, on some tables.
> > The .META. encoded qualifier points to different directories for the
> > different regions the files are supposedly loaded into. The directories
> > do exist, and they contain all the relevant family directories,
> > but those directories are EMPTY.
> > Instead, the files that should have been spread across those directories
> > are moved to the corresponding family directories under the directory
> > pointed to by the .META. encoded qualifier of the last region before the
> > split (which is where they would fit if no pre-splitting had occurred).
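The symptom above matches what a region lookup against an outdated region list would produce. LoadIncrementalHFiles roughly assigns each HFile to the region whose key range contains the file's first row (the regionGroups mentioned in this thread). The following is a simplified model, not LoadIncrementalHFiles' actual code; the region names (`region-N`) and keys are hypothetical. Looking up today's rows in a list that predates the new regions maps every file to the previous last region. Requires Java 9+.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

public class RegionLookupDemo {

    // Unsigned lexicographic byte order, the order HBase uses for row keys.
    static final Comparator<byte[]> LEX = Arrays::compareUnsigned;

    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Builds a sorted start-key -> hypothetical region-name map.
    static TreeMap<byte[], String> regionMap(String[] startKeys) {
        TreeMap<byte[], String> m = new TreeMap<>(LEX);
        for (int i = 0; i < startKeys.length; i++) {
            m.put(b(startKeys[i]), "region-" + i);
        }
        return m;
    }

    // The region containing a row is the one with the greatest start key <= row.
    // The first region has an empty start key, so floorEntry never returns null.
    static String regionFor(TreeMap<byte[], String> regions, byte[] row) {
        Map.Entry<byte[], String> e = regions.floorEntry(row);
        return e.getValue();
    }

    public static void main(String[] args) {
        // Region start keys before today's pre-split (hypothetical).
        String[] stale = { "", "20131216-2", "20131216-4" };
        // Same table after new regions were added after the old last one.
        String[] current = { "", "20131216-2", "20131216-4", "20131217", "20131217-5" };

        TreeMap<byte[], String> staleMap = regionMap(stale);
        TreeMap<byte[], String> currentMap = regionMap(current);

        // First rows of three hypothetical HFiles from today's job.
        for (String row : new String[] { "20131217-1", "20131217-3", "20131217-7" }) {
            System.out.printf("%s -> stale view %s, current view %s%n",
                    row, regionFor(staleMap, b(row)), regionFor(currentMap, b(row)));
        }
    }
}
```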
> >
> >
> > On Tue, Dec 17, 2013 at 4:48 AM, Bijieshan <[EMAIL PROTECTED]> wrote:
> >
> >> >>>> In the first step, the files are read correctly and regionGroups
> >> >>>> is created as it should be.
> >> Did you check the number of reducers? Was it equal to 2000 (before your
> >> extended HFileOutputFormat)?
> >>
> >> >>> The logs of the RegionServer that the files are moved to
> >> >>> indeed show that all files are moved to that region (when the
> >> >>> problem doesn't happen, they show only 1 file per family moved to a
> >> >>> RegionServer)
> >>
> >> How about the region-split related logs?
> >>
> >> > Loaded regions are listed in the .META. table, and the ENCODED field in
> >> > the table points to an existing directory. But all family
> >> > directories in this region are empty...
> >>
> >> Was the old region still in .META.?
> >>
> >> > I implemented an extension of HFileOutputFormat: because each bulk
> >> > load imports data into the newly created regions only, I pass the
> >> > prefix (yyyyMMdd) to MyHFileOutputFormat.configureIncrementalLoad() so
> >> > that getRegionStartKeys returns only the corresponding keys.
> >> > I did this to avoid having 2000 reducers when my target is 15
> >> > regions...
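The prefix filtering described above can be sketched in plain Java. `startKeysWithPrefix` and `splitPoints` are hypothetical helpers, not HFileOutputFormat methods, and the key layout is assumed. The sketch keeps only the region start keys that begin with the day prefix, uses one reducer per kept region, and derives the partitioner's split points from the kept keys minus the first (rows below the second kept key go to reducer 0). Requires Java 9+ for the range overload of `Arrays.equals`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PrefixStartKeysDemo {

    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Keeps the region start keys that begin with the prefix (e.g. "yyyyMMdd").
    static List<byte[]> startKeysWithPrefix(List<byte[]> allStartKeys, byte[] prefix) {
        List<byte[]> kept = new ArrayList<>();
        for (byte[] key : allStartKeys) {
            if (key.length >= prefix.length
                    && Arrays.equals(key, 0, prefix.length, prefix, 0, prefix.length)) {
                kept.add(key);
            }
        }
        return kept;
    }

    // Split points for N kept (sorted) start keys: the last N-1 keys.
    static List<byte[]> splitPoints(List<byte[]> keptStartKeys) {
        if (keptStartKeys.isEmpty()) {
            throw new IllegalArgumentException("no regions match the prefix");
        }
        return new ArrayList<>(keptStartKeys.subList(1, keptStartKeys.size()));
    }

    public static void main(String[] args) {
        // Hypothetical table: empty first start key plus 5 regions per day, in sorted order.
        List<byte[]> all = new ArrayList<>();
        all.add(new byte[0]);
        for (String day : new String[] { "20131216", "20131217" }) {
            for (String suffix : new String[] { "", "-2", "-4", "-6", "-8" }) {
                all.add(b(day + suffix));
            }
        }
        List<byte[]> kept = startKeysWithPrefix(all, b("20131217"));
        System.out.println("reducers: " + kept.size());                  // prints "reducers: 5"
        System.out.println("split points: " + splitPoints(kept).size()); // prints "split points: 4"
    }
}
```

With 5 regions for the day, this gives 5 reducers and 4 split points instead of one reducer per region in the whole table.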
> >>
> >> We always do it this way :). Only configure the necessary regions.
> >>
> >> Sorry for the late reply.
> >>
> >> Jieshan
> >> -----Original Message-----
> >> From: Amit Sela [mailto:[EMAIL PROTECTED]]
> >> Sent: Tuesday, December 17, 2013 12:19 AM
> >> To: [EMAIL PROTECTED]
> >> Subject: Re: Bulk load moving HFiles to the wrong region