Re: reducer throttling?
Dexin Wang 2011-03-17, 21:00
Can you describe your bulk insert technique in a bit more detail? And do you
also control the number of reducers by adding an artificial ORDER or GROUP step?
On Thu, Mar 17, 2011 at 1:33 PM, Alex Rovner <[EMAIL PROTECTED]> wrote:
> We use a bulk insert after the job completes. You can control the size of
> each bulk insert by controlling the number of reducers.
> On Mar 17, 2011, at 2:03 PM, Dexin Wang <[EMAIL PROTECTED]> wrote:
> > We do some processing in Hadoop, then as the last step we write the
> > results to a database. The database is not good at handling hundreds of
> > concurrent connections and fast writes, so we need to throttle down the
> > number of tasks that write to the DB. Since we have no control over the
> > number of mappers, we add an artificial reducer step to achieve that,
> > either by doing a GROUP or an ORDER, like this:
> > sorted_data = ORDER data BY f1 PARALLEL 10;
> > -- then write sorted_data to DB
> > or
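> > grouped_data = GROUP data BY f1 PARALLEL 10;
> > -- FLATTEN turns each group's bag back into the original rows
> > data_to_write = FOREACH grouped_data GENERATE FLATTEN($1);
> > -- then write data_to_write to DB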
> > I feel neither is a good approach. They just add unnecessary computation,
> > especially the first one. And GROUP may result in bags that are too large.
> > Any better suggestions?
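The thread does not spell out how Alex's post-job bulk insert works, so what follows is only a minimal sketch under stated assumptions. It assumes the final relation is stored with PigStorage's default tab delimiter, so each reducer writes one `part-*` file, and it loads each file as a single batch in one transaction. With this approach, the number of reducers (for example, `PARALLEL 10`) determines how many batches reach the database. A single connection loads the files one after another, so the database never sees more than one writer. `sqlite3` stands in for the real database, and the table name and the two columns `f1`/`f2` are hypothetical.

```python
import csv
import sqlite3
from pathlib import Path


def bulk_load_part_files(conn: sqlite3.Connection, output_dir: Path, table: str) -> int:
    """Load each reducer output file (part-*) into `table`, one batch per file.

    Assumes tab-delimited PigStorage output with exactly two fields per line
    (hypothetical columns f1, f2). `table` must be a trusted name because it is
    formatted into the SQL string. Returns the total number of rows inserted.
    """
    total = 0
    # part-* matches part-00000 and part-r-00000 but skips _SUCCESS and logs.
    for part in sorted(output_dir.glob("part-*")):
        with part.open(newline="") as fh:
            rows = [tuple(r) for r in csv.reader(fh, delimiter="\t") if r]
        # One transaction and one executemany per part file: the reducer count
        # sets how many batches the database receives.
        with conn:
            conn.executemany(f"INSERT INTO {table} (f1, f2) VALUES (?, ?)", rows)
        total += len(rows)
    return total
```

For a real database, the `executemany` call would be swapped for that database's own bulk-load path, such as a vendor load utility. Fewer reducers mean fewer, larger batches.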