Re: Occasional regionserver crashes following socket errors writing to HDFS
Michael, I appreciate the feedback, but I'd have to disagree.
In my case, for example, I need to look at the complete set of data produced
by the map phase in order to make a decision and write it to HBase. Sure, I
could write all the mappers' output to HBase and then have another map-only
job scan the output of the previous one, do the calculation, and write the
result to another table, but I don't really see why that would be better
than using a reducer.
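
For illustration, here is a minimal sketch of the reducer-side approach
described above, using the TableReducer / TableMapReduceUtil API of that era;
the table name, column family, and the summing "decision" are hypothetical
placeholders, not anything from the thread:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

// Reducer that sees the complete set of map output for a key, makes its
// decision over all of it, and writes a single Put to HBase.
public class DecisionReducer
    extends TableReducer<Text, LongWritable, ImmutableBytesWritable> {

  @Override
  protected void reduce(Text key, Iterable<LongWritable> values, Context context)
      throws IOException, InterruptedException {
    long total = 0;
    for (LongWritable v : values) {   // the complete set of data for this key
      total += v.get();
    }
    Put put = new Put(Bytes.toBytes(key.toString()));
    // "d:total" is a made-up column; replace with the real decision output.
    put.add(Bytes.toBytes("d"), Bytes.toBytes("total"), Bytes.toBytes(total));
    context.write(null, put);         // TableOutputFormat ignores the key
  }
}
```

The driver would hook this up with something like
TableMapReduceUtil.initTableReducerJob("results_table", DecisionReducer.class,
job), where "results_table" is again a placeholder.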

As for the other tips: I agree the files are too large, so I increased the
file size, but I don't really see why that is relevant to the error we're
talking about. Why would having many regions cause timeouts on HDFS?
I do have MSLAB configured and GC tuning in place.
I do run multiple reducers; I suspect that's aggravating the problem, not
helping it.
As far as I can tell, dfs.balance.bandwidthPerSec is relevant only to
balancing done with the balancer, not to the initial replication.
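
For reference, a sketch of the settings mentioned above, assuming 0.92-era
property names; the values are only examples, and in practice these belong in
hbase-site.xml / hdfs-site.xml rather than being set in code:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class TuningSketch {
  public static Configuration sketch() {
    Configuration conf = HBaseConfiguration.create();

    // Larger maximum store-file size per region => fewer regions per table.
    // (Normally set in hbase-site.xml; 4 GB is just an example value.)
    conf.setLong("hbase.hregion.max.filesize", 4L * 1024 * 1024 * 1024);

    // MSLAB, to reduce memstore heap fragmentation and long GC pauses.
    conf.setBoolean("hbase.hregion.memstore.mslab.enabled", true);

    // HDFS balancer throttle (hdfs-site.xml). As noted above, this only
    // affects the balancer, not the write-time replication pipeline.
    conf.setLong("dfs.balance.bandwidthPerSec", 10L * 1024 * 1024);

    return conf;
  }
}
```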
-eran

On Thu, May 10, 2012 at 9:59 PM, Michael Segel <[EMAIL PROTECTED]> wrote:

> Sigh.
>
> Dave,
> I really think you need to think more about the problem.
>
> Think about what a reduce does, and then think about what happens inside
> of HBase.
>
> Then think about which runs faster... a job with two mappers writing the
> intermediate and final results to HBase, or an M/R job that writes its
> output to HBase.
>
> If you really, truly think about the problem, you will start to
> understand why I say you really don't want to use a reducer when you're
> working with HBase.
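
For comparison, a minimal sketch (not from the thread) of the reducer-less
pattern being argued for here: a map-only job whose mappers write Puts
straight to HBase through TableOutputFormat. The table name, column family,
and tab-separated input layout are assumptions:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

// Map-only job: every mapper writes Puts straight to HBase; no shuffle, no reduce.
public class MapOnlyHBaseWrite {

  static class WriteMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] fields = line.toString().split("\t");  // hypothetical TSV input
      Put put = new Put(Bytes.toBytes(fields[0]));
      put.add(Bytes.toBytes("d"), Bytes.toBytes("v"), Bytes.toBytes(fields[1]));
      context.write(new ImmutableBytesWritable(put.getRow()), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "map-only-hbase-write");
    job.setJarByClass(MapOnlyHBaseWrite.class);
    job.setMapperClass(WriteMapper.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    // Passing a null reducer class just wires up TableOutputFormat
    // for the (placeholder) target table.
    TableMapReduceUtil.initTableReducerJob("intermediate_table", null, job);
    job.setNumReduceTasks(0);                         // map-only
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```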
>
>
> On May 10, 2012, at 1:41 PM, Dave Revell wrote:
>
> > Some examples of when you'd want a reducer:
> > http://static.usenix.org/event/osdi04/tech/full_papers/dean/dean.pdf
> >
> > On Thu, May 10, 2012 at 11:30 AM, Michael Segel
> > <[EMAIL PROTECTED]> wrote:
> >
> >> Dave, do you really want to go there?
> >>
> >> The OP has a couple of issues, and he was going down a rabbit hole.
> >> (You can choose whether that's a reference to 'The Matrix', Jefferson
> >> Starship, 'Alice in Wonderland'... or all of the above.)
> >>
> >> So, to put him on the correct path, I recommended the following, in
> >> no particular order...
> >>
> >> 1) Increase his region size for this table only (see the sketch
> >> after this list).
> >> 2) Look at decreasing the number of regions managed by an RS (which
> >> is why you increase the region size).
> >> 3) Up the dfs.balance.bandwidthPerSec. (How often does HBase move
> >> regions, and how exactly does it move them?)
> >> 4) Look at implementing MSLAB and GC tuning. This cuts down on the
> >> overhead.
> >> 5) Refactoring his job....
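
A minimal sketch of item 1 above (raising the maximum file size for one table
only), assuming the HBaseAdmin / HTableDescriptor client API of that era; the
table name and the 8 GB figure are placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class RaiseRegionSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    byte[] table = Bytes.toBytes("my_table");       // placeholder table name

    HTableDescriptor desc = admin.getTableDescriptor(table);
    desc.setMaxFileSize(8L * 1024 * 1024 * 1024);   // split at ~8 GB for this table only

    admin.disableTable(table);                      // schema changes needed a disabled table then
    admin.modifyTable(table, desc);
    admin.enableTable(table);
    admin.close();
  }
}
```

The cluster-wide hbase.hregion.max.filesize default is untouched; only this
table's descriptor changes.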
> >>
> >> Oops.
> >> Ok I didn't put that in the list.
> >> But that was the last thing I wrote as a separate statement.
> >> Clearly you didn't take my advice and think about the problem....
> >>
> >> To prove a point.... you wrote:
> >> 'Many mapreduce algorithms require a reduce phase (e.g. sorting)'
> >>
> >> Ok. So tell me why you would want to sort your input into HBase, and
> >> if that's really a good thing?
> >> Oops!... :-)
> >>
> >>
> >> On May 10, 2012, at 12:31 PM, Dave Revell wrote:
> >>> This "you don't need a reducer" conversation is distracting from the
> >>> real problem and is false.
> >>>
> >>> Many mapreduce algorithms require a reduce phase (e.g. sorting). The
> >>> fact that the output is written to HBase or somewhere else is
> >>> irrelevant.
> >>>
> >>> -Dave
> >>>
> >>> On Thu, May 10, 2012 at 6:26 AM, Michael Segel
> >>> <[EMAIL PROTECTED]> wrote:
> >>> [SNIP]
> >>
> >>
>
>