Re: Occasional regionserver crashes following socket errors writing to HDFS
Sigh.

Dave,
I really think you need to think more about the problem.

Think about what a reducer does and then think about what happens inside HBase.

Then think about which runs faster... a job with two mappers writing the intermediate and final results to HBase,
or an M/R job that writes its output to HBase.

If you really, truly think about the problem, you will start to understand why I say you really don't want to use a reducer when you're working with HBase.
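
One way to picture the point about what happens inside HBase: HBase stores rows ordered by row key, so a sort done in a reduce phase before the write does not change what ends up in the table. The sketch below is only a toy model of that idea, not the HBase API. `SortedStore`, `map_only_job` and `map_reduce_job` are hypothetical names invented for illustration, and the example assumes every row key is written once. It says nothing about run time, only that the stored result is the same.

```python
import bisect


class SortedStore:
    """Toy stand-in for a table that keeps rows ordered by key (NOT the HBase API)."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> value

    def put(self, key, value):
        # New keys are inserted at their sorted position; the store does the ordering.
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def scan(self):
        # A scan always returns rows in key order, whatever the write order was.
        return [(k, self._rows[k]) for k in self._keys]


def map_only_job(splits, store):
    """Hypothetical map-only job: each mapper writes its records straight to the store."""
    for split in splits:
        for key, value in split:
            store.put(key, value)


def map_reduce_job(splits, store):
    """Hypothetical M/R job: gather and sort all records first (the shuffle), then write."""
    shuffled = sorted(kv for split in splits for kv in split)
    for key, value in shuffled:
        store.put(key, value)


if __name__ == "__main__":
    splits = [[("row3", "c"), ("row1", "a")], [("row2", "b")]]
    direct, reduced = SortedStore(), SortedStore()
    map_only_job(splits, direct)
    map_reduce_job(splits, reduced)
    print(direct.scan() == reduced.scan())
```

In this toy model both jobs leave the store with identical contents; the second one simply sorts the data once more than it needs to.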
On May 10, 2012, at 1:41 PM, Dave Revell wrote:

> Some examples of when you'd want a reducer:
> http://static.usenix.org/event/osdi04/tech/full_papers/dean/dean.pdf
>
> On Thu, May 10, 2012 at 11:30 AM, Michael Segel
> <[EMAIL PROTECTED]> wrote:
>
>> Dave, do you really want to go there?
>>
>> OP has a couple of issues and he was going down a rabbit hole.
>> (You can choose if that's a reference to The Matrix, Jefferson Starship,
>> Alice in Wonderland... or all of the above)
>>
>> So to put him on the correct path, I recommended the following, not in any
>> order...
>>
>> 1) Increase his region size for this table only.
>> 2) Look at decreasing the number of regions managed by a RS (which is why
>> you increase region size)
>> 3) Up the dfs.balance.bandwidthPerSec. (How often does HBase move regions
>> and how exactly does it move them?)
>> 4) Look at implementing MSLAB and GC tuning. This cuts down on the
>> overhead.
>> 5) Refactoring his job....
>>
>> Oops.
>> Ok I didn't put that in the list.
>> But that was the last thing I wrote as a separate statement.
>> Clearly you didn't take my advice and think about the problem....
>>
>> To prove a point.... you wrote:
>> 'Many mapreduce algorithms require a reduce phase (e.g. sorting)'
>>
>> Ok. So tell me why you would want to sort your input into HBase and if
>> that's really a good thing?
>> Oops!... :-)
>>
>> On May 10, 2012, at 12:31 PM, Dave Revell wrote:
>>> This "you don't need a reducer" conversation is distracting from the real
>>> problem and is false.
>>>
>>> Many mapreduce algorithms require a reduce phase (e.g. sorting). The fact
>>> that the output is written to HBase or somewhere else is irrelevant.
>>>
>>> -Dave
>>>
>>> On Thu, May 10, 2012 at 6:26 AM, Michael Segel <[EMAIL PROTECTED]> wrote:
>>> [SNIP]
>>
>>