HBase >> mail # user >> Occasional regionserver crashes following socket errors writing to HDFS


Re: Occasional regionserver crashes following socket errors writing to HDFS
Stack,

That section was written by Doug after he and I had the same debate many moons ago.
While I can't say with absolute certainty that you shouldn't use a reducer, what I can say is that in every situation where I have seen an M/R job writing to HBase, you end up not wanting to use a reducer. If you want a clear and concise statement, you can say that the rule of thumb is that you don't want to use a reducer, and that cases where you would first need a reducer are the rare exception.

The reason I ask people to think about this topic is that unless you have a really good foundation in databases, not relying on a reducer is a bit counterintuitive. (Which is why I said that you really need to clear your mind and focus on this issue.)
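
A toy sketch of one reading of that argument, not real HBase code. It assumes the table can be modeled as a sorted map keyed by row key (a plain TreeMap stands in for HBase here), that a later write to the same row replaces the earlier one, and that the reducer is a pass-through that only forwards what the shuffle grouped. The class and method names are made up for illustration. Under those assumptions, the shuffle and sort done ahead of the reducer duplicate work the store does anyway, and both paths leave the table in the same state:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class ReducerVsDirectWrite {

    // Map-only path: each mapper output is written straight to the table.
    // The TreeMap is only a stand-in for a store that keeps rows sorted by key.
    static SortedMap<String, String> mapOnly(List<String[]> records) {
        SortedMap<String, String> table = new TreeMap<>();
        for (String[] kv : records) {
            table.put(kv[0], kv[1]); // a later write to the same row wins
        }
        return table;
    }

    // Map + reduce path: group and sort by key first (the shuffle),
    // then let a pass-through reducer write every value to the table.
    static SortedMap<String, String> mapThenReduce(List<String[]> records) {
        SortedMap<String, List<String>> shuffled = new TreeMap<>();
        for (String[] kv : records) {
            shuffled.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        SortedMap<String, String> table = new TreeMap<>();
        for (Map.Entry<String, List<String>> group : shuffled.entrySet()) {
            for (String value : group.getValue()) {
                table.put(group.getKey(), value); // same write the mapper could have done
            }
        }
        return table;
    }

    public static void main(String[] args) {
        // Hypothetical (row key, value) pairs in arbitrary arrival order.
        List<String[]> records = Arrays.asList(
            new String[] {"row-c", "3"},
            new String[] {"row-a", "1"},
            new String[] {"row-b", "2"},
            new String[] {"row-a", "1-updated"});

        System.out.println(mapOnly(records));
        System.out.println(mapOnly(records).equals(mapThenReduce(records)));
    }
}
```

The sketch says nothing about the exceptions, such as a reducer that does real aggregation or a sort order that differs from the row-key order. In an actual Hadoop job, the map-only variant is usually configured by setting the number of reduce tasks to zero with Job.setNumReduceTasks(0).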

-Mike

PS. If you care to read the thread, I didn't become condescending until a certain individual piped up about how refactoring the M/R was a 'distraction' to the issue at hand.
Not to mention his flip response with the Google paper.

On May 10, 2012, at 4:57 PM, Stack wrote:

> On Thu, May 10, 2012 at 11:59 AM, Michael Segel
> <[EMAIL PROTECTED]> wrote:
>> Sigh.
>>
>> Dave,
>> I really think you need to think more about the problem.
>>
>> Think about what a reduce does and then think about what happens inside of HBase.
>>
>> Then think about which runs faster... a job with two mappers writing the intermediate and final results in HBase,
>> or a M/R job that writes its output to HBase.
>>
>> If you really truly think about the problem, you will start to understand why I say you really don't want to use a reducer when you're working w HBase.
>>
>
> We have a bit of doc saying that you usually might want to forgo the reduce
> phase, http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#sink.
> Do we need to add to it?  That said, you can't make a hard and fast
> rule that the reduce is to be avoided absolutely.  There will be cases
> where it makes sense (MR sort orthogonal to HBase's or a fat
> aggregating reduce, etc.)
>
> St.Ack
> P.S. Hey Michael.  Go easy on the 'sighs'.  The participants in this
> thread have a clue.  I can testify to that.  Also, I know you don't
> mean it, but on occasion, both in this thread and in others I've seen
> you on, your tone can come across as condescending (and there is
> nothing like condescension for raising hackles).  We all have our
> styles but you might want to review with this in mind before you hit
> send the next time.  Just a suggestion.
>