HBase >> mail # user >> Occasional regionserver crashes following socket errors writing to HDFS


Re: Occasional regionserver crashes following socket errors writing to HDFS
Stack,

That section was written by Doug after he and I had the same debate many moons ago.
While I can't say with absolute certainty that you shouldn't use a reducer, what I can say is that in every M/R job I have seen that writes to HBase, you end up not wanting a reducer. If you want a clear and concise statement, the rule of thumb is that you don't want to use a reducer, and cases where you do need one first are the rare exception.
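
One common reading of this rule of thumb is that HBase already keeps rows sorted by row key, so the sort a reducer's shuffle performs buys nothing when the job only writes rows. Below is a minimal sketch of that idea in plain Java. It does not use the HBase or Hadoop APIs: the TreeMap is only an assumed stand-in for a table's row ordering, and the class and method names (MapOnlySketch, mapOnly, mapThenReduce) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

public class MapOnlySketch {
    // Map-only path: each record is written straight to the "table".
    // Assumption: a TreeMap models only HBase's sort-by-row-key behavior.
    static TreeMap<String, String> mapOnly(List<String[]> records) {
        TreeMap<String, String> table = new TreeMap<>();
        for (String[] kv : records) {
            table.put(kv[0], kv[1]);
        }
        return table;
    }

    // Map + reduce path: sort by key first (what the shuffle does), then write.
    static TreeMap<String, String> mapThenReduce(List<String[]> records) {
        List<String[]> shuffled = new ArrayList<>(records);
        shuffled.sort((a, b) -> a[0].compareTo(b[0]));
        TreeMap<String, String> table = new TreeMap<>();
        for (String[] kv : shuffled) {
            table.put(kv[0], kv[1]);
        }
        return table;
    }

    public static void main(String[] args) {
        // Records arrive in arbitrary order, as mapper output would.
        List<String[]> records = Arrays.asList(
            new String[] {"row3", "c"},
            new String[] {"row1", "a"},
            new String[] {"row2", "b"});

        // Both paths leave the table in the same sorted state.
        System.out.println(mapOnly(records));
        System.out.println(mapOnly(records).equals(mapThenReduce(records)));
    }
}
```

The sketch only covers the case where the reduce step would do nothing but order the output. It says nothing about jobs that aggregate many values per key, which is the kind of exception Stack raises in the quoted message below.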

The reason I ask people to think about this topic is that unless you have a really good foundation in databases, not relying on a reducer is a bit counterintuitive. (Which is why I said that you really need to clear your mind and focus on this issue.)

-Mike

PS. If you care to read the thread, I didn't become condescending until a certain individual piped up about how refactoring the M/R was a 'distraction' from the issue at hand.
Not to mention his flip response with the Google paper.

On May 10, 2012, at 4:57 PM, Stack wrote:

> On Thu, May 10, 2012 at 11:59 AM, Michael Segel
> <[EMAIL PROTECTED]> wrote:
>> Sigh.
>>
>> Dave,
>> I really think you need to think more about the problem.
>>
>> Think about what a reduce does and then think about what happens inside of HBase.
>>
>> Then think about which runs faster... a job with two mappers writing the intermediate and final results in HBase,
>> or a M/R job that writes its output to HBase.
>>
>> If you really truly think about the problem, you will start to understand why I say you really don't want to use a reducer when you're working with HBase.
>>
>
> We have a bit of doc saying that usually you might want to forgo the reduce
> phase, http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#sink.
> Do we need to add to it?  That said, you can't make a hard and fast
> rule that the reduce is to be avoided absolutely.  There will be cases
> where it makes sense (MR sort orthogonal to HBase's or a fat
> aggregating reduce, etc.)
>
> St.Ack
> P.S. Hey Michael.  Go easy on the 'sighs'.  The participants in this
> thread have a clue.  I can testify to that.  Also, I know you don't
> mean it, but on occasion, both in this thread and in others I've seen
> you on, your tone can come across as condescending (and there is
> nothing like condescension for raising the hackles).  We all have our
> styles but you might want to review with this in mind before you hit
> send the next time.  Just a suggestion.
>