HBase >> mail # user >> Controlling TableMapReduceUtil table split points


Re: Controlling TableMapReduceUtil table split points

Another option to avoid the timeout/OOME issues is to use scan.setBatch(). The scanner would then behave normally for small rows but break large rows up into multiple Result objects, and you can use this in conjunction with scan.setCaching() to control how much data you get back.

This approach would not require a change to your schema design, and it would ensure that a single mapper processes the entire row (though across multiple calls to the map function).
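
To make the effect of batching concrete, here is a minimal plain-Java sketch of how one wide row ends up as several partial results. It does not use the HBase client at all: BatchSketch and splitRow are hypothetical names, the row is modelled as an ordered map of column to value, and the batch size of 4 is an arbitrary choice for illustration. In a real job you would instead call scan.setBatch() and scan.setCaching() on the Scan you hand to TableMapReduceUtil.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simulation only: models the column-wise chunking that a batch limit implies.
// None of these names come from HBase.
public class BatchSketch {

    /** Splits one row's columns, in order, into chunks of at most `batch` columns. */
    public static List<Map<String, String>> splitRow(Map<String, String> columns, int batch) {
        if (batch <= 0) {
            throw new IllegalArgumentException("batch must be positive");
        }
        List<Map<String, String>> chunks = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : columns.entrySet()) {
            current.put(e.getKey(), e.getValue());
            if (current.size() == batch) {
                // Chunk is full: emit it and start a new one for the same row.
                chunks.add(current);
                current = new LinkedHashMap<>();
            }
        }
        if (!current.isEmpty()) {
            chunks.add(current); // trailing partial chunk
        }
        return chunks;
    }

    public static void main(String[] args) {
        // A "wide" row with 10 columns (hypothetical data).
        Map<String, String> wideRow = new LinkedHashMap<>();
        for (int i = 0; i < 10; i++) {
            wideRow.put("event" + i, "payload" + i);
        }
        // Each chunk stands in for one map() call on the same row key.
        List<Map<String, String>> chunks = splitRow(wideRow, 4);
        for (int i = 0; i < chunks.size(); i++) {
            System.out.println("map() call " + (i + 1) + ": " + chunks.get(i).keySet());
        }
    }
}
```

Because the mapper sees the same row key in consecutive calls, any per-row state (counters, aggregates) has to be carried across map() invocations and flushed when the key changes or in cleanup().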

------------------------------
On Sun 6 Jan, 2013 10:07 PM IST David Koch wrote:

>Hi Ted,
>
>Thank you for your response. I will take a look.
>
>With regards to the timeouts: I think changing the key design as outlined
>above would ameliorate the situation since each map call only requests a
>small amount of data as opposed to what could be a large chunk. I remember
>that simply doing a get on one of the large outlier rows (~500mb) brought
>down the region server involved.
>
>/David
>
>On Sun, Jan 6, 2013 at 5:11 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
>> If events for one user are processed by a single mapper, I think you would
>>