Another option to avoid the timeout/OOME issues is to use scan.setBatch(). The scanner then behaves normally for small rows but breaks large rows into multiple Result objects, each holding at most the batch size of cells. You can combine this with scan.setCaching(), which controls how many Result objects are fetched per round trip, to limit how much data you get back at a time.
This approach requires no change to your schema design and still ensures that a single mapper processes the entire row (just across multiple calls to the map function).
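To make the batch/caching interplay concrete, here is a small self-contained model. It is not HBase code: `BatchScanModel`, `splitRow` and `rpcCount` are hypothetical names, and the model is simplified (it ignores scanner open/close calls and region boundaries). It only illustrates the idea that setBatch() caps the cells per Result, so one wide row becomes several Results (one map() call each), while setCaching() caps how many of those Results come back per round trip.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model (not the HBase implementation) of how
 * Scan.setBatch() and Scan.setCaching() shape what a mapper sees for one wide row.
 *
 * In a real job you would configure the scan roughly like this:
 *   Scan scan = new Scan();
 *   scan.setBatch(1000);   // max cells per Result
 *   scan.setCaching(100);  // max Results per round trip
 */
public class BatchScanModel {

    /** Split one row's cells into Result-like chunks of at most `batch` cells. */
    static <T> List<List<T>> splitRow(List<T> rowCells, int batch) {
        if (batch <= 0) throw new IllegalArgumentException("batch must be > 0");
        List<List<T>> results = new ArrayList<>();
        for (int i = 0; i < rowCells.size(); i += batch) {
            int end = Math.min(i + batch, rowCells.size());
            // Each chunk stands in for one Result, i.e. one call to map()
            results.add(new ArrayList<>(rowCells.subList(i, end)));
        }
        return results;
    }

    /** Round trips needed if each one returns at most `caching` Results. */
    static int rpcCount(int resultCount, int caching) {
        if (caching <= 0) throw new IllegalArgumentException("caching must be > 0");
        return (resultCount + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        // A "wide" row with 10 cells
        List<String> row = new ArrayList<>();
        for (int i = 0; i < 10; i++) row.add("col" + i);

        List<List<String>> results = splitRow(row, 3);               // batch = 3
        System.out.println(results.size() + " Results");             // 4 Results (3+3+3+1 cells)
        System.out.println(rpcCount(results.size(), 2) + " RPCs");   // caching = 2 -> 2 RPCs
    }
}
```

All four chunks belong to the same row, so the same mapper sees them in consecutive map() calls; nothing in the row ever has to fit into a single Result.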
On Sun 6 Jan, 2013 10:07 PM IST David Koch wrote:
>Thank you for your response. I will take a look.
>With regards to the timeouts: I think changing the key design as outlined
>above would ameliorate the situation since each map call only requests a
>small amount of data as opposed to what could be a large chunk. I remember
>that simply doing a get on one of the large outlier rows (~500mb) brought
>down the region server involved.
>On Sun, Jan 6, 2013 at 5:11 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>> If events for one user are processed by a single mapper, I think you would