HBase, mail # user - Controlling TableMapReduceUtil table split points


David Koch 2013-01-06, 12:37
Ted Yu 2013-01-06, 16:11
David Koch 2013-01-06, 16:37
Ted Yu 2013-01-06, 16:47
Dhaval Shah 2013-01-06, 17:29
Re: Controlling TableMapReduceUtil table split points
David Koch 2013-01-06, 17:53
Hi Dhaval,

Good call on setBatch; I had forgotten about it. Like changing the schema,
it would mean changing map(...) to handle the fact that only part of a
user's data arrives in each call, but I would not have to manipulate table
splits.
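
To make the map(...) change concrete, here is a minimal sketch in plain Java
with no HBase dependency. It only imitates the batching behaviour described
in this thread: a wide row arrives as several partial chunks, and the
map-side logic has to combine chunks that share a row key. The names
BatchedRowSketch, Chunk, split and countColumnsPerRow are hypothetical and
are not part of any HBase API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BatchedRowSketch {

    /** Hypothetical stand-in for one partial Result: a row key plus a slice of its columns. */
    static final class Chunk {
        final String rowKey;
        final List<String> columns;

        Chunk(String rowKey, List<String> columns) {
            this.rowKey = rowKey;
            this.columns = columns;
        }
    }

    /** Splits one row's columns, in order, into chunks of at most `batch` columns. */
    static List<Chunk> split(String rowKey, List<String> columns, int batch) {
        if (batch <= 0) {
            throw new IllegalArgumentException("batch must be positive");
        }
        List<Chunk> chunks = new ArrayList<>();
        for (int i = 0; i < columns.size(); i += batch) {
            int end = Math.min(i + batch, columns.size());
            chunks.add(new Chunk(rowKey, new ArrayList<>(columns.subList(i, end))));
        }
        return chunks;
    }

    /** Map-side logic: one call per chunk, with totals accumulated per row key. */
    static Map<String, Integer> countColumnsPerRow(List<Chunk> chunks) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (Chunk chunk : chunks) {
            totals.merge(chunk.rowKey, chunk.columns.size(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> wide = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            wide.add("c" + i);
        }
        // A 7-column row with batch size 3 arrives as chunks of 3, 3 and 1 columns.
        List<Chunk> stream = new ArrayList<>(split("user1", wide, 3));
        // A 2-column row fits in a single chunk.
        stream.addAll(split("user2", Arrays.asList("c0", "c1"), 3));
        System.out.println(countColumnsPerRow(stream)); // prints {user1=7, user2=2}
    }
}
```

In a real job the accumulated state would have to be flushed whenever the
row key changes and again at the end of the split, since the mapper never
sees an explicit "end of row" signal.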

The HBase book does suggest that it's bad practice to use the "logical"
schema of lumping all user data into a single row (*), but I'll do some
testing to see what works.

Thank you,

/David

(*) Chapter 9, section "Tall-Narrow Versus Flat-Wide Tables", 3rd ed., page
359.

On Sun, Jan 6, 2013 at 6:29 PM, Dhaval Shah <[EMAIL PROTECTED]> wrote:

> Another option to avoid the timeout/OOME issues is to use scan.setBatch(),
> so that the scanner functions normally for small rows but breaks up large
> rows into multiple Result objects. You can then use it in conjunction with
> scan.setCaching() to control how much data you get back.
>
> This approach would not need a change in your schema design and would
> ensure that only one mapper processes the entire row (though in multiple
> calls to the map function).
>
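
For reference, the setup suggested above might look roughly like the sketch
below. It assumes the HBase and Hadoop 2 client libraries are on the
classpath (on Hadoop 1.x, `new Job(conf, name)` would replace
`Job.getInstance`). The table name "users", the class names BatchedScanJob
and CellCountMapper, the counter names, and the batch and caching values are
all placeholders chosen for illustration, not values taken from this thread.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class BatchedScanJob {

    /** Hypothetical mapper: invoked once per partial Result, not once per row. */
    public static class CellCountMapper extends TableMapper<NullWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable rowKey, Result partial, Context context)
                throws IOException, InterruptedException {
            // With setBatch(n), a wide row reaches map(...) as several Results
            // of at most n cells each, all carrying the same row key.
            context.getCounter("sketch", "cells").increment(partial.size());
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "batched-scan-sketch");
        job.setJarByClass(BatchedScanJob.class);

        Scan scan = new Scan();
        scan.setBatch(1000);        // at most 1000 cells per Result (placeholder value)
        scan.setCaching(10);        // Results fetched per scanner RPC (placeholder value)
        scan.setCacheBlocks(false); // commonly disabled for full-table MapReduce scans

        TableMapReduceUtil.initTableMapperJob(
                "users",            // placeholder table name
                scan,
                CellCountMapper.class,
                NullWritable.class,
                NullWritable.class,
                job);

        job.setNumReduceTasks(0);   // map-only job
        job.setOutputFormatClass(NullOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With these placeholder values a single scanner RPC returns at most
10 x 1000 = 10,000 cells, which is the knob that bounds memory use and
response time for very wide rows.
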
Dhaval Shah 2013-01-22, 23:10