HBase >> mail # user >> Read access pattern

Re: Read access pattern
I see what you are saying, Michael, but I think the following is a blanket
statement:

> Think of it this way... the operation was a success but the patient died.

This is not always the case. Yes, if your use case/system is such that it
will have lots of users trying to access it, then perhaps having each of them
kick off N concurrent/distributed reads is not efficient. But what if you
have a batch use case where these distributed scans might actually help? The
point being, rather than shooting down the idea as a whole, we can perhaps
qualify it with the areas where it might be useful and the others where it
can have an adverse effect.

On Wed, May 1, 2013 at 10:14 AM, Michael Segel <[EMAIL PROTECTED]> wrote:

> Unfortunately, as this idea keeps popping up, you are going to have this
> discussion.
> 1) As you admit... salting is bad when your primary access vector is
> get()s.
> 2) Range scans. Instead of 1 range scan, you now have N, where N is the
> number of salt values. In this case, 10.
> You might not think this is bad, but consider a system with a lot of
> users running lots of queries, each of which now has to run N scans
> instead of one: that is excessive overhead. Just because the scans happen
> in parallel doesn't make them free; you are still tying up a finite amount
> of resources.
> So you have to go back and ask the initial question... why?
> Can you change your key?
> What is the problem you're trying to solve?
> The point is that just because you can do it doesn't make it a good idea.
> Think of it this way... the operation was a success but the patient died.
> On May 1, 2013, at 12:12 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> > I do not want to be rude or anything... But how often do we need to have
> > this discussion?
> >
> > When you salt your rowkeys with, say, 10 salt values, then for each read
> > you need to fork off 10 read requests, and each of them touches only 1/10th
> > of the table (which works nicely with HBase's prefix scans).
> >
> > Obviously, if you only need point gets, you wouldn't use salting; that
> > would be stupid. If you mostly do range scans, then salting is quite nice.
> >
> > Saying that salting is bad because it does not work for point gets is
> > like saying that bulldozers are bad because you cannot use them on race
> > tracks. :)
> >
> >
> > -- Lars
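To make the read side of salting concrete, here is a minimal sketch, assuming a one-byte salt prefix with values 0..9 and using an in-memory sorted list of (rowkey, value) pairs as a stand-in for an HBase table. No HBase client API is used; `salted_ranges`, `scan` and `salted_range_scan` are hypothetical helper names. One logical range [start, stop) becomes 10 physical ranges, one per salt bucket; the scans run concurrently, and the per-bucket results (each already sorted) are merged back into logical key order.

```python
import bisect
import heapq
from concurrent.futures import ThreadPoolExecutor

NUM_SALTS = 10  # assumption: 10 salt values stored as one leading byte, 0..9


def salted_ranges(start, stop, num_salts=NUM_SALTS):
    """Turn one logical range [start, stop) into one physical range per salt bucket."""
    return [(bytes([s]) + start, bytes([s]) + stop) for s in range(num_salts)]


def scan(table, lo, hi):
    """Stand-in for one HBase range scan: rows of a sorted (rowkey, value) list in [lo, hi)."""
    keys = [k for k, _ in table]
    return table[bisect.bisect_left(keys, lo):bisect.bisect_left(keys, hi)]


def salted_range_scan(table, start, stop, num_salts=NUM_SALTS):
    """Fan out one scan per salt bucket, then merge results back into logical key order."""
    ranges = salted_ranges(start, stop, num_salts)
    # Threads stand in for concurrent scanner RPCs; each bucket is scanned independently.
    with ThreadPoolExecutor(max_workers=num_salts) as pool:
        parts = list(pool.map(lambda r: scan(table, r[0], r[1]), ranges))
    # Every key in a bucket shares the same salt byte, so stripping it keeps each part sorted.
    unsalted = [[(k[1:], v) for k, v in part] for part in parts]
    return list(heapq.merge(*unsalted, key=lambda kv: kv[0]))


# Toy table: the salt is a round-robin counter mod NUM_SALTS (hypothetical data).
rows = [(f"row{i:03d}".encode(), f"v{i}") for i in range(50)]
table = sorted((bytes([i % NUM_SALTS]) + key, val) for i, (key, val) in enumerate(rows))

result = salted_range_scan(table, b"row010", b"row020")
print([k.decode() for k, _ in result])  # row010 through row019, in key order
```

Every one of the 10 scans still costs its own scanner and round trips, which is the overhead Michael points to for many concurrent users; for a batch job that reads large ranges anyway, spreading the work across buckets is where the parallelism can pay off.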
> >
> >
> >
> > ________________________________
> > From: Michael Segel <[EMAIL PROTECTED]>
> > Sent: Tuesday, April 30, 2013 10:06 AM
> > Subject: Re: Read access pattern
> >
> >
> > Sure.
> >
> > By definition, the salt number is a random seed that is not associated
> > with the underlying record.
> > A simple example is a round-robin counter (take the counter mod 10,
> > yielding [0..9]).
> >
> > So you get a record, prepend your salt, and write it out to HBase.
> > The salt will push the data out to a different region.
> >
> > But what happens when you want to read the data?
> >
> > So on a full table scan... no biggie, it's the same.
> >
> > But suppose I want to do a partial table scan. Now I have to do multiple
> > partial scans because I don't know the salt.
> > Or if I want to do a simple get(), I now have to do N get()s, where N is
> > the number of salt values allowed. In my example, that's 10.
> >
> > And that's the problem.
> >
> > You are better off hashing the record's key, using the first couple of
> > bytes of the hash as a prefix, and then writing the record out.
> > When you want the record, take the key, hash it using the same process,
> > and you have 1 get().
> >
> > You're still screwed on range scans, but you can't have
> > everything.
> >
> > is that they are talking about excess sodium chloride in your diet. I'm
> > talking about using a salt, aka a 'random seed'.
> >
> > Does that make sense?
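A minimal sketch of the two write paths Michael contrasts, assuming an in-memory dict as the table, a round-robin counter mod 10 as the salt (his example), and, for the hashed variant, the first two bytes of a SHA-256 digest of the key as the prefix. SHA-256 and the two-byte prefix length are illustrative choices the thread does not specify, and the class names are hypothetical. The sketch counts get() calls: with a round-robin salt the reader cannot recompute the salt, so it must issue N gets; with a hash prefix it recomputes the prefix from the key and issues exactly one.

```python
import hashlib
import itertools

NUM_SALTS = 10   # round-robin salt values 0..9, as in the example in the thread
PREFIX_LEN = 2   # assumption: "the first couple of bytes" of the hash


class RoundRobinSaltedTable:
    """Salt = write counter mod NUM_SALTS, so it is unrelated to the record itself."""

    def __init__(self):
        self.rows = {}
        self.counter = itertools.count()
        self.gets_issued = 0

    def put(self, key, value):
        salt = next(self.counter) % NUM_SALTS
        self.rows[bytes([salt]) + key] = value

    def get(self, key):
        # The reader cannot recompute the salt, so it issues one get() per salt value.
        results = [self.rows.get(bytes([salt]) + key) for salt in range(NUM_SALTS)]
        self.gets_issued += NUM_SALTS
        return next((r for r in results if r is not None), None)


class HashPrefixedTable:
    """Prefix = first PREFIX_LEN bytes of a SHA-256 digest of the key (illustrative hash)."""

    def __init__(self):
        self.rows = {}
        self.gets_issued = 0

    @staticmethod
    def prefix(key):
        return hashlib.sha256(key).digest()[:PREFIX_LEN]

    def put(self, key, value):
        self.rows[self.prefix(key) + key] = value

    def get(self, key):
        # The same hash recomputed on read points at exactly one row key: one get().
        self.gets_issued += 1
        return self.rows.get(self.prefix(key) + key)


rr, hp = RoundRobinSaltedTable(), HashPrefixedTable()
for i in range(100):
    k = f"user{i:04d}".encode()
    rr.put(k, i)
    hp.put(k, i)

assert rr.get(b"user0042") == 42 and hp.get(b"user0042") == 42
print(rr.gets_issued, hp.gets_issued)  # 10 1
```

Both layouts scatter consecutive logical keys across buckets, so neither helps a range scan over the original key, which is the trade-off conceded in the message above.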
> >
> >
> > On Apr 30, 2013, at 11:17 AM, Shahab Yunus <[EMAIL PROTECTED]> wrote:
> >
> >> Well those are *some* words :) Anyway, can you explain a bit in detail