NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
HBase >> mail # user >> Read access pattern


Re: Read access pattern
I see what you are saying, Michael, but I think the following is a blanket
assumption:
bq Think of it this way... the operation was a success but the patient
died. eq

This is not always the case. Yes, if your use case/system is such that it
will have lots of users trying to access it, then perhaps N users kicking off N
concurrent/distributed reads is not efficient. But what if you have a batch
use case where these distributed scans might actually help? The point being,
rather than shooting down the idea as a whole, we can perhaps qualify it
with the areas where it might be useful and the areas where it can have an
adverse effect.

Regards,
Shahab
On Wed, May 1, 2013 at 10:14 AM, Michael Segel <[EMAIL PROTECTED]> wrote:

> Unfortunately as this idea keeps popping up, you are going to have this
> discussion.
>
> 1) As you admit... salting is bad when your primary access vector is
> get()s.
> 2) Range scans. Instead of 1 range scan, you now have N, where N is the
> number of salt values. In this case 10.
> You wouldn't think of this as bad; however, what about when you have a system
> with a lot of users and lots of queries, which now have to scan N times the
> number of records for each scan? Excessive overhead. Even though the scans
> happen in parallel, you are still tying up a finite amount of resources.
>
> So you have to go back and ask the initial question... why?
> Can you change your key?
> What is the problem you're trying to solve?
>
> The point is that just because you can do it doesn't make it a good idea.
>
> Think of it this way... the operation was a success but the patient died.
>
>
> On May 1, 2013, at 12:12 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> > I do not want to be rude or anything... But how often do we need to have
> > this discussion?
> >
> > When you salt your rowkeys with, say, 10 salt values, then for each read
> > you need to fork off 10 read requests, and each of them touches only 1/10th
> > of the table (which works nicely with HBase's prefix scans).
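
A minimal sketch of the fan-out described above, assuming a sorted list of strings stands in for the HBase table and `|` separates the salt from the row key. The names `salted_key`, `scan_bucket` and `salted_range_scan` are hypothetical and not part of any HBase API; the sketch only illustrates one logical range scan becoming one prefix scan per salt value, with the results merged back into key order.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor
from heapq import merge

NUM_SALTS = 10  # the thread's example uses 10 salt values


def salted_key(salt, row_key):
    # Assumed layout: single-digit salt, a separator, then the logical key.
    return f"{salt}|{row_key}"


def scan_bucket(sorted_keys, salt, start, stop):
    """Scan one salt bucket for logical keys in [start, stop)."""
    lo = bisect_left(sorted_keys, salted_key(salt, start))
    hi = bisect_left(sorted_keys, salted_key(salt, stop))
    # Strip the salt prefix so callers see logical keys only.
    return [key.split("|", 1)[1] for key in sorted_keys[lo:hi]]


def salted_range_scan(sorted_keys, start, stop):
    """Fork one scan per salt value, then merge the sorted partial results."""
    with ThreadPoolExecutor(max_workers=NUM_SALTS) as pool:
        parts = pool.map(
            lambda salt: scan_bucket(sorted_keys, salt, start, stop),
            range(NUM_SALTS),
        )
        return list(merge(*parts))
```

Each bucket scan touches only its own slice of the key space, but all ten still have to be issued and merged for every logical scan.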
> >
> > Obviously, if you only need point gets you wouldn't use salting; that would
> > be stupid. If you mostly do range scans, then salting is quite nice.
> >
> > Saying that salting is bad because it does not work for point gets is
> > like saying that bulldozers are bad because you cannot use them on race
> > tracks. :)
> >
> >
> > -- Lars
> >
> >
> >
> > ________________________________
> > From: Michael Segel <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]
> > Sent: Tuesday, April 30, 2013 10:06 AM
> > Subject: Re: Read access pattern
> >
> >
> > Sure.
> >
> > By definition, the salt number is a random seed that is not associated
> > with the underlying record.
> > A simple example is a round robin counter (mod the counter by 10,
> > yielding [0..9]).
> >
> > So you get a record, prepend your salt and you write it out to HBase.
> > The salt will push the data out to a different region.
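
As a rough illustration of the write path just described, here is a sketch of a round robin salter. The class name `RoundRobinSalter`, the `|` separator and the `event-` keys are assumptions made for the example; nothing here touches a real HBase client.

```python
from itertools import count

NUM_SALTS = 10  # mod the counter by 10, yielding salts 0..9


class RoundRobinSalter:
    """Prepends a salt taken from a round robin counter (hypothetical helper)."""

    def __init__(self, num_salts=NUM_SALTS):
        self._num_salts = num_salts
        self._counter = count()

    def salt_key(self, row_key):
        # The salt depends only on the counter, never on the record itself.
        salt = next(self._counter) % self._num_salts
        return f"{salt}|{row_key}"


salter = RoundRobinSalter()
# Consecutive keys land under different prefixes, hence in different regions.
salted = [salter.salt_key(f"event-{ts}") for ts in range(1000, 1005)]
```

Because the salt is not derived from the key, writing the same row key twice produces two different stored keys, which is what makes the read side expensive.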
> >
> > But what happens when you want to read the data?
> >
> > So on a full table scan... no biggie, it's the same.
> >
> > But suppose I want to do a partial table scan. Now I have to do multiple
> > partial scans because I don't know the salt.
> > Or if I want to do a simple get() I now have to do N get()s,
> > where N is the number of salt values allowed. In my example that's 10.
> >
> > And that's the problem.
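
The point get problem can be sketched like this, assuming a plain dict stands in for the table and the salt is an unknown digit 0..9 in front of the key. `candidate_keys` and `salted_get` are hypothetical names for illustration.

```python
NUM_SALTS = 10  # number of salt values allowed in the example


def candidate_keys(row_key, num_salts=NUM_SALTS):
    """Every stored key the record could be hiding under."""
    return [f"{salt}|{row_key}" for salt in range(num_salts)]


def salted_get(table, row_key):
    """Probe each salt in turn; the reader cannot know which one was used."""
    for key in candidate_keys(row_key):
        if key in table:
            return table[key]
    return None
```

A lookup for a missing key always costs all N probes, and a hit costs N in the worst case.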
> >
> > You are better off doing a hash of the record, using the first couple of
> > bytes of the hash as the prefix, and then writing the record out.
> > When you want the record, take the key, hash it using the same process, and
> > you have 1 get().
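
A sketch of the hash-prefix alternative, with some assumptions spelled out: SHA-256 is used only as a convenient stable hash (the message does not name one), the prefix is the first two bytes in hex, and `hashed_key` is a hypothetical helper.

```python
import hashlib


def hashed_key(row_key, prefix_bytes=2):
    """Prefix the key with the first bytes of its own hash.

    SHA-256 and the 2-byte prefix are assumptions for this sketch.
    """
    digest = hashlib.sha256(row_key.encode("utf-8")).digest()
    return digest[:prefix_bytes].hex() + "|" + row_key


# Writer and reader derive the same stored key, so a read is a single lookup.
table = {hashed_key("user42"): "payload"}
value = table.get(hashed_key("user42"))
```

The prefix is recomputable from the key alone, so there is nothing to guess on read; the cost is that adjacent logical keys no longer sit next to each other.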
> >
> > You're still screwed up on doing a range scan, but you can't have
> > everything.
> >
> > THIS IS WHY I AND MANY CARDIOLOGISTS SAY NO TO SALT. The only difference
> > is that they are talking about excess sodium chloride in your diet. I'm
> > talking about using a salt aka 'random seed'.
> >
> > Does that make sense?
> >
> >
> > On Apr 30, 2013, at 11:17 AM, Shahab Yunus <[EMAIL PROTECTED]> wrote:
> >
> >> Well those are *some* words :) Anyway, can you explain a bit in detail