-
partitioning and map/reduce &hbase hashcodes
Hiller, Dean 2010-12-19, 18:32
We happen to be looking at gigaspaces and hbase/hadoop. I read this in the gigaspaces documentation...
Target partition space ID = hashcode % (# of partitions)
Is it me, or isn't that bad unless you write a special String hashcode that not only hashes the String but also keeps the hash values in near-alphabetical order, so that com.google.maps and com.google.code stay relatively local?
I mean, if I have ints for account numbers, where account numbers that are close together are more closely related, that formula would split my account numbers across the cluster, correct? The above formula would put accounts 3, 4, 5, and 6 far from each other rather than on the same node.
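To make the concern concrete, here is a minimal sketch of hash-modulo placement. It assumes an integer key hashes to itself (as java.lang.Integer.hashCode() does) and a hypothetical cluster of 4 partitions; it is a toy model, not GigaSpaces code.

```python
# Toy model of "partition = hashcode % number_of_partitions".
# Assumption: an integer key hashes to itself, as java.lang.Integer.hashCode() does.

NUM_PARTITIONS = 4  # hypothetical cluster size


def hash_partition(account_number: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Return the partition index chosen by hash-modulo placement."""
    return account_number % num_partitions


# Adjacent account numbers land on different partitions.
placement = {account: hash_partition(account) for account in (3, 4, 5, 6)}
print(placement)  # {3: 3, 4: 0, 5: 1, 6: 2}
```

With these assumed numbers, the four neighbouring accounts end up on four different partitions, which is the scattering described above.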
How does HBase work here with keys and such? I assume it is much like Bigtable in that com.google.maps is stored near com.google.code since it is an ordered map, but how is that implemented (a rewritten hashcode, or just using the string somehow)?
Thanks,
Dean
This message and any attachments are intended only for the use of the addressee and may contain information that is privileged and confidential. If the reader of the message is not the intended recipient or an authorized representative of the intended recipient, you are hereby notified that any dissemination of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by e-mail and delete the message and any attachments from your system.
-
RE: partitioning and map/reduce &hbase hashcodes
Jonathan Gray 2010-12-19, 19:33
HBase doesn't hashcode anything. It does strict lexicographical ordering of the row keys themselves. So yes, keys with similar prefixes may be in the same partition / next to each other.
Rather than using a hashcode modulo some number, we use the META table to determine which partition (region) your key is in and also which node (regionserver) is hosting it right now. Each of our shards is a range of rows: [start,stop) rather than a true hash table.
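The range-based lookup can be sketched as a binary search over sorted region start keys. This is only an illustration of the [start,stop) idea: the region boundaries, server names, and the `locate` helper are hypothetical, and the real META table lookup is not modelled here.

```python
from bisect import bisect_right

# Hypothetical region table: each entry is (start_key, server). A region covers
# [start_key, next_start_key); the empty start key means "from the beginning".
REGIONS = [
    (b"", "server-a"),
    (b"com.h", "server-b"),
    (b"org.", "server-c"),
]
START_KEYS = [start for start, _ in REGIONS]


def locate(row_key: bytes) -> str:
    """Return the server whose [start, stop) range contains row_key."""
    # bytes compare lexicographically, so the last start key <= row_key wins.
    index = bisect_right(START_KEYS, row_key) - 1
    return REGIONS[index][1]


print(locate(b"com.google.maps"))  # server-a
print(locate(b"com.google.code"))  # server-a
print(locate(b"org.apache.hbase"))  # server-c
```

Because the keys are compared as byte strings, the two com.google rows fall into the same range without any hashing.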
-
Re: partitioning and map/reduce &hbase hashcodes
Ted Dunning 2010-12-19, 19:53
One of the key motivators for this strategy is to allow range queries to be fast.
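As a rough sketch of why ordering helps here: when row keys are kept sorted, a range query is two binary searches plus one contiguous slice, whereas hash placement would have to ask every partition. The row keys and the `range_scan` helper below are made up for illustration and are not HBase API calls.

```python
from bisect import bisect_left

# Hypothetical row keys, kept in sorted order as an ordered store would hold them.
ROWS = sorted([
    "org.apache.hbase",
    "com.google.maps",
    "com.apple.store",
    "com.google.code",
    "com.yahoo.mail",
])


def range_scan(rows: list[str], start: str, stop: str) -> list[str]:
    """Return the keys in [start, stop) from a sorted list."""
    lo = bisect_left(rows, start)
    hi = bisect_left(rows, stop)
    return rows[lo:hi]


# "/" is the character right after "." in ASCII, so this covers the
# whole "com.google." prefix.
print(range_scan(ROWS, "com.google.", "com.google/"))
# ['com.google.code', 'com.google.maps']
```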
-
RE: partitioning and map/reduce &hbase hashcodes
Hiller, Dean 2010-12-19, 21:06
Thanks for the info there!!!! I thought it was something like that...sweet. Dean