HBase, mail # user - partitioning and map/reduce &hbase hashcodes


RE: partitioning and map/reduce &hbase hashcodes
Jonathan Gray 2010-12-19, 19:33
HBase doesn't hash row keys at all.  It keeps rows in strict lexicographical (byte-by-byte) order of the row keys themselves.  So yes, keys with similar prefixes, such as com.google.maps and com.google.code, sit next to each other and may well be in the same partition.
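
To make the difference concrete, here is a small sketch in Python (not HBase code).  It assumes a hypothetical cluster of 4 partitions and that an integer key hashes to itself, as Java's Integer.hashCode does; the row keys other than the two from the question are made up for illustration.

```python
# Hash partitioning, as in the quoted formula: partition = hashcode % (# of partitions).
# Assumption: an integer key hashes to itself (as Java's Integer.hashCode does).
NUM_PARTITIONS = 4  # hypothetical cluster size


def hash_partition(account: int, partitions: int = NUM_PARTITIONS) -> int:
    return account % partitions


accounts = [3, 4, 5, 6]
# Neighbouring account numbers end up on four different partitions.
hash_placement = {a: hash_partition(a) for a in accounts}

# Ordered storage: keys are compared byte by byte, so shared prefixes sort together.
row_keys = [b"org.apache.hbase", b"com.google.maps", b"com.yahoo.mail", b"com.google.code"]
ordered = sorted(row_keys)  # Python compares bytes objects lexicographically

# Integer keys only sort numerically if they are encoded at a fixed width,
# big-endian (assuming non-negative values); as decimal text, b"10" sorts before b"9".
encoded = [a.to_bytes(4, "big") for a in accounts + [10]]

if __name__ == "__main__":
    print(hash_placement)
    print(ordered)
    print(sorted(encoded) == encoded)
```

With ordered keys, the two com.google rows are adjacent, while the modulo formula sends accounts 3 through 6 to four different partitions.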

Rather than using a hashcode modulo some number, we use the META table to determine which partition (region) your key is in and also which node (regionserver) is hosting it right now.  Each of our shards is a contiguous range of rows, [start, stop), rather than a bucket in a true hash table.
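
As a rough illustration of that lookup, the sketch below models META as a sorted list of region start keys and finds the region for a row key with a binary search.  The split points, the server names rs1 to rs3, and the locate function are all hypothetical; the real client lookup works differently in its details, and this only shows the range idea.

```python
from bisect import bisect_right

# Hypothetical stand-in for META: (region start key, hosting regionserver),
# sorted by start key. Each region covers [start, next start); the first
# region starts at the empty key and the last one is open-ended.
REGIONS = [
    (b"", "rs1"),
    (b"com.h", "rs2"),
    (b"org", "rs3"),
]
START_KEYS = [start for start, _ in REGIONS]


def locate(row_key: bytes):
    """Return (start, stop, server) for the region whose range contains row_key."""
    # Index of the last region whose start key is <= row_key.
    i = bisect_right(START_KEYS, row_key) - 1
    start, server = REGIONS[i]
    # None marks the open-ended last region.
    stop = START_KEYS[i + 1] if i + 1 < len(REGIONS) else None
    return start, stop, server


if __name__ == "__main__":
    for key in (b"com.google.code", b"com.google.maps", b"com.yahoo.mail", b"org.apache.hbase"):
        print(key, locate(key))
```

Because the start key is inclusive and the stop key exclusive, every row key falls into exactly one region, and both com.google keys resolve to the same one here.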

> -----Original Message-----
> From: Hiller, Dean (Contractor) [mailto:[EMAIL PROTECTED]]
> Sent: Sunday, December 19, 2010 10:33 AM
> To: [EMAIL PROTECTED]
> Subject: partitioning and map/reduce &hbase hashcodes
>
> We happen to be looking at gigaspaces and hbase/hadoop.  I read this in the
> gigaspaces documentation...
>
>
>
> Target partition space ID = hashcode % (# of partitions)
>
>
>
> Is it me, or isn't that bad unless you write a special String hashcode that
> not only hashes the String but also keeps the resulting hashcodes in
> roughly alphabetical order, so that com.google.maps and com.google.code
> stay relatively local?
>
>
>
> I mean, if I have ints for account numbers, where account numbers that are
> close together are more closely related, that formula would split my
> account numbers across the cluster, correct?  The above formula would
> make accounts 3, 4, 5, 6 far from each other rather than on the same node.
>
>
>
> How does hbase work here with keys and such?  I assume it is much like
> bigtable in that com.google.maps is stored near com.google.code since it is
> an ordered map, but how is that implemented (is the hashcode rewritten, or
> does it just use the string somehow)?
>
>
>
> Thanks,
>
> Dean