Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
HBase, mail # dev - Primary Key Design

Copy link to this message
Primary Key Design
Fuad Efendi 2011-08-03, 14:25
I am starting to use following scheme for primary keys:
SHA256(URL) + "-RAW" Primary Key Schema

* PKs  in Lily (user-defined) will be prepended "USER." and I can't use URI
for instance (it contains dots which is special character in current
* Additionally to SHA-256-generated PK, Lily will still use UUID (which is
really unique) for versioningŠ
* IMPORTANT: we need randomize Pks; it is best practice with Hbase (data
will be randomly distributed in a cluster)

and I suggest to use similar SHA256(JSON-Object-in-UTF8) + "-OIQ" (it is
postfix so that we will have good "randomization"; in Hbase, all data is
physically sorted by PK)
- since all OIQ objects will be stored denormalized as JSON (string type
Lily) (note, it will be UTF-8 encoded, I believe it is also part of

 * {@link


 * @author Fuad



public class SHA256 {

public static final String SHA256(byte[] bytes) throws
NoSuchAlgorithmException {

MessageDigest md = MessageDigest.getInstance("SHA-256");


byte[] mdbytes = md.digest();

// convert the byte to hex format

StringBuffer hexString = new StringBuffer();

for (int i = 0; i < mdbytes.length; i++) {

String hex = Integer.toHexString(0xff & mdbytes[i]);

if (hex.length() == 1)




return hexString.toString();


public static final String SHA256(String text) throws
NoSuchAlgorithmException, UnsupportedEncodingException  {

return SHA256(text.getBytes("UTF-8"));



Fuad Efendi