Primary Key Design
Hi,
I am starting to use the following scheme for primary keys:
SHA256(URL) + "-RAW" Primary Key Schema
<https://outsideiq.jira.com/browse/CA-107>

RATIONALE:
* User-defined PKs in Lily will be prefixed with "USER.", and I can't use a
URI directly, for instance (it contains dots, and the dot is a special
character in the current version).
* In addition to the SHA-256-generated PK, Lily will still use a UUID (which
is really unique) for versioning...
* IMPORTANT: we need to randomize PKs; this is a best practice with HBase
(data will be spread randomly across the cluster).
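
To illustrate the "randomization" point, here is a minimal sketch, not taken from Lily or HBase. The class name KeySpread, the helper names, and the example.com URLs are all made up for illustration. It hashes URLs that are adjacent in sort order and counts how many of the resulting keys start with each hex digit:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class; only the JDK is used.
class KeySpread {

    // Hex-encoded SHA-256 of the UTF-8 bytes of the given text.
    static String sha256Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(Character.forDigit((b >> 4) & 0xf, 16));
                hex.append(Character.forDigit(b & 0xf, 16));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every compliant JDK is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Counts how many hashed keys start with each hex digit 0..f.
    static int[] leadingDigitCounts(int n) {
        int[] counts = new int[16];
        for (int i = 0; i < n; i++) {
            // Made-up URLs that share a long common prefix.
            String url = "http://example.com/page/" + i;
            counts[Character.digit(sha256Hex(url).charAt(0), 16)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = leadingDigitCounts(1600);
        for (int d = 0; d < 16; d++) {
            System.out.println(Integer.toHexString(d) + ": " + counts[d]);
        }
    }
}
```

Raw URLs like these would all sort next to each other, while the hashed keys should land in every leading-digit bucket. How that maps onto regions depends on how the table is split, which this sketch does not model.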

I also suggest using a similar SHA256(JSON-Object-in-UTF8) + "-OIQ" key (the
tag is a suffix, not a prefix, so that we keep good "randomization"; in
HBase, all data is physically sorted by PK), since all OIQ objects will be
stored denormalized as JSON (string type in Lily). Note that the JSON will be
UTF-8 encoded; I believe that is also part of the ECMA specs.

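import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Hex-encoded SHA-256 digests for use as primary keys.
 *
 * @see <a href="http://stackoverflow.com/questions/221165/pros-and-cons-of-using-md5-hash-of-uri-as-the-primary-key-in-a-database">Stack Overflow question 221165</a>
 * @author Fuad
 */
public class SHA256 {

    /** Returns the SHA-256 digest of the bytes as 64 lowercase hex characters. */
    public static final String SHA256(byte[] bytes) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(bytes);
        byte[] mdbytes = md.digest();

        // Convert each byte to two hex digits, zero-padding values below 0x10.
        StringBuilder hexString = new StringBuilder(mdbytes.length * 2);
        for (int i = 0; i < mdbytes.length; i++) {
            String hex = Integer.toHexString(0xff & mdbytes[i]);
            if (hex.length() == 1) {
                hexString.append('0');
            }
            hexString.append(hex);
        }
        return hexString.toString();
    }

    /** Hashes the UTF-8 encoding of the given text. */
    public static final String SHA256(String text)
            throws NoSuchAlgorithmException, UnsupportedEncodingException {
        return SHA256(text.getBytes("UTF-8"));
    }
}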
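
For completeness, here is a minimal sketch of how the two key formats above could be assembled. The class RowKeys and the methods rawKey and oiqKey are hypothetical names, not part of Lily or HBase, and the hashing is repeated so the snippet compiles on its own:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class illustrating the proposed key layout.
class RowKeys {

    // Hex-encoded SHA-256 of the UTF-8 bytes of the given text.
    static String sha256Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(Character.forDigit((b >> 4) & 0xf, 16));
                hex.append(Character.forDigit(b & 0xf, 16));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Key for a raw page: SHA256(URL) + "-RAW".
    static String rawKey(String url) {
        return sha256Hex(url) + "-RAW";
    }

    // Key for a denormalized JSON object: SHA256(JSON) + "-OIQ".
    static String oiqKey(String json) {
        return sha256Hex(json) + "-OIQ";
    }

    public static void main(String[] args) {
        System.out.println(rawKey("http://example.com/index.html"));
        System.out.println(oiqKey("{\"name\":\"example\"}"));
    }
}
```

Each key is 64 hex characters plus the 4-character suffix, so it contains no dots even when the source URL does. Note that the same JSON object serialized with a different field order or whitespace would hash to a different key.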

--
Fuad Efendi