HBase, mail # dev - Primary Key Design

Re: Primary Key Design
Fuad Efendi 2011-08-03, 15:32

Such a design is required for RAW: we need to keep the history of HTML pages
under the same ID value. That's why the first candidate for the ID is the URL,
and in the end we use SHA(URL).
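
A minimal sketch of the RAW key, assuming the SHA256(URL) + "-RAW" scheme from the quoted message and UTF-8 encoding of the URL; `RawRowKey` and `rawKey` are hypothetical names. Because SHA-256 is deterministic, every re-fetch of the same URL yields the same row key, which is what allows successive HTML snapshots to be kept under one ID.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: builds a RAW row key as hex(SHA-256(url)) + "-RAW". */
final class RawRowKey {
    private RawRowKey() {}

    static String rawKey(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2 + 4);
            for (byte b : digest) {
                // %02x gives two lowercase, zero-padded hex digits per byte
                hex.append(String.format("%02x", b));
            }
            return hex.append("-RAW").toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE implementation is required to support SHA-256
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String url = "http://example.com/index.html";
        // Two fetches of the same URL (e.g. on different days) map to the same key,
        // so both HTML snapshots can be stored under one ID.
        System.out.println(rawKey(url).equals(rawKey(url)));  // true
        System.out.println(rawKey(url).length());             // 68 (64 hex digits + "-RAW")
    }
}
```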

For OIQ, the key must be carefully planned. SHA(JSON) has the benefit of an
implicit "equals" implementation: if ID := SHA(JSON) differs, the JSON
objects are not the same.

-Fuad
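
The "implicit equals" for OIQ holds at the level of exact bytes: identical JSON text always gives the same ID, and different IDs prove the texts differ. It does not extend to meaning, though: the same object serialized with a different key order or whitespace gets a different ID, which is worth keeping in mind when planning OIQ keys. A sketch illustrating this, with hypothetical names `JsonIdDemo` and `jsonId`; if semantic equality were the goal, some canonical serialization (for example, fixed key order and no insignificant whitespace) would have to be applied before hashing.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class JsonIdDemo {
    private JsonIdDemo() {}

    /** Hypothetical: ID := hex(SHA-256(UTF-8 bytes of the JSON text)). */
    static String jsonId(String json) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(json.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(64);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);  // SHA-256 is a required algorithm
        }
    }

    public static void main(String[] args) {
        String a = "{\"name\":\"Acme\",\"id\":1}";
        String b = "{\"name\":\"Acme\",\"id\":1}";  // same text as a
        String c = "{\"id\":1,\"name\":\"Acme\"}";  // same object, different key order
        System.out.println(jsonId(a).equals(jsonId(b)));  // true: identical bytes
        System.out.println(jsonId(a).equals(jsonId(c)));  // false: the bytes differ
    }
}
```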
On 11-08-03 10:25 AM, "Fuad Efendi" <[EMAIL PROTECTED]> wrote:

>Hi,
>
>I am starting to use the following scheme for primary keys:
>SHA256(URL) + "-RAW" (Primary Key Schema,
><https://outsideiq.jira.com/browse/CA-107>)
>
>RATIONALE:
>* PKs in Lily (user-defined) will be prefixed with "USER.", so I can't use,
>  for instance, a URI (it contains dots, which are a special character in
>  the current version).
>* In addition to the SHA-256-generated PK, will Lily still use a UUID (which
>  is truly unique) for versioning?
>* IMPORTANT: we need to randomize PKs; this is a best practice with HBase
>  (data will be randomly distributed across the cluster).
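
Why randomization helps: HBase stores rows sorted by key and splits a table into regions by contiguous key ranges, so sequential natural keys such as consecutive URLs share a long common prefix and tend to land in the same region, while SHA-256-based keys start with effectively uniform hex digits. A minimal sketch, assuming 1,000 hypothetical URLs of the form http://example.com/page<N> and using the first character of each key as a rough stand-in for which of 16 evenly split key ranges a row would fall into (a simplification; real region boundaries depend on how the table is split). `KeySpreadDemo` and its methods are illustrative names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class KeySpreadDemo {
    private KeySpreadDemo() {}

    static String sha256Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(64);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);  // SHA-256 is a required algorithm
        }
    }

    /** Hypothetical sequential natural keys: http://example.com/page0 .. page(n-1). */
    static List<String> sequentialUrls(int n) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            urls.add("http://example.com/page" + i);
        }
        return urls;
    }

    /** Distinct leading characters: a rough proxy for how many of 16 key ranges receive rows. */
    static int distinctLeadingChars(List<String> keys) {
        Set<Character> leading = new HashSet<>();
        for (String key : keys) {
            leading.add(key.charAt(0));
        }
        return leading.size();
    }

    public static void main(String[] args) {
        List<String> urls = sequentialUrls(1000);
        List<String> hashed = new ArrayList<>();
        for (String url : urls) {
            hashed.add(sha256Hex(url));
        }
        System.out.println(distinctLeadingChars(urls));    // 1: every URL starts with 'h'
        System.out.println(distinctLeadingChars(hashed));  // almost certainly 16: hex digits spread evenly
    }
}
```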
>
>And I suggest using a similar key for OIQ: SHA256(JSON-Object-in-UTF8) +
>"-OIQ". The type tag is a postfix so that we get good "randomization"; in
>HBase, all data is physically sorted by PK. All OIQ objects will be stored
>denormalized as JSON (Lily string type). Note that the JSON will be UTF-8
>encoded; I believe this is also part of the ECMA specs.
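
The postfix-versus-prefix point can be shown with a small sort. With hash + "-OIQ" and hash + "-RAW", the random digest leads the key, so both record types interleave across the sorted key space; with a hypothetical prefix scheme such as "RAW-" + hash, all rows of one type sort into one contiguous block. A sketch assuming 20 hypothetical URLs and 20 hypothetical JSON strings; `SuffixVsPrefixDemo`, `buildKeys`, and `typeRuns` are illustrative names, and a "run" is a maximal stretch of consecutive sorted keys with the same type tag.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SuffixVsPrefixDemo {
    private SuffixVsPrefixDemo() {}

    static String sha256Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(64);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);  // SHA-256 is a required algorithm
        }
    }

    /** Builds n hypothetical RAW keys and n hypothetical OIQ keys, tagged as suffix or prefix. */
    static List<String> buildKeys(int n, boolean tagAsSuffix) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String rawHash = sha256Hex("http://example.com/page" + i);
            String oiqHash = sha256Hex("{\"id\":" + i + "}");
            keys.add(tagAsSuffix ? rawHash + "-RAW" : "RAW-" + rawHash);
            keys.add(tagAsSuffix ? oiqHash + "-OIQ" : "OIQ-" + oiqHash);
        }
        return keys;
    }

    /** Counts runs of same-type keys after sorting (ASCII keys: String order equals byte order). */
    static int typeRuns(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        int runs = 0;
        Boolean previousIsRaw = null;
        for (String key : sorted) {
            boolean isRaw = key.startsWith("RAW-") || key.endsWith("-RAW");
            if (previousIsRaw == null || previousIsRaw != isRaw) {
                runs++;
            }
            previousIsRaw = isRaw;
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(typeRuns(buildKeys(20, false)));  // 2: all OIQ rows, then all RAW rows
        System.out.println(typeRuns(buildKeys(20, true)));   // many more than 2: the types interleave
    }
}
```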
>
>import java.nio.charset.StandardCharsets;
>import java.security.MessageDigest;
>import java.security.NoSuchAlgorithmException;
>
>/**
> * Hex-encoded SHA-256 digests for use as primary (row) keys.
> *
> * @see <a href="http://stackoverflow.com/questions/221165/pros-and-cons-of-using-md5-hash-of-uri-as-the-primary-key-in-a-database">Pros
> *      and cons of using MD5 hash of URI as the primary key in a database</a>
> * @author Fuad
> */
>public class SHA256 {
>
>    /** Returns the SHA-256 digest of {@code bytes} as 64 lowercase hex characters. */
>    public static final String SHA256(byte[] bytes) throws NoSuchAlgorithmException {
>        MessageDigest md = MessageDigest.getInstance("SHA-256");
>        byte[] mdbytes = md.digest(bytes);
>
>        // Convert each byte to two hex digits, left-padding with '0' when needed
>        StringBuilder hexString = new StringBuilder(mdbytes.length * 2);
>        for (byte b : mdbytes) {
>            String hex = Integer.toHexString(0xff & b);
>            if (hex.length() == 1) {
>                hexString.append('0');
>            }
>            hexString.append(hex);
>        }
>        return hexString.toString();
>    }
>
>    /** Hashes the UTF-8 encoding of {@code text}. */
>    public static final String SHA256(String text) throws NoSuchAlgorithmException {
>        return SHA256(text.getBytes(StandardCharsets.UTF_8));
>    }
>}
>
>--
>Fuad Efendi
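
One way to sanity-check a hex-encoding helper like the quoted SHA256 class is to compare its output with published SHA-256 test vectors: the FIPS 180-2 example for the message "abc" and the well-known digest of the empty message. The sketch below repeats the same Integer.toHexString loop in a self-contained, hypothetical `Sha256Check` class so it runs on its own; calling the quoted SHA256(String) method on the same inputs should return the same strings.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha256Check {
    private Sha256Check() {}

    /** Same technique as the quoted helper: digest, then two lowercase hex digits per byte. */
    static String sha256Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(64);
            for (byte b : digest) {
                String h = Integer.toHexString(0xff & b);
                if (h.length() == 1) {
                    hex.append('0');  // keep the leading zero so every byte is two characters
                }
                hex.append(h);
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);  // SHA-256 is a required algorithm
        }
    }

    public static void main(String[] args) {
        // FIPS 180-2 test vector for the three-byte message "abc"
        String expectedAbc = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad";
        // SHA-256 of the empty message
        String expectedEmpty = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855";
        System.out.println(sha256Hex("abc").equals(expectedAbc));  // true
        System.out.println(sha256Hex("").equals(expectedEmpty));   // true
    }
}
```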