Re: File hash key case observation
After ingesting a few million files using the method in the Accumulo File
System Archive example (http://accumulo.apache.org/1.4/examples/dirlist.html),
we ran into a problem reading the information back out of Accumulo. I forget
the exact error, but I resolved it by using DigestUtils.md5Hex instead of
DigestUtils.md5, so the MD5 is stored as a hex string instead of a binary
value. We did not dig into what caused the error; we just side-stepped it.
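
For anyone without Commons Codec handy, here is a rough JDK-only sketch of the difference between the two forms: the raw 16-byte digest versus its 32-character hex string. The class and method names are made up for illustration, and this is not the code from the dirlist example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5HexDemo {

    // Raw MD5 digest: 16 arbitrary bytes, not printable text.
    static byte[] md5(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Same digest rendered as 32 lowercase hex characters.
    static String md5Hex(byte[] data) {
        StringBuilder sb = new StringBuilder(32);
        for (byte b : md5(data)) {
            // Mask to 0..255 so negative bytes do not sign-extend.
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println("binary length: " + md5(data).length);
        System.out.println("hex string:    " + md5Hex(data));
    }
}
```

The hex form is twice as long but is plain ASCII, which makes it easier to print, log, and compare by eye.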
On Wed, Dec 4, 2013 at 11:37 PM, Chris Carrino <[EMAIL PROTECTED]> wrote:

> The org.apache.accumulo.examples.simple.filedata.FileDataIngest class
> generates LOWERCASE hash keys via the hexString() method, and uses them as
> row IDs for storing file chunks in Accumulo.  Note that NIST uses
> UPPERCASE hash keys in the Reference Data Set (RDS).  See
> http://www.nsrl.nist.gov/ for the RDS.  Both approaches are valid since
> the hexadecimal representation of the key is not case sensitive - but make
> sure you normalize to one case if you are comparing the keys generated in
> the FileDataIngest class to the RDS keys.
>
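
A minimal sketch of the case normalization suggested above, assuming the keys are plain hex strings. The class and helper names are hypothetical, and lowercase is picked only because that is what FileDataIngest is described as producing; uppercase would work equally well as long as both sides agree.

```java
import java.util.Locale;

public class HashKeyCase {

    // Normalize a hex hash key to lowercase.
    // Locale.ROOT avoids locale-specific case-mapping surprises.
    static String normalize(String hexKey) {
        return hexKey.trim().toLowerCase(Locale.ROOT);
    }

    // Compare two hex keys without regard to case.
    static boolean sameKey(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        String rdsStyle = "D41D8CD98F00B204E9800998ECF8427E";    // uppercase, RDS-style
        String ingestStyle = "d41d8cd98f00b204e9800998ecf8427e"; // lowercase, ingest-style
        System.out.println(sameKey(rdsStyle, ingestStyle));
    }
}
```

Normalizing before building the row ID, rather than at comparison time, keeps lookups by row a simple exact match.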