Accumulo >> mail # user >> File hash key case observation


Re: File hash key case observation
Are you working to ingest a large number of files into Accumulo?
On Thu, Dec 5, 2013 at 11:30 PM, David Medinets <[EMAIL PROTECTED]> wrote:

> After ingesting a few million files using the method in the Accumulo File
> System Archive (http://accumulo.apache.org/1.4/examples/dirlist.html) we
> ran into a problem reading the information back out of Accumulo. I forget
> the error, but I resolved it by using DigestUtils.md5Hex instead of
> DigestUtils.md5, which stores the MD5 as a hex string instead of a binary
> value. We did not dig into what caused the error; we just side-stepped it.
>
>
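To make the difference between the two forms concrete, here is a minimal sketch using only the JDK's MessageDigest rather than commons-codec. The class and method names (Md5RowIdSketch, md5Raw, md5Hex) are made up for illustration and are not part of the Accumulo examples; it only shows that the raw digest is 16 arbitrary bytes while the hex form is 32 printable characters, and does not claim to reproduce whatever error was hit above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, for illustration only.
public class Md5RowIdSketch {

    // Raw MD5 digest: always 16 bytes, which may include non-printable values.
    static byte[] md5Raw(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present in every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    // Lowercase hex encoding of the digest: always 32 ASCII characters.
    static String md5Hex(byte[] data) {
        StringBuilder sb = new StringBuilder(32);
        for (byte b : md5Raw(data)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] content = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(md5Raw(content).length); // 16
        System.out.println(md5Hex(content));        // 5d41402abc4b2a76b9719d911017c592
    }
}
```

The hex form costs twice the space per row ID but stays readable in the shell and survives any String round trip, which may be why switching to it was the easier path.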
> On Wed, Dec 4, 2013 at 11:37 PM, Chris Carrino <[EMAIL PROTECTED]> wrote:
>
>> The org.apache.accumulo.examples.simple.filedata.FileDataIngest class
>> generates LOWERCASE hash keys via the hexString() method, and uses them as
>> row IDs for storing file chunks in Accumulo.  Note that NIST uses
>> UPPERCASE hash keys in the Reference Data Set (RDS).  See
>> http://www.nsrl.nist.gov/ for the RDS.  Both approaches are valid since
>> the hexadecimal representation of the key is not case sensitive - but make
>> sure you normalize to one case if you are comparing the keys generated in
>> the FileDataIngest class to the RDS keys.
>>
>
>
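On the case point: Accumulo compares row IDs as bytes, so an uppercase RDS key will not match a lowercase row ID unless one side is normalized first. A minimal sketch of that normalization follows; HashKeyCase, normalize, and sameKey are hypothetical names for illustration, and lowercase is picked only because that is what FileDataIngest is described as producing.

```java
import java.util.Locale;

// Hypothetical helper class, for illustration only.
public class HashKeyCase {

    // Normalize a hex hash key to lowercase. Locale.ROOT avoids
    // locale-specific case mappings.
    static String normalize(String hexKey) {
        return hexKey.trim().toLowerCase(Locale.ROOT);
    }

    // Compare two hex keys without regard to case.
    static boolean sameKey(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        String rdsStyle = "5D41402ABC4B2A76B9719D911017C592";    // uppercase, as in the RDS
        String ingestStyle = "5d41402abc4b2a76b9719d911017c592"; // lowercase row ID style
        System.out.println(rdsStyle.equals(ingestStyle));   // false
        System.out.println(sameKey(rdsStyle, ingestStyle)); // true
    }
}
```

Normalizing the RDS side before the lookup, rather than at comparison time, lets the normalized key be used directly as the row ID in a scan.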