HBase >> mail # user >> OT - Hash Code Creation


Re: OT - Hash Code Creation
Try a hash table with double hashing, something like this:
http://www.java2s.com/Code/Java/Collections-Data-Structure/Hashtablewithdoublehashing.htm
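
For illustration, here is a minimal sketch of the idea, not the code from the link above. It assumes String keys mapped to long counters (a guess at what the log aggregation needs), a power-of-two table kept at most half full, and no removals. The class name DoubleHashingCounter and both mixing steps are made up for this example. On a collision the probe jumps by a key-dependent odd step instead of moving to the next slot, so keys that land on the same start slot usually follow different probe paths.

```java
/** Sketch: open-addressing String -> long counter table using double hashing. */
final class DoubleHashingCounter {
    private String[] keys;   // null marks an empty slot
    private long[] counts;
    private int size;

    DoubleHashingCounter(int expectedEntries) {
        if (expectedEntries < 0 || expectedEntries > (1 << 29)) {
            throw new IllegalArgumentException("unsupported size: " + expectedEntries);
        }
        // Smallest power of two that keeps the load factor at or below 0.5.
        int capacity = 16;
        while (capacity < 2L * expectedEntries) {
            capacity <<= 1;
        }
        keys = new String[capacity];
        counts = new long[capacity];
    }

    /** Adds delta to the counter for key, creating the entry if needed. */
    void add(String key, long delta) {
        if ((size + 1) * 2L > keys.length) {
            resize();
        }
        int i = findSlot(keys, key);
        if (keys[i] == null) {
            keys[i] = key;
            size++;
        }
        counts[i] += delta;
    }

    /** Returns the counter for key, or 0 if the key was never added. */
    long get(String key) {
        int i = findSlot(keys, key);
        return keys[i] == null ? 0L : counts[i];
    }

    int size() {
        return size;
    }

    /** Returns the slot holding key, or the empty slot where it belongs. */
    private static int findSlot(String[] table, String key) {
        int mask = table.length - 1;
        int h = key.hashCode();
        // First hash: start position (high bits folded into low bits).
        int i = (h ^ (h >>> 16)) & mask;
        // Second hash: probe step. Forcing it odd makes it coprime with the
        // power-of-two table size, so the probe sequence visits every slot.
        int step = ((h * 0x9E3779B9) >>> 15) | 1;
        while (table[i] != null && !table[i].equals(key)) {
            i = (i + step) & mask;
        }
        return i;
    }

    /** Doubles the table and reinserts every entry. */
    private void resize() {
        String[] oldKeys = keys;
        long[] oldCounts = counts;
        keys = new String[oldKeys.length * 2];
        counts = new long[keys.length];
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != null) {
                int i = findSlot(keys, oldKeys[j]);
                keys[i] = oldKeys[j];
                counts[i] = oldCounts[j];
            }
        }
    }
}
```

Creating the table with the expected number of entries up front (for example new DoubleHashingCounter(24000000)) avoids the repeated resize passes. Both hashes here are derived from String.hashCode(), so keys with identical hash codes still share a probe path. Whether this beats java.util.HashMap on the real data would have to be measured.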

2011/3/17 Peter Haidinyak <[EMAIL PROTECTED]>

> Hi,
>        This is a little off topic but this group seems pretty swift so I
> thought I would ask. I am aggregating a day's worth of log data which means
> I have a Map of over 24 million elements. What would be a good algorithm to
> use for generating hash codes for these elements that cuts down on
> collisions? The application starts out reading in a log (144 logs in all) in
> about 20 seconds and by the time I reach the last log it is taking around
> 120 seconds. The extra 100 seconds have to do with hash table collisions.
> I've played around with different Hashing algorithms and cut the original
> time from over 300 seconds to 120 but I know I can do better.
> The key I am using for the Map is an alphanumeric string that is
> approximately 16 characters long, with the last 4 or 5 characters being the
> most unique.
>
> Any ideas?
>
> Thanks
>
> -Pete
>