Hadoop >> mail # general >> Anyway to load certain Key/Value pair fast?


Anyway to load certain Key/Value pair fast?
Hi All,
I am trying to figure out a good solution for the following scenario.

1. I have a 2T file (let's call it A), filled with key/value pairs,
which is stored in HDFS with the default 64M block size. In A,
each key is less than 1K and each value is about 20M.
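(As a sanity check on the sizes above: 2 TB of roughly 20 MB values is on the order of 100,000 pairs, so an index of every key plus its byte offset fits comfortably in memory. A back-of-the-envelope calculation, assuming these rough sizes:)

```java
public class IndexSizeEstimate {
    public static void main(String[] args) {
        long fileBytes = 2L * 1024 * 1024 * 1024 * 1024; // 2 TB total
        long valueBytes = 20L * 1024 * 1024;             // ~20 MB per value
        long pairs = fileBytes / valueBytes;             // ~104,857 pairs
        long indexBytes = pairs * 1024;                  // 1 KB per key entry
        System.out.println(pairs + " pairs, index ~ "
                + indexBytes / (1024 * 1024) + " MB");   // 104857 pairs, index ~ 102 MB
    }
}
```

So the expensive part is scanning 2 TB of values, not holding the keys.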

2. Occasionally, I will run an analysis using a different type of data
(usually less than 10G; let's call it B) and perform lookup-table-like
operations using the values in A. B resides in HDFS as well.

3. This analysis requires loading only a small number of values from A
(usually fewer than 1000 of them) into memory for fast lookup against
the data in B. B locates these values by looking up their keys in A.

Is there an efficient way to do this?
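(One candidate here may be Hadoop's MapFile, which is a sorted SequenceFile plus an index file, and whose reader seeks to a key without scanning the whole file. A minimal stdlib sketch of the underlying pattern, with hypothetical names, not the Hadoop API itself:)

```java
import java.util.TreeMap;

// Sketch of the MapFile idea: keep a sorted in-memory index of
// key -> byte offset into the data file, then seek directly to
// the few values needed instead of scanning 2 TB.
public class SparseIndexLookup {
    private final TreeMap<String, Long> index = new TreeMap<>();

    // Record where a key's value starts in the data file.
    public void put(String key, long offset) {
        index.put(key, offset);
    }

    // Byte offset to seek to for `key`, or -1 if the key is absent.
    public long offsetOf(String key) {
        Long off = index.get(key);
        return off == null ? -1L : off;
    }

    public static void main(String[] args) {
        SparseIndexLookup idx = new SparseIndexLookup();
        idx.put("k0042", 88_080_384L);
        System.out.println(idx.offsetOf("k0042")); // prints 88080384
        System.out.println(idx.offsetOf("k9999")); // prints -1
    }
}
```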

I was thinking that if I could identify the locations of the blocks
that contain those few values, I might be able to push B to the few
nodes that hold the values in A. Since I only need to do this
occasionally, maintaining a distributed database such as HBase can't
be justified.
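(Given a record's byte offset, say from an index like the one sketched above, mapping it to its HDFS block is just integer division by the block size; the hosts for that block can then be obtained via FileSystem#getFileBlockLocations. A sketch of the arithmetic, assuming the default 64 MB block size:)

```java
public class BlockLocality {
    static final long BLOCK = 64L * 1024 * 1024; // default 64 MB HDFS block

    // Which block indexes does a record spanning [offset, offset+len) touch?
    static long firstBlock(long offset) {
        return offset / BLOCK;
    }

    static long lastBlock(long offset, long len) {
        return (offset + len - 1) / BLOCK;
    }

    public static void main(String[] args) {
        long off = 200L * 1024 * 1024; // record starts at the 200 MB mark
        long len = 20L * 1024 * 1024;  // ~20 MB value
        // A ~20 MB value spans at most two 64 MB blocks.
        System.out.println(firstBlock(off) + ".." + lastBlock(off, len)); // prints 3..3
    }
}
```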

Many thanks.
Cao
Harsh J 2013-02-13, 05:29