Hadoop >> mail # general >> Anyway to load certain Key/Value pair fast?

Any way to load certain key/value pairs fast?
Hi All,
I am trying to figure out a good solution for the following scenario.

1. I have a 2 TB file (let's call it A) filled with key/value pairs,
which is stored in HDFS with the default 64 MB block size. In A,
each key is less than 1 KB and each value is about 20 MB.

2. Occasionally, I will run an analysis on a different type of data
(usually less than 10 GB; let's call it B) and perform look-up-table-style
operations using the values in A. B resides in HDFS as well.

3. This analysis requires loading only a small number of values
from A (usually fewer than 1000) into memory for fast
look-up against the data in B. B finds those few values in A
by looking up their keys in A.

Is there an efficient way to do this?
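
To make the access pattern concrete, here is a minimal sketch of one
possible approach: keep a small side index mapping each key to the byte
offset and length of its value, then seek directly to the few values
needed instead of scanning A. The record layout (two 4-byte big-endian
length fields, then key, then value) and the function names
write_records and load_values are assumptions made for illustration,
and an in-memory or local stream stands in for an HDFS file.

```python
import struct

def write_records(stream, records):
    """Write length-prefixed key/value records (hypothetical layout).

    Returns an index {key: (value_offset, value_length)}.
    """
    index = {}
    for key, value in records:
        # Header: key length and value length as unsigned 32-bit ints.
        stream.write(struct.pack(">II", len(key), len(value)))
        stream.write(key)
        # The value starts right after the key.
        index[key] = (stream.tell(), len(value))
        stream.write(value)
    return index

def load_values(stream, index, wanted_keys):
    """Read only the values for wanted_keys, skipping unknown keys."""
    present = [k for k in wanted_keys if k in index]
    result = {}
    # Visit values in offset order so the reads move forward through the file.
    for key in sorted(present, key=lambda k: index[k][0]):
        offset, length = index[key]
        stream.seek(offset)
        result[key] = stream.read(length)
    return result
```

With about 1000 values of roughly 20 MB each, a run like this reads on
the order of 20 GB rather than the full 2 TB. The index itself stays
small because it holds only a key, an offset, and a length per record.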

I was thinking that if I could identify the locations of the blocks that
contain those few values, I might be able to push B to the few
nodes that hold them. Since I only need to do this
occasionally, maintaining a distributed database such as HBase can't be
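
The block-locality idea comes down to simple arithmetic: given a value's
byte offset and length in A, the blocks it touches follow from integer
division by the block size. The sketch below assumes the offsets are
already known (for example, from a side index); the name
blocks_for_value is hypothetical. Mapping block numbers to the nodes
that host them would need a separate query to the file system, which is
not shown here.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # default 64 MB block size from the scenario

def blocks_for_value(offset, length, block_size=BLOCK_SIZE):
    """Return the zero-based block indices covered by one value."""
    if length <= 0:
        return []
    first = offset // block_size
    # The last byte of the value sits at offset + length - 1.
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))
```

Because a value of about 20 MB is smaller than one 64 MB block, it sits
either entirely inside one block or straddles exactly two.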

Many thanks.