Re: Using external indexes in an HBase Map/Reduce job...
Hi Michael Segel.

If I understand your question correctly, you are looking for the optimal
way to scan index search results? If not, my answer below is not relevant :).

1. For MR joins or scans over large index result sets, bloom filters can
be used, as described here:
http://blog.rapleaf.com/dev/2009/09/25/batch-querying-with-cascading/

2. Another option: denormalize the data into the same or a separate table
(depends on the nature of the object relations).

3. Random gets. For each row returned by Solr, issue a random Get (for
really small result sets or paging); see the sketch after this list.

4. Put compacted data (latest data, a small subset of the data, etc.) into the Solr index itself.
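
To illustrate option 3, a minimal sketch of driving random Gets from a Solr
result set. The Solr URL, the "rowkey" field, and the table name are
assumptions, and the exact client classes depend on your Solr and HBase
versions:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class SolrDrivenGets {
  public static void main(String[] args) throws Exception {
    // Query the external Solr index; it is assumed to store the HBase row key
    // in a field called "rowkey".
    CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery query = new SolrQuery("content:hadoop").setRows(100);

    HTable table = new HTable(HBaseConfiguration.create(), "mytable");
    for (SolrDocument doc : solr.query(query).getResults()) {
      // One random Get per matching document.
      Result row = table.get(new Get(Bytes.toBytes((String) doc.getFieldValue("rowkey"))));
      // ... process the row ...
    }
    table.close();
  }
}

One Get per hit gets expensive quickly, which is why this only makes sense
for really small result sets or paging, as noted above.
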
2010/10/12 Michael Segel <[EMAIL PROTECTED]>:
>
> Hi,
>
> Now I realize that most everyone is sitting in NY, while some of us can't leave our respective cities....
>
> Came across this problem and I was wondering how others solved it.
>
> Suppose you have a really large table with 1 billion rows of data.
> Since HBase really doesn't have any indexes built in (Don't get me started about the contrib/transactional stuff...), you're forced to use some sort of external index, or roll your own index table.
>
> The net result is that you end up with a list object that contains your result set.
>
> So the question is... what's the best way to feed the list object in?
>
> One option I thought about is writing the object to a file, using that file as the input, and then controlling the splits. Not the most efficient, but it would work.
>
> I was trying to find a more 'elegant' solution, and I'm sure that anyone using SOLR or LUCENE or whatever... has come across this problem too.
>
> Any suggestions?
>
> Thx
>
>
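
On the file-based option mentioned in the quoted message: a rough sketch of
that route, assuming the index results are dumped one row key per line into
an HDFS file and NLineInputFormat is used to control how many keys each map
task handles. The table name, field names, and split size are made up, and
the exact driver wiring depends on your Hadoop and HBase versions:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class ExternalIndexKeyFileJob {

  // Each input value is one row key produced by the external index.
  public static class GetMapper extends Mapper<LongWritable, Text, Text, Text> {
    private HTable table;

    @Override
    protected void setup(Context context) throws IOException {
      table = new HTable(HBaseConfiguration.create(context.getConfiguration()), "mytable");
    }

    @Override
    protected void map(LongWritable offset, Text rowKey, Context context)
        throws IOException, InterruptedException {
      Result row = table.get(new Get(Bytes.toBytes(rowKey.toString().trim())));
      // ... process the row, write out whatever the job needs ...
    }

    @Override
    protected void cleanup(Context context) throws IOException {
      table.close();
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "external-index-gets");
    job.setJarByClass(ExternalIndexKeyFileJob.class);
    job.setMapperClass(GetMapper.class);
    job.setNumReduceTasks(0);                               // map-only job
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.addInputPath(job, new Path(args[0]));  // the row-key file
    NLineInputFormat.setNumLinesPerSplit(job, 10000);       // keys per map task = split control
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The lines-per-split knob is what gives you the "control the splitters" part;
everything else is an ordinary map-only job issuing Gets.
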