MapReduce >> mail # user >> Hadoop/Lucene + Solr architecture suggestions?


Re: Hadoop/Lucene + Solr architecture suggestions?
On Wed, Oct 10, 2012 at 10:15 PM, Lance Norskog <[EMAIL PROTECTED]> wrote:
> In the LucidWorks Big Data product, we handle this with a reducer that sends documents to a SolrCloud cluster. This way the index files are not managed by Hadoop.

Hi Lance,
I'm curious whether you've gotten that to work with a decent-sized
cluster (e.g. more than 250 nodes)?  Even a trivial cluster seemed to
crush SolrCloud, at least as of a few months ago...

Thanks,
--tim

> ----- Original Message -----
> | From: "Ted Dunning" <[EMAIL PROTECTED]>
> | To: [EMAIL PROTECTED]
> | Cc: "Hadoop User" <[EMAIL PROTECTED]>
> | Sent: Wednesday, October 10, 2012 7:58:57 AM
> | Subject: Re: Hadoop/Lucene + Solr architecture suggestions?
> |
> | I prefer to create indexes in the reducer personally.
> |
> | Also you can avoid the copies if you use an advanced hadoop-derived
> | distro. Email me off list for details.
> |
> | Sent from my iPhone
> |
> | On Oct 9, 2012, at 7:47 PM, Mark Kerzner <[EMAIL PROTECTED]>
> | wrote:
> |
> | > Hi,
> | >
> | > if I create a Lucene index in each mapper, locally, then copy them
> | > to paths under /jobid/mapid1, /jobid/mapid2, and then in the reducers
> | > copy them to some Solr machine (perhaps even merging), does such an
> | > architecture make sense, to create a searchable index with
> | > Hadoop?
> | >
> | > Are there links for similar architectures and questions?
> | >
> | > Thank you. Sincerely,
> | > Mark
> |
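
The reducer-side approach discussed in this thread (a reducer that sends documents to a search cluster instead of writing index files into Hadoop) could be sketched roughly as below. This is only an illustration under stated assumptions, not the LucidWorks implementation or any real Hadoop or Solr API: `index_in_batches`, `send_batch`, and `batch_size` are hypothetical names, and `send_batch` stands in for whatever client call would actually post documents to the cluster. Grouping documents into batches is one plausible way to reduce the number of requests a large job sends to the search cluster, which is the load concern raised in the reply above.

```python
from typing import Callable, Dict, Iterable, List

Doc = Dict[str, str]


def index_in_batches(
    docs: Iterable[Doc],
    batch_size: int,
    send_batch: Callable[[List[Doc]], None],
) -> int:
    """Forward documents to a search cluster in fixed-size batches.

    `send_batch` is a hypothetical callback; in a real job it would wrap
    the search client's "add documents" call. Returns the number of
    batches that were sent.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    batch: List[Doc] = []
    batches_sent = 0
    for doc in docs:
        batch.append(doc)
        if len(batch) >= batch_size:
            send_batch(batch)
            batches_sent += 1
            batch = []  # start a fresh list so the sent batch is not mutated
    if batch:
        # Flush the final, possibly smaller, batch.
        send_batch(batch)
        batches_sent += 1
    return batches_sent


if __name__ == "__main__":
    # Stand-in sink that records batches instead of contacting a server.
    received: List[List[Doc]] = []
    reducer_input = ({"id": str(i), "text": f"document {i}"} for i in range(5))
    count = index_in_batches(reducer_input, batch_size=2, send_batch=received.append)
    print(count, [len(b) for b in received])
```

In a real job each reducer would call something like this over the values it receives, so the batch size and the number of reducers together bound how many concurrent requests reach the search cluster.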