Flume, mail # user - SolrCell help!


SolrCell help!
Flavio Pompermaier 2013-07-22, 16:18
Hi all,
I'm trying to understand how to "master" Morphline configuration files in
order to put some data into Solr, but I'm facing some problems with
TestMorphlineSolrSink. This is what I did:

1) Since I want to index the title of testXML.xml (i.e. "Tika test
document"), I commented out all the parsers
except org.apache.tika.parser.xml.DcXMLParser (which parses Dublin Core
metadata).
2) In schema.xml I added the following field:
    <field name="title" type="text_en" indexed="true" stored="true"
multiValued="false" />

But:
 - If I don't add anything to fmap or capture, everything works fine, but I
don't understand why (who fills that field?). If instead I add title to
capture and/or title: title (or dc_title:title) to fmap, Solr complains that 2
values are retrieved for 'title' (debugging the values, I see the title and
one empty value in the 'title' metadata array...).
Thus, the problem is that everything works magically if the field is named
title, but if I change its name to something like doc_title there's no way
to make it non-multivalued. Am I right? How can I fix this problem?
 - I'd also like to handle JSON files. How can I map JSON fields to Solr fields?
Could someone give a simple example?

Best,
Flavio