Flume >> mail # user >> SolrCell help!


Hi to all,
I'm trying to understand how to "master" Morphline configuration files in
order to put some data into Solr, but I'm facing some problems with
TestMorphlineSolrSink. This is what I have done:

1) Since I want to index the title of testXML.xml (i.e. "Tika test
document"), I commented out all the parsers
except org.apache.tika.parser.xml.DcXMLParser (which parses Dublin Core
metadata)
2) In schema.xml I added the following field:
    <field name="title" type="text_en" indexed="true" stored="true"
multiValued="false" />

But:
 - If I don't add anything to fmap or capture, everything works fine, but I
don't understand why (who fills that field?). If instead I add title to
capture and/or title: title (or dc_title:title) to fmap, Solr complains that 2
values are retrieved for 'title' (debugging the values, I see the title and
one empty value in the 'title' metadata array...).
Thus, the problem is that everything works magically if the field is named
title, but if I change its name to something like doc_title there's no way
to make it non-multivalued. Am I right? How can I fix this problem?
- I'd like to manage JSON files. How can I map JSON fields to Solr fields?
Could someone give a simple example?
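
To make the two points above concrete, here is a minimal sketch in plain
Python (not morphline syntax, and not the SolrCell API). It assumes a record
is a map from field name to a list of values, maps JSON paths to Solr field
names, and drops blank values such as the empty entry seen in the 'title'
array. The names FIELD_MAP, resolve, to_solr_record and multivalued_fields,
and the sample JSON layout, are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical mapping from JSON paths to Solr field names (an assumption,
# not taken from any real morphline configuration).
FIELD_MAP = {
    "/id": "id",
    "/metadata/title": "doc_title",
}


def resolve(doc, path):
    """Follow a slash-separated path into nested dicts; return None if absent."""
    node = doc
    for key in path.strip("/").split("/"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def to_solr_record(raw_json, field_map=FIELD_MAP):
    """Build a field -> list-of-values record, skipping missing or blank values."""
    doc = json.loads(raw_json)
    record = {}
    for path, solr_field in field_map.items():
        value = resolve(doc, path)
        values = value if isinstance(value, list) else [value]
        for v in values:
            if v is None or str(v).strip() == "":
                continue  # a blank entry would count as a second value
            record.setdefault(solr_field, []).append(v)
    return record


def multivalued_fields(record):
    """Names of fields that hold more than one value."""
    return sorted(name for name, vals in record.items() if len(vals) > 1)


# Sample input mimicking the symptom: a title plus one empty value.
sample = '{"id": "1", "metadata": {"title": ["Tika test document", ""]}}'
record = to_solr_record(sample)
print(record)
print(multivalued_fields(record))
```

In this sketch the empty value is filtered before the record is built, so
doc_title ends up with a single value and would fit a field declared with
multiValued="false". Whether the same filtering is available in the morphline
itself is exactly the open question above.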

Best,
Flavio