Flume >> mail # user >> Query regarding readMultiLine in Morphlines config

Query regarding readMultiLine in Morphlines config
I have a log file with multiple records (one line = one record).

I want to send N lines (say, 20) at a time to Morphlines and then send them to Solr as a single Solr document.

(This is an experiment to see whether performance is better than the regular approach of using readLine and parsing each log line as its own Solr document.)

The number of documents is going to be in the billions.
I had a look at the readMultiLine documentation present here: http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#/readMultiLine
I would like to know how to use readMultiLine effectively (if that is possible) to pick up 20 lines/records in one go and create 20 fields, each holding the text of one line (for example, by using a counter within the regex, or something similar).
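
To make the intended grouping concrete, here is a minimal sketch in plain Python of the behaviour I am after. It is not Morphlines configuration and does not use any Morphlines or Solr API; the function name `batch_lines` and the field names `line_1` ... `line_20` are hypothetical, chosen only for illustration.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator


def batch_lines(lines: Iterable[str], batch_size: int = 20) -> Iterator[Dict[str, str]]:
    """Group consecutive lines into one dict per batch.

    Each dict stands in for a single document with one field per line.
    The field names (line_1, line_2, ...) are hypothetical.
    """
    it = iter(lines)
    while True:
        # Take up to batch_size lines; the last batch may be shorter.
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield {
            f"line_{i}": text.rstrip("\n")
            for i, text in enumerate(chunk, start=1)
        }


if __name__ == "__main__":
    # 45 sample records produce batches of 20, 20, and 5 lines.
    sample = [f"record {n}\n" for n in range(45)]
    for doc in batch_lines(sample, batch_size=20):
        print(len(doc), doc["line_1"])
```

With 45 input lines and a batch size of 20, the last document simply has fewer fields; whether a short final batch is acceptable would depend on the Solr schema.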
Kindly let me know if you have worked on something similar, or point me to some informative pages on a similar problem.


Sanjay Ramanathan