Hadoop >> mail # user >> Re: Concatenate adjacent lines with hadoop


Re: Concatenate adjacent lines with hadoop
That's easy. In your example:

Map output key: FIELD-N; map output value: just the original value.
In the reducer: if the value contains LOGTAG<TAB>, it is the first part of
the log entry. If not, it is the continuation of a split log entry: just
take a substring of it and concatenate it with the first part.

Did I explain that clearly?
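
To make the idea concrete, here is a minimal local sketch of the map and reduce steps in plain Python (no Hadoop API is used; it only imitates the shuffle with a dictionary). Several things here are assumptions, not something stated in the thread: the helper names (split_prefix, map_line, reduce_group, run) are made up, the syslog-style prefix is assumed to end at the first "] ", a line that starts with LOGTAG is keyed by its last field, and a continuation line is keyed by its first field.

```python
from collections import defaultdict

LOGTAG = "LOGTAG\t"  # marker from the thread that starts an unsplit payload


def split_prefix(line):
    # Assumption: the prefix ends at the first "] " (as in "app[web.3] ").
    prefix, payload = line.split("] ", 1)
    return prefix + "] ", payload


def map_line(line):
    """Return (key, value): key is the shared FIELD-N, value is the original line."""
    _, payload = split_prefix(line)
    fields = payload.split("\t")
    if payload.startswith(LOGTAG):
        key = fields[-1]  # first part: the last field is FIELD-N
    else:
        key = fields[0]   # continuation: the first field repeats FIELD-N
    return key, line


def reduce_group(key, values):
    """Join continuation lines onto the line that carries LOGTAG."""
    head = next((v for v in values if LOGTAG in v), None)
    if head is None:
        return list(values)  # no first part found: pass the lines through
    for tail in (v for v in values if LOGTAG not in v):
        _, payload = split_prefix(tail)
        head += payload[len(key):]  # drop the repeated FIELD-N, keep the rest
    return [head]


def run(lines):
    """Imitate map -> shuffle -> reduce in memory."""
    groups = defaultdict(list)
    for line in lines:
        key, value = map_line(line.rstrip("\n"))
        groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reduce_group(key, values))
    return output


if __name__ == "__main__":
    sample = [
        "Dec 16 21:47:20 d.host app[web.3] LOGTAG\ta\tb\tlong-val",
        "Dec 16 21:47:20 d.host app[web.3] long-val\td\te",
    ]
    for record in run(sample):
        print(record)
```

The reducer checks for LOGTAG instead of relying on arrival order because the order of values within a reduce group is not guaranteed without a secondary sort. The sketch also assumes FIELD-N is unique per log entry; if it is not, unrelated lines would land in the same group, and the key would probably need to include something more (for example the timestamp and host prefix).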

On Wed, Feb 27, 2013 at 9:36 AM, Matthieu Labour <[EMAIL PROTECTED]> wrote:

> Hi
>
> Please find below the issue I need to solve. Thank you in advance for your
> help/tips.
>
> I have log files where log lines are sometimes split (this happens when
> the log line exceeds a specific length)
>
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-MAX
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N      <======= log line is being
> split
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>
> Can I "reconcile"/"concatenate" split log lines with a Hadoop MapReduce
> job?
>
> In other words, using a MapReduce job, can I concatenate the 2 following
> adjacent lines (provided that I 'detect' them)
>
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N      <======= log line is being
> split
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>
> into
>
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>
> Thank you!
>