Concatenate adjacent lines with hadoop
Hi

Please find below the issue I need to solve. Thank you in advance for your
help and tips.

I have log files in which some log lines are split in two (this happens when
a log line exceeds a specific length):

Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-MAX
Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N      <======= log line is split here
Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX

Can I "reconcile" (concatenate) the split log lines with a Hadoop MapReduce
job?

In other words, using a MapReduce job, can I concatenate the following two
adjacent lines (provided that I can detect them)

Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N      <======= log line is split here
Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX

into

Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
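
To make the desired result concrete, here is a minimal sketch of the merge in
plain Python. It is not a Hadoop job, only the per-line logic I have in mind.
It rests on assumptions that are mine and may not hold for every log: each
physical line starts with the same "timestamp host process" prefix, a complete
payload starts with the literal tag LOGTAG, a continuation payload does not,
and the two payloads are joined with no separator. The names PREFIX, TAG and
merge_split_lines are hypothetical.

```python
import re

# Assumed layout of one physical line: "<Mon DD HH:MM:SS> <host> <process> <payload>".
PREFIX = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d \S+ \S+) (.*)$")
# Assumed marker that opens every complete (non-continuation) payload.
TAG = "LOGTAG"


def merge_split_lines(lines):
    """Join a continuation line onto the line directly before it.

    A line counts as a continuation when its payload does not start with TAG
    and its prefix equals the prefix of the previous line.
    """
    merged = []
    last_prefix = None
    for line in lines:
        match = PREFIX.match(line)
        if match is None:
            # Unknown layout: keep the line untouched and reset the context.
            merged.append(line)
            last_prefix = None
            continue
        prefix, payload = match.groups()
        if not payload.startswith(TAG) and prefix == last_prefix:
            # Continuation: append only the payload, dropping the repeated prefix.
            merged[-1] += payload
        else:
            merged.append(line)
            last_prefix = prefix
    return merged


if __name__ == "__main__":
    head = "Dec 16 21:47:20 d.14b48e47 app[web.3] "
    sample = [
        head + "LOGTAG\tF0\tF1",
        head + "LOGTAG\tF0\tF1-first-ha",
        head + "lf\tF2",
    ]
    for out in merge_split_lines(sample):
        print(out)
```

If this logic ran inside a mapper, it would rely on both fragments being read
in order by the same map task. A pair that straddles an input split boundary
would not be merged by this sketch.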

Thank you!