Hadoop >> mail # user >> Re: Concatenate adjacent lines with hadoop


Re: Concatenate adjacent lines with hadoop
That's easy. In your example:

Map output key: FIELD-N; map output value: the original line, unchanged.
In the reducer: if the value contains LOGTAG<TAB>, it is the first part of
the log entry. If not, it is the continuation of a split log entry; take
the substring that is not already in the first part and concatenate it to
the first part.
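
The steps above can be sketched as a small local simulation in Python. This
is only an illustration under stated assumptions, not Hadoop API code: the
function names (split_prefix, map_line, reduce_group, run_job) are
hypothetical, each record is assumed to be one line whose syslog-style prefix
ends with "] ", and the FIELD-N value is assumed to be unique enough to pair
the two halves of a split line.

```python
from collections import defaultdict

# Marker taken from the thread's example; a real log tag would replace it.
LOGTAG = "LOGTAG\t"


def split_prefix(line):
    """Split 'Dec 16 ... app[web.3] payload' into (prefix, payload).

    Assumes every line contains '] ' right after the app[...] part.
    """
    prefix, _, payload = line.partition("] ")
    return prefix + "]", payload


def map_line(line):
    """Map step: key is FIELD-N, value is the original line."""
    _, payload = split_prefix(line)
    fields = payload.split("\t")
    # First part: FIELD-N is the last field. Continuation: it is the first.
    key = fields[-1] if payload.startswith(LOGTAG) else fields[0]
    return key, line


def reduce_group(key, lines):
    """Reduce step: join the first part with its continuation."""
    heads = [l for l in lines if LOGTAG in l]
    tails = [l for l in lines if LOGTAG not in l]
    if len(heads) != 1 or len(tails) != 1:
        # Unsplit or ambiguous group: pass the lines through unchanged.
        return list(lines)
    _, tail_payload = split_prefix(tails[0])
    # Drop the repeated FIELD-N from the continuation before concatenating.
    return [heads[0] + tail_payload[len(key):]]


def run_job(lines):
    """Group mapped values by key (a stand-in for the shuffle), then reduce."""
    groups = defaultdict(list)
    for line in lines:
        key, value = map_line(line)
        groups[key].append(value)
    out = []
    for key, values in groups.items():
        out.extend(reduce_group(key, values))
    return out


PREFIX = "Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3] "
sample = [
    PREFIX + "LOGTAG\ta\tb\tc-42",  # first part, cut after FIELD-N = c-42
    PREFIX + "c-42\td\te",          # continuation, starts with FIELD-N again
]
merged = run_job(sample)
```

The reducer looks for LOGTAG<TAB> instead of relying on the order of the
values, because the order of values within a reduce group is not guaranteed
unless a secondary sort is configured.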

Did I explain that clearly?

On Wed, Feb 27, 2013 at 9:36 AM, Matthieu Labour <[EMAIL PROTECTED]> wrote:

> Hi
>
> Please find below the issue I need to solve. Thank you in advance for your
> help/tips.
>
> I have log files where log lines are sometimes split (this happens when
> the log line exceeds a specific length)
>
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-MAX
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N      <======= log line is being
> split
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>
> Can I "reconcile"/"concatenate" split log lines with a Hadoop MapReduce
> job?
>
> In other words, using a MapReduce job, can I concatenate the 2 following
> adjacent lines (provided that I 'detect' them)
>
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N      <======= log line is being
> split
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>
> into
>
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>
> Thank you!
>
Matthieu Labour 2013-02-27, 05:01
Azuryy Yu 2013-02-27, 05:16