Re: Concatenate adjacent lines with hadoop
Azuryy Yu 2013-02-27, 02:39
That's easy. In your example:
Map output key: FIELD-N; map output value: the original line, unchanged. In the reduce: if the value contains LOGTAG<TAB>, it is the first part of the log entry; if not, it is the continuation of a split entry. Just take a substring of it and concatenate it with the first part.
Am I explaining this clearly?
On Wed, Feb 27, 2013 at 9:36 AM, Matthieu Labour <[EMAIL PROTECTED]> wrote:
> Hi
>
> Please find below the issue I need to solve. Thank you in advance for your
> help/tips.
>
> I have log files where log lines are sometimes split (this happens when
> the log line exceeds a specific length)
>
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3] LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-MAX
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3] LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N <======= log line is being split
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3] FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>
> Can I "reconcile"/"concatenate" split log lines with a hadoop map
> reduce job?
>
> In other words, using a map reduce job, can I concatenate the 2 following
> adjacent lines (provided that I 'detect' them)
>
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3] LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N <======= log line is being split
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3] FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>
> into
>
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3] LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>
> Thank you!
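The reduce step suggested above can be sketched as follows. This is a minimal illustration in plain Python, not Hadoop code: `split_prefix`, `reduce_group`, and the five-token prefix pattern are hypothetical names and assumptions made for the example, and each group is assumed to hold exactly one entry (one tagged first part plus its continuations).

```python
import re

# Assumption: every line starts with five whitespace-separated syslog tokens
# (month, day, time, host id, process), then one space, then the payload.
PREFIX_RE = re.compile(r"(\S+\s+\S+\s+\S+\s+\S+\s+\S+) (.*)", re.S)
TAG = "LOGTAG\t"  # marker from the example; the first part's payload starts with it


def split_prefix(line):
    """Return (syslog_prefix, payload) for one raw log line."""
    match = PREFIX_RE.match(line)
    if match is None:
        raise ValueError("unexpected log line format: %r" % line)
    return match.group(1), match.group(2)


def reduce_group(lines):
    """Merge the lines of one reduce group into a single log entry.

    The line whose payload starts with TAG is the first part. Every other
    line is treated as a continuation, and only its payload is appended.
    """
    head = None
    tails = []
    for line in lines:
        _, payload = split_prefix(line)
        if payload.startswith(TAG):
            head = line
        else:
            tails.append(payload)
    if head is None:
        raise ValueError("group has no line carrying the tag")
    return head + "".join(tails)
```

Because the first part is recognised by its tag, the order of the two values does not matter here. With more than one continuation per group the order would matter, and MapReduce does not guarantee the order of values for a key by default.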
Re: Concatenate adjacent lines with hadoop
Matthieu Labour 2013-02-27, 05:01
Thank you for your answer. I am not sure I fully understand; my email was most likely not very clear. Here is an example of a log line. Please note that the content of the first line begins with YSLOGROW, and that the second line should be concatenated with the first line.
Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3] YSLOGROW 20121216T214720.345Z remote-addr=166.137.156.155&user-agent=Mozilla%2F5.0+%28Linux%3B+U%3B+Android+4.0.4%3B+en-us%3B+SAMSUNG-SGH-I717+Build%2FIMM76D%29+AppleWebKit%2F534.30+%28KHTML%2C+like+Gecko%29+Version%2F4.0+Mobile+Safari%2F534.30&referrer=http%3A%2F%2Flp.mydas.mobi%2F%2Frich%2Ffoundation%2FdynamicInterstitial%2Fint_launch.php%3Fmm_urid%3DWBNMMG9h4XmbJBUHbDrNWWWm%26mm_ipaddress%3D166.137.156.155%26mm_handset%3D8440%26mm_carrier%3D2%26mm_apid%3D78683%26mm_acid%3D1050500%26mm_osid%3D14%26mm_uip%3D166.137.156.155%26mm_ua%3DMozilla%252F5.0%2B%2528Linux%253B%2BU%253B%2BAndroid%2B4.0.4%253B%2Ben-us%253B%2BSAMSUNG-SGH-I717%2BBuild%252FIMM76D%2529%2BAppleWebKit%252F534.30%2B%2528KHTML%252C%2Blike%2BGecko%2529%2BVersion%252F4.0%2BMobile%2BSafari%252F534.30SAMSUNG-SGH-I717%26mtpid%3DUNKNOWN%26mm_msuid%3DUNKNOWN%26mm_mmisdk%3D4.6.0-12.07.16.a%26mm_mxsdk%3DUNKNOWN%26mm_dv%3DAndroid4.0.4%26mm_adtype%3DMMFullScreenAdTransition%26mm_hswd%3DUNKNOWN%26mm_dm%3DSAMSUNG-SGH-I717%26mm_hsht%3DUNKNOWN%26mm_auid%3Dmmi
Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3] d_bd6b33dc569994102eaa60a060987d99e9_013b35a758bd%26mm_accelerometer%3Dtrue%26mm_lat%3DUNKNOWN%26mm_long%3DUNKNOWN%26mm_hpx%3D1280%26mm_wpx%3D800%26mm_density%3D2.0%26mm_dpi%3DUNKNOWN%26mm_campaignid%3D45695%26autoExpand%3Dtrue&query-string=ncid%3DWBNMMG9h4XmbJBUHbDrNWWWm tr7y MLNL 1009 10034 3401 t4fx 10034 click
--
Matthieu Labour, Engineering | *Action**X* |
584 Broadway, Suite 1002 – NY, NY 10012
415-994-3480 (m)
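To pin down the expected result for lines in this format, here is a small sequential sketch in plain Python, outside of MapReduce. The names `payload_of`, `is_continuation`, and `join_adjacent` are hypothetical, and the five-token prefix (month, day, time, host id, process) is an assumption read off the sample lines above.

```python
TAG = "YSLOGROW"   # opens the content of a complete (or first) line in the sample
PREFIX_TOKENS = 5  # assumed prefix: month, day, time, host id, process


def payload_of(line):
    """Return the text after the five-token prefix, or "" if there is none."""
    parts = line.split(None, PREFIX_TOKENS)
    return parts[PREFIX_TOKENS] if len(parts) > PREFIX_TOKENS else ""


def is_continuation(line):
    """A line whose payload does not start with TAG continues the previous line."""
    return not payload_of(line).startswith(TAG)


def join_adjacent(lines):
    """Append each continuation's payload to the line directly before it."""
    merged = []
    for line in lines:
        if merged and is_continuation(line):
            merged[-1] += payload_of(line)
        else:
            merged.append(line)
    return merged
```

This only works when both halves are seen by the same process in their original order. In a MapReduce job an input split boundary can fall between the two halves, which is one reason to bring them together in a reduce step instead.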
Re: Concatenate adjacent lines with hadoop
Azuryy Yu 2013-02-27, 05:16
I just noticed that both of your lines start with: Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app
Is that prefix different for other log entries? If so, just use this prefix as the map output key.
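Keying on the shared prefix could look roughly like the sketch below. It is plain Python that imitates map, shuffle, and reduce in memory, not Hadoop code. The function names, the five-token prefix pattern, and the `offset` carried in each value are assumptions made for illustration. The offset is there because the prefix only has one-second resolution, so several entries may share a key and the reducer needs the original order to tell them apart.

```python
import re
from collections import defaultdict

# Assumption: five whitespace-separated prefix tokens, one space, then the payload.
PREFIX_RE = re.compile(r"(\S+\s+\S+\s+\S+\s+\S+\s+\S+) (.*)", re.S)
TAG = "YSLOGROW"  # tag that opens the payload of a first part in the sample lines


def map_line(offset, line):
    """Emit (prefix, (offset, payload)); offset is the line's input position."""
    match = PREFIX_RE.match(line)
    if match is None:
        raise ValueError("unexpected log line format: %r" % line)
    return match.group(1), (offset, match.group(2))


def reduce_prefix(prefix, values):
    """Rebuild the entries for one prefix.

    Values are sorted by offset. A tagged payload starts a new entry and any
    other payload is appended to the entry before it. A continuation with no
    entry before it is kept as an entry of its own.
    """
    entries = []
    for _, payload in sorted(values):
        if payload.startswith(TAG) or not entries:
            entries.append(payload)
        else:
            entries[-1] += payload
    return [prefix + " " + entry for entry in entries]


def run(lines):
    """Tiny in-memory stand-in for the shuffle: group map output by key."""
    groups = defaultdict(list)
    for offset, line in enumerate(lines):
        key, value = map_line(offset, line)
        groups[key].append(value)
    out = []
    for key in sorted(groups):
        out.extend(reduce_prefix(key, groups[key]))
    return out
```

In a real job, the byte offset that Hadoop's TextInputFormat passes as the map input key could play the role of `offset`, provided all parts of an entry come from the same file. The sketch also assumes that entries sharing a prefix are not interleaved; if they are, a continuation would be attached to the wrong first part.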