Re: Concatenate adjacent lines with hadoop
I just noticed that both of your lines start with: Dec 16 21:47:20
d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app

Does this prefix differ for other lines? If so, just use the prefix as the
map output key.
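
The prefix-as-key idea can be sketched roughly as follows. This is a local Python simulation of the map, shuffle and reduce steps, not a real Hadoop job. The names `PREFIX_RE`, `map_line`, `reduce_group` and `run` are hypothetical, and the prefix layout (timestamp, host, `app[...]`) is assumed from the sample lines. Because the timestamp only has one-second resolution, several records may share a key, and a reducer sees its values in no guaranteed order, so the sketch also carries each line's position (with TextInputFormat, the byte offset that the mapper receives as its input key could play that role for a single file). Fragments are joined with no separator, which is an assumption based on the example, where the split falls in the middle of a value.

```python
import re
from collections import defaultdict

# Assumed prefix layout: "Mon DD HH:MM:SS <host> app[<proc>]" (hypothetical regex).
PREFIX_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d \S+ \S+) ?(.*)$")
TAG = "YSLOGROW"  # tag that marks the start of a complete record in the example


def map_line(offset, line):
    """Map step: emit (prefix, (offset, payload)), or None if the line does not parse."""
    m = PREFIX_RE.match(line)
    if m is None:
        return None
    return m.group(1), (offset, m.group(2))


def reduce_group(prefix, values):
    """Reduce step: re-join fragments that share a prefix, in input order."""
    records = []
    for _, payload in sorted(values):  # restore the original line order
        if payload.startswith(TAG) or not records:
            records.append(payload)    # start of a new record
        else:
            records[-1] += payload     # continuation of the previous record
    return [f"{prefix} {record}" for record in records]


def run(lines):
    """Simulate map, shuffle (group by key) and reduce on a list of lines."""
    groups = defaultdict(list)
    for offset, line in enumerate(lines):  # line index stands in for the byte offset
        kv = map_line(offset, line)
        if kv is not None:
            groups[kv[0]].append(kv[1])
    out = []
    for prefix in sorted(groups):
        out.extend(reduce_group(prefix, groups[prefix]))
    return out


sample = [
    "Dec 16 21:47:20 host1 app[web.3] YSLOGROW\ta=1&b=",
    "Dec 16 21:47:20 host1 app[web.3] 2&c=3",
    "Dec 16 21:47:21 host1 app[web.3] YSLOGROW\tx=9",
]
for record in run(sample):
    print(record)
```

Note that this only works if a split line and its continuation carry the same prefix; if the timestamp can change between the two fragments, a coarser key would be needed.
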
On Wed, Feb 27, 2013 at 1:01 PM, Matthieu Labour <[EMAIL PROTECTED]> wrote:

> Thank you for your answer. I am not sure I fully understand. My email was
> most likely not very clear. Here is an example of a log line. Please note
> the YSLOGROW tag at the beginning of the log line, and note that the second
> line should be concatenated with the first line.
>
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3] YSLOGROW
> 20121216T214720.345Z
> remote-addr=166.137.156.155&user-agent=Mozilla%2F5.0+%28Linux%3B+U%3B+Android+4.0.4%3B+en-us%3B+SAMSUNG-SGH-I717+Build%2FIMM76D%29+AppleWebKit%2F534.30+%28KHTML%2C+like+Gecko%29+Version%2F4.0+Mobile+Safari%2F534.30&referrer=http%3A%2F%
> 2Flp.mydas.mobi
> %2F%2Frich%2Ffoundation%2FdynamicInterstitial%2Fint_launch.php%3Fmm_urid%3DWBNMMG9h4XmbJBUHbDrNWWWm%26mm_ipaddress%3D166.137.156.155%26mm_handset%3D8440%26mm_carrier%3D2%26mm_apid%3D78683%26mm_acid%3D1050500%26mm_osid%3D14%26mm_uip%3D166.137.156.155%26mm_ua%3DMozilla%252F5.0%2B%2528Linux%253B%2BU%253B%2BAndroid%2B4.0.4%253B%2Ben-us%253B%2BSAMSUNG-SGH-I717%2BBuild%252FIMM76D%2529%2BAppleWebKit%252F534.30%2B%2528KHTML%252C%2Blike%2BGecko%2529%2BVersion%252F4.0%2BMobile%2BSafari%252F534.30SAMSUNG-SGH-I717%26mtpid%3DUNKNOWN%26mm_msuid%3DUNKNOWN%26mm_mmisdk%3D4.6.0-12.07.16.a%26mm_mxsdk%3DUNKNOWN%26mm_dv%3DAndroid4.0.4%26mm_adtype%3DMMFullScreenAdTransition%26mm_hswd%3DUNKNOWN%26mm_dm%3DSAMSUNG-SGH-I717%26mm_hsht%3DUNKNOWN%26mm_auid%3Dmmi
>
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> d_bd6b33dc569994102eaa60a060987d99e9_013b35a758bd%26mm_accelerometer%3Dtrue%26mm_lat%3DUNKNOWN%26mm_long%3DUNKNOWN%26mm_hpx%3D1280%26mm_wpx%3D800%26mm_density%3D2.0%26mm_dpi%3DUNKNOWN%26mm_campaignid%3D45695%26autoExpand%3Dtrue&query-string=ncid%3DWBNMMG9h4XmbJBUHbDrNWWWm
> tr7y MLNL 1009 10034 3401 t4fx 10034 click
>
>
> On Tue, Feb 26, 2013 at 9:39 PM, Azuryy Yu <[EMAIL PROTECTED]> wrote:
>
>> That's easy. In your example:
>>
>> Map output key: FIELD-N; map output value: the original line, unchanged.
>> In the reducer: if the value contains LOGTAG<TAB>, it is the first part of
>> the log entry. If not, it is the continuation of a split log entry: take
>> the substring and concatenate it to the first part.
>>
>> Did I explain that clearly?
>>
>>
>>
>> On Wed, Feb 27, 2013 at 9:36 AM, Matthieu Labour <[EMAIL PROTECTED]> wrote:
>>
>>> Hi
>>>
>>> Please find below the issue I need to solve. Thank you in advance for
>>> your help and tips.
>>>
>>> I have log files where log lines are sometimes split (this happens
>>> when the log line exceeds a specific length):
>>>
>>> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
>>> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-MAX
>>> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
>>> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N      <======= log line is being
>>> split
>>> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
>>> FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>>>
>>> Can I "reconcile"/"concatenate" split log lines with a Hadoop MapReduce
>>> job?
>>>
>>> In other words, using a MapReduce job, can I concatenate the 2
>>> following adjacent lines (provided that I 'detect' them):
>>>
>>> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
>>> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N      <======= log line is being
>>> split
>>> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
>>> FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>>>
>>> into
>>>
>>> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
>>> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>>>
>>> Thank you!
>>>
>>
>>
>
>
> --
> Matthieu Labour, Engineering | *Action**X* |
> 584 Broadway, Suite 1002 – NY, NY 10012
> 415-994-3480 (m)
>