Flume, mail # user - Event breaking in flume


RE: Event breaking in flume
Chhaya Vishwakarma 2014-01-02, 05:40
Hi

Does a Flume source send one line of a file as one event by default? What exactly is the use of interceptors? If I use the morphline interceptor, will it send multiple lines?
Regards,
Chhaya Vishwakarma

From: Brock Noland [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, December 31, 2013 8:25 PM
To: [EMAIL PROTECTED]
Subject: Re: Event breaking in flume

You'd need to do that in Java. If you want to use Python, I would use the second solution I posted earlier.

"Another solution is:

1) Replace newlines with something like __NL__ using a Perl script in your exec source
2) Use morphlines to replace __NL__ with \n"
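A minimal sketch of step 1 in Python rather than Perl, assuming that every event starts with a timestamp such as "10 Sep 2013 19:43:33,561". The names EVENT_START, MARKER, join_events and run are made up for this illustration and are not part of Flume. The script joins the lines of one event with __NL__ and writes one line per event:

```python
import re
import sys

# Assumption: an event starts with a timestamp like "10 Sep 2013 19:43:33,561 ".
EVENT_START = re.compile(r"^\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2},\d{3} ")
MARKER = "__NL__"  # placeholder that step 2 turns back into a newline


def join_events(lines):
    """Yield one string per event, with inner newlines replaced by MARKER."""
    current = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank separator lines
        if EVENT_START.match(line) and current:
            # A new event begins, so the previous one is complete.
            yield MARKER.join(current)
            current = []
        current.append(line)
    if current:
        yield MARKER.join(current)


def run(src=sys.stdin, dst=sys.stdout):
    """Filter src to dst, writing one line per event and flushing each time."""
    for event in join_events(src):
        dst.write(event + "\n")
        dst.flush()
```

To use it as a filter, call run() from the script's entry point and pipe the output of tail into it. An event is only written once the first line of the next event (or end of input) is seen, so the newest event is always held back until more log output arrives.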

On Tue, Dec 31, 2013 at 12:49 AM, Chhaya Vishwakarma <[EMAIL PROTECTED]> wrote:
How about using Python?

From: Ashish [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, December 31, 2013 9:53 AM
To: [EMAIL PROTECTED]
Subject: Re: Event breaking in flume

Have a look at org.apache.flume.serialization.LineDeserializer in the flume-ng-core module.

On Tue, Dec 31, 2013 at 9:24 AM, Chhaya Vishwakarma <[EMAIL PROTECTED]> wrote:
Hi Brock,

Thanks. Using the spooling directory source with a deserializer looks good; however, I have no idea how to write a custom deserializer.
Could you give me a hint on how to go about writing my own deserializer? It would be a great help.
Regards,
Chhaya Vishwakarma

From: Brock Noland [mailto:[EMAIL PROTECTED]]
Sent: Monday, December 30, 2013 7:48 PM
To: [EMAIL PROTECTED]
Subject: Re: Event breaking in flume

Yes, it is possible to handle multi-line events, and handling stack traces is very commonplace.

However, using exec source is going to be limiting. The "correct" solution is:

1) Use spooling directory source
2) Write a little deserializer to handle your format.
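A real Flume deserializer has to be written in Java against the org.apache.flume.serialization.EventDeserializer interface (the one LineDeserializer implements). The Python below is only a model of the core logic such a class would need, under the assumption that every event starts with a timestamp like "10 Sep 2013 19:43:33,561". MultiLineReader, read_event and EVENT_START are hypothetical names, and the sketch leaves out the mark/reset bookkeeping that the real interface requires:

```python
import io
import re

# Assumption: an event starts with a timestamp like "10 Sep 2013 19:43:33,561 ".
EVENT_START = re.compile(r"^\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2},\d{3} ")


class MultiLineReader:
    """Pull-style reader: each read_event() call returns one whole log event."""

    def __init__(self, stream):
        self._stream = stream
        self._pending = None  # first line of the next event, already read

    def read_event(self):
        """Return the next event as a string, or None at end of input."""
        lines = []
        if self._pending is not None:
            lines.append(self._pending)
            self._pending = None
        for raw in self._stream:
            line = raw.rstrip("\r\n")
            if not line.strip():
                continue  # skip blank separator lines
            if EVENT_START.match(line) and lines:
                # This line belongs to the next event: keep it for later.
                self._pending = line
                break
            lines.append(line)
        return "\n".join(lines) if lines else None


reader = MultiLineReader(io.StringIO(
    "10 Sep 2013 19:43:33,561 [a] ERROR - one\n"
    "10 Sep 2013 19:43:33,561 [a] ERROR - two\n"
    "     at x.Y.z(Y.java)\n"
))
first = reader.read_event()
second = reader.read_event()
```

The point to carry over to a Java implementation is the one-line lookahead: the reader only knows an event has ended when it sees the start of the next one, so that line has to be kept for the following call instead of being dropped.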

Another solution is:

1) Replace newlines with something like __NL__ using a Perl script in your exec source
2) Use morphlines to replace __NL__ with \n

A third and less desirable solution would be:

1) Use the morphlines interceptor to merge multiple events into a single event. This will not work well for a variety of reasons, the most common being that the exec source could hit its "batch" size in the middle of a stack trace, in which case the stack trace will be split across two different batches.
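The batch problem can be illustrated with a small Python sketch. It is not Flume code: batches is a hypothetical helper that cuts a list of lines into fixed-size groups, the batch size of 3 is chosen only to keep the example short, and the log lines are abbreviated from the sample in this thread:

```python
def batches(lines, batch_size):
    """Split a list of lines into consecutive batches of at most batch_size."""
    return [lines[i:i + batch_size] for i in range(0, len(lines), batch_size)]


# Two events: the first is a single line, the second is a four-line stack trace.
lines = [
    "10 Sep 2013 19:43:33,561 [WebContainer : 9] ERROR - An Error has occured",
    "10 Sep 2013 19:43:33,561 [WebContainer : 9] ERROR - handleException()",
    "     at com.marsh.csa.proxy.CSAProxy.updateCSA(CSAProxy.java(Compiled Code))",
    "     at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java(Compiled Code))",
    "Caused by: com.marsh.framework.core.exception.MarshException",
]

result = batches(lines, 3)
for number, batch in enumerate(result, start=1):
    print(f"--- batch {number} ---")
    for line in batch:
        print(line)
```

With these inputs the stack trace starts in the first batch and ends in the second, so anything that merges lines within one batch at a time would produce two partial events instead of one.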

Brock
On Mon, Dec 30, 2013 at 5:05 AM, Joao Salcedo <[EMAIL PROTECTED]> wrote:
It looks like this is possible using regular expression pattern matching:

http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#/readMultiLine

On Mon, Dec 30, 2013 at 9:56 PM, Chhaya Vishwakarma <[EMAIL PROTECTED]> wrote:
So is it not possible to handle multi-line events in Flume?

From: Joao Salcedo [mailto:[EMAIL PROTECTED]]
Sent: Monday, December 30, 2013 4:22 PM
To: [EMAIL PROTECTED]
Subject: Re: Event breaking in flume

Maybe you can set up some morphlines and do some ETL on your events.

I hope this helps.

http://blog.cloudera.com/blog/2013/07/morphlines-the-easy-way-to-build-and-integrate-etl-apps-for-apache-hadoop/

Cheers

On Mon, Dec 30, 2013 at 9:34 PM, Ashish <[EMAIL PROTECTED]> wrote:
I am not aware of any options out of the box. Maybe someone else can help.
An alternative is to write a custom source.

On Mon, Dec 30, 2013 at 3:56 PM, Chhaya Vishwakarma <[EMAIL PROTECTED]> wrote:
Hi,
I am using exec as the source, with the tail command.

From: Ashish [mailto:[EMAIL PROTECTED]]
Sent: Monday, December 30, 2013 3:48 PM
To: [EMAIL PROTECTED]
Subject: Re: Event breaking in flume

Which source are you using?

On Mon, Dec 30, 2013 at 3:23 PM, Chhaya Vishwakarma <[EMAIL PROTECTED]> wrote:
Hi,

By default, Flume treats one line as one event, but I want to break events on some other criterion. How can this be achieved in Flume? Is it possible?

10 Sep 2013 19:43:33,561 [WebContainer : 9] ERROR - An Error has occured for com.marsh.framework.core.exception.MarshException: Record has been modified since last retrieved - Resubmit transaction

10 Sep 2013 19:43:33,561 [WebContainer : 9] ERROR - handleException():com.marsh.framework.core.exception.MarshException: Record has been modified since last retrieved - Resubmit transaction
     at com.marsh.csa.serviceagreement.ServiceAgreementImpl.updateAgreement(ServiceAgreementImpl.java(Compiled Code))
     at com.marsh.csa.serviceagreementmgmt.CSAManagerImpl.updateCSA(CSAManagerImpl.java(Compiled Code))
     at com.marsh.csa.serviceagreementmgmt.ejb.EJSRemoteStatelessServiceagreementManager_3dcfd156.updateCSA(Unknown Source)
     at com.marsh.csa.serviceagreementmgmt.ejb._ServiceagreementManagerRemote_Stub.updateCSA(_ServiceagreementManagerRemote_Stub.java(Compiled Code))
     at com.marsh.csa.proxy.CSAProxy.updateCSA(CSAProxy.java(Compiled Code))
     at com.marsh.csa.serviceagreement.SaveCSAAction.performAction(SaveCSAAction.java(Compiled Code))
     at com.marsh.csa.serviceagreement.CSAAbstractStrutsAction.execute(CSAAbstractStrutsAction.java(Compiled Code))
     at org.apache.struts.action.RequestProcessor.processActionPerform(RequestProcessor.java(Inlined Compiled Code))
     at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java(Compiled Code))
Caused by: com.marsh.framework.core.exception.MarshException: Record has been modified since last retrieved - Resubmit transaction
     at com.marsh.csa.serviceagreement.ServiceAgreementDAO.updateServiceAgreement(ServiceAgreementDAO.java(Compiled Code))
     at com.marsh.csa.serviceagreement.Servi