Flume >> mail # user >> Events being cut by flume


ZORAIDA HIDALGO SANCHEZ 2013-08-27, 10:20
Israel Ekpo 2013-08-27, 14:29
Re: Events being cut by flume
Hi Israel,

Thanks for your response. We already checked this: running :set list in the vi editor, our events look like this:

"line1field1";"line1field2";"line1fieldN"$
"lineNfield1";"lineNfield2";"lineNfieldN"$

There are no event delimiters ($) between the fields of an event.
I have tried forcing the encoding (because I believe these files, which are generated by our customer, are converted from ASCII to UTF-8 with a BOM, and they could contain characters that take more bytes than expected):

agent.sources.rpb.inputCharset = UTF-16
agent.sources.rpb.deserializer.maxLineLength = 250
agent.sources.rpb.deserializer.outputCharset = UTF-16

But if I use a maxLineLength of this size (250), a lot of events are truncated (even though the maximum number of characters per line is 250):
13/08/27 17:03:34 WARN serialization.LineDeserializer: Line length exceeds max (250), truncating line!
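
To picture what that warning implies for the output, here is a minimal sketch of a line reader that cuts an event once it reaches a character limit and then keeps reading from the same position. It is a simplified model written for illustration, not Flume's LineDeserializer code; `read_events` and `max_len` are hypothetical names.

```python
from typing import List


def read_events(text: str, max_len: int) -> List[str]:
    """Split text into events on newline, cutting any event that reaches max_len characters."""
    events = []
    buf = []
    for ch in text:
        if ch == "\n":
            # A newline always ends the current event.
            events.append("".join(buf))
            buf = []
            continue
        buf.append(ch)
        if len(buf) >= max_len:
            # Limit reached: emit what we have so far. The rest of the
            # line becomes the start of the next event.
            events.append("".join(buf))
            buf = []
    if buf:
        events.append("".join(buf))
    return events


fragments = read_events("abcdefghij\nxy\n", max_len=4)
print(fragments)  # ['abcd', 'efgh', 'ij', 'xy']
```

Under this model, one over-long line turns into several events, and the last fragment can be only a few characters long.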

If I take a look at the generated file, there are unrecognized characters: �� and events have been cut in a random way (there are lines with only 3 characters).

I have tried increasing the maxLineLength parameter, but I end up getting a Java heap space exception :(
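
One thing that can be checked independently of Flume is what happens when UTF-8 bytes are decoded as UTF-16. The sketch below uses made-up sample data in the format described in the original message (UTF-8 with BOM, CRLF line terminators) and is only an illustration of the charset mismatch, not a confirmed diagnosis of this setup.

```python
import codecs

# Two CRLF-terminated records, encoded as UTF-8 with a BOM.
raw = codecs.BOM_UTF8 + '"line1field1";"Mar\u00e9s"\r\n"lineNfield1";"x"\r\n'.encode("utf-8")

as_utf8 = raw.decode("utf-8-sig")                     # BOM-aware UTF-8 decoding
as_utf16 = raw.decode("utf-16-le", errors="replace")  # wrong charset on purpose

print(len(as_utf8.splitlines()))  # 2
print("\n" in as_utf16)           # False
```

In UTF-16 a newline is the two-byte unit 0x000A, and UTF-8 text without NUL bytes never contains a 0x00 byte, so the mis-decoded text has no line terminators at all. A line-based reader would then either hit its length limit or keep buffering, which would be consistent with both the truncation warnings and the heap exhaustion described here.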

Again, thanks. Any help would be much appreciated.

From: Israel Ekpo <[EMAIL PROTECTED]>
Reply-To: Flume User List <[EMAIL PROTECTED]>
Date: Tuesday, 27 August 2013 16:29
To: Flume User List <[EMAIL PROTECTED]>
Subject: Re: Events being cut by flume

Hello Zoraida,

What sources are your events coming from?

I have a feeling they are coming from SpoolingDirectory and the events contain newline characters (the event delimiter).

If this is the case, you are going to see the events split up whenever the parser encounters the delimiter.
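
A small illustration of that splitting, using made-up data in the same quoted, semicolon-separated layout: a line-based parser treats a newline inside a quoted field as the end of an event, while a quote-aware parser keeps the record whole. This is a sketch of the general behaviour, not of Flume's parser.

```python
import csv
import io

# One logical record whose second field contains a newline inside the quotes.
text = '"line1field1";"first half\nsecond half";"line1fieldN"\r\n'

# Naive line-based parsing: every newline ends an event.
line_events = [piece for piece in text.split("\n") if piece.strip()]

# Quote-aware parsing: the newline inside the quotes stays in the field.
records = list(csv.reader(io.StringIO(text, newline=""), delimiter=";"))

print(len(line_events))  # 2
print(len(records))      # 1
```
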
Author and Instructor for the Upcoming Book and Lecture Series
Massive Log Data Aggregation, Processing, Searching and Visualization with Open Source Software
http://massivelogdata.com
On 27 August 2013 06:20, ZORAIDA HIDALGO SANCHEZ <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> wrote:

Hello,

I am having a weird problem while processing events coming from a file with this format:
UTF-8 Unicode (with BOM) English text, with CRLF line terminators

Some of the events in the file contain this text: "Marés". While some events are sent correctly without being cut by Flume, others arrive incomplete. Moreover, once one event has been cut, no further events are sent, and we end up with incomplete files on HDFS. We have tried to isolate the problem: using a file roll sink instead of HDFS, removing all the interceptors, etc. However, we still have the same problem. Apparently, the troublesome event does not contain any hidden odd characters, and the files are generated automatically, so we would expect that if one event were malformed, the others would be too.
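
For anyone wanting to inspect such a file outside Flume, the sketch below reports whether a byte string starts with a UTF-8 BOM and compares its byte count with its character count. The sample data and the `describe` helper are hypothetical; the point is only that "Marés" is 5 characters but 6 bytes in UTF-8, and that the BOM adds 3 more bytes.

```python
import codecs


def describe(data: bytes) -> dict:
    """Report whether data starts with a UTF-8 BOM, plus its byte and character counts."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    text = data.decode("utf-8-sig")  # drops a leading BOM if present
    return {"has_bom": has_bom, "bytes": len(data), "chars": len(text)}


sample = codecs.BOM_UTF8 + "Mar\u00e9s".encode("utf-8")
print(describe(sample))  # {'has_bom': True, 'bytes': 9, 'chars': 5}
```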

We would really appreciate any hint that you could give us.

Thanks.

________________________________

This message is intended exclusively for its addressee. We only send and receive email on the basis of the terms set out at:
http://www.tid.es/ES/PAGINAS/disclaimer.aspx
Israel Ekpo 2013-08-27, 15:53
ZORAIDA HIDALGO SANCHEZ 2013-09-05, 15:34
ZORAIDA HIDALGO SANCHEZ 2013-09-05, 15:57
ZORAIDA HIDALGO SANCHEZ 2013-09-08, 17:52
ZORAIDA HIDALGO SANCHEZ 2013-09-08, 19:05
ZORAIDA HIDALGO SANCHEZ 2013-09-09, 12:08
ZORAIDA HIDALGO SANCHEZ 2013-08-27, 12:58