Re: Events being cut by flume
The default Java heap size specified in
$FLUME_HOME/bin/flume-ng is very small (20 MB).

So, in your $FLUME_HOME/conf/flume-env.sh file, try increasing the Java
memory to a higher value (at most 50% of the available RAM):

JAVA_OPTS="-Xms4096m -Xmx4096m -XX:MaxPermSize=4096m"

Then, in your agent configuration file, increase the maximum line length
per event (deserializer.maxLineLength) to a much higher value (such as
5000).

Also change the output encoding to UTF-8.
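For a spooling-directory source, the properties mentioned above would look roughly like this (the agent name, source name, and directory below are placeholders, not taken from your setup):

```properties
# Hypothetical agent/source names; substitute your own.
agent.sources.src1.type = spooldir
agent.sources.src1.spoolDir = /var/log/incoming
# Allow much longer lines before LineDeserializer truncates them.
agent.sources.src1.deserializer.maxLineLength = 5000
# Charset used to read the spooled files; must match the files' real encoding.
agent.sources.src1.inputCharset = UTF-8
# Charset used when the deserializer writes events back out.
agent.sources.src1.deserializer.outputCharset = UTF-8
```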

Also make sure that the input encoding (inputCharset) matches the actual
encoding of the source files; a mismatch here can corrupt characters and
break events apart.
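As a quick sanity check on that encoding, you could inspect the first bytes of one of the spooled files and see which byte-order mark, if any, it starts with. A minimal sketch (the file path you pass in is up to you; nothing here is from the thread):

```python
# Minimal sketch: read the first few bytes of a file and report which
# byte-order mark (BOM), if any, it starts with, so that the Flume
# inputCharset can be set to match.
import codecs

BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),         # EF BB BF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),  # FF FE
    (codecs.BOM_UTF16_BE, "UTF-16BE"),  # FE FF
]

def detect_bom(path):
    """Return the charset suggested by the file's BOM, or None if absent."""
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None
```

If the files really start with the UTF-8 BOM, inputCharset should stay UTF-8; reading them as UTF-16 would explain both the replacement characters and the randomly split lines.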

Let's see if these changes make a difference.
Author and Instructor for the Upcoming Book and Lecture Series
Massive Log Data Aggregation, Processing, Searching and Visualization with
Open Source Software
On 27 August 2013 11:13, ZORAIDA HIDALGO SANCHEZ <[EMAIL PROTECTED]> wrote:

>  Hi Israel,
>  thanks for your response. We already checked this; running :set list in
> the vi editor, our events look like this:
>  "line1field1";"line1field2";"line1fieldN"$
> "lineNfield1";"lineNfield2";"lineNfieldN"$
>  There are no event delimiters ($) between the fields of an event.
> I have tried forcing the encoding (because I believe these files, which
> are generated by our customer, are converted from ASCII to UTF-8 with a
> BOM and could contain characters with more bytes than expected):
>  agent.sources.rpb.inputCharset = UTF-16
> agent.sources.rpb.deserializer.maxLineLength = 250
> agent.sources.rpb.deserializer.outputCharset = UTF-16
>  but if I use a maxLineLength of this size (250), then lots of events are
> truncated (even though the maximum number of characters per line is 250):
> 13/08/27 17:03:34 WARN serialization.LineDeserializer: Line length
> exceeds max (250), truncating line!
>  If I take a look into the generated file, there are unrecognized
> characters: �� and events have been cut at random (there are lines
> with only 3 characters).
>  I have tried increasing the maxLineLength parameter, but I end up
> getting a Java heap space exception :(
>  Again, thanks. Any help will be greatly appreciated.
>  From: Israel Ekpo <[EMAIL PROTECTED]>
> Reply-To: Flume User List <[EMAIL PROTECTED]>
> Date: Tuesday, 27 August 2013 16:29
> To: Flume User List <[EMAIL PROTECTED]>
> Subject: Re: Events being cut by flume
>  Hello Zoraida,
>  What sources are your events coming from?
>  I have a feeling they are coming from a spooling directory source and
> the events contain newline characters (the event delimiter).
>  If this is the case, you are going to see the events split up whenever
> the parser encounters the delimiter.
>   Author and Instructor for the Upcoming Book and Lecture Series
> Massive Log Data Aggregation, Processing, Searching and Visualization
> with Open Source Software
> http://massivelogdata.com
> On 27 August 2013 06:20, ZORAIDA HIDALGO SANCHEZ <[EMAIL PROTECTED]> wrote:
>>  Hello,
>>  I am having a weird problem while processing events coming from a
>> file with this format:
>> UTF-8 Unicode (with BOM) English text, with CRLF line terminators
>>  Some of the events in the file contain this text: "Marés". While some
>> events are sent correctly without being cut by Flume, others arrive
>> incomplete. Even worse, once one event has been cut, the process stops
>> sending further events, and we end up with incomplete files on HDFS. We
>> have tried to isolate the problem: using a roll file sink instead of
>> HDFS, removing all the interceptors, etc. However, we still have the
>> same problem. Apparently, the troublesome event does not contain any
>> hidden weird characters, and the files are generated automatically, so
>> we would expect that if some malformed input appeared in one event, it
>> would appear in the others too.
>>  We really appreciate any hint that you could give us.
>>  Thanks.