Re: compression over-the-wire with 1.3.1 ?
Compression wasn't dropped on purpose; it just has not been implemented in Flume NG.
On Mon, Feb 25, 2013 at 7:29 AM, Langston, Jim
<[EMAIL PROTECTED]> wrote:

>  Thanks Brock,
>
>  It seems that compression was dropped. Was that intended? There
> was a lot of work done in the .94 version. I move a tremendous amount
> of data between clusters and use compression and batching.
>
>  https://issues.cloudera.org/browse/FLUME-559
>
>  looks like I will need to spend time working with both versions.
>
>  Jim
>
>
>
>   From: Brock Noland <[EMAIL PROTECTED]>
> Reply-To: <[EMAIL PROTECTED]>
> Date: Sat, 23 Feb 2013 13:31:40 -0600
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Subject: Re: compression over-the-wire with 1.3.1 ?
>
>  Hi,
>
>  The spool dir source does not have built-in functionality to read
> compressed files. One could be built, but I think it would either require a
> subclass of
> https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/serialization/EventDeserializer.java or changes to
> https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/client/avro/ReliableSpoolingFileEventReader.java so that it recognized compressed files.
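
Purely as an illustration of the decompression side such a deserializer would need, here is a minimal, self-contained Java sketch (the class name and file path are hypothetical; it reads plain gzip'd text line by line and deliberately skips the mark/reset bookkeeping that Flume's EventDeserializer contract requires):

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of Flume: each line read here is what a
// compressed-file deserializer would turn into a Flume Event body.
public class GzipLineReaderSketch {
    public static void main(String[] args) throws IOException {
        String path = args[0];  // e.g. a .gz file dropped into the spool directory
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(
                        new GZIPInputStream(new FileInputStream(path)),
                        StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);  // stand-in for emitting an Event
            }
        }
    }
}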
>
>  In regards to over the wire compression, there is an open review on that
> now:
>
>  https://reviews.apache.org/r/9427/
>
>  Brock
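
The review above proposes adding deflate compression to the Avro RPC path. Purely as a forward-looking sketch (the compression-type and compression-level property names are taken from that patch and later Flume releases, are not available in 1.3.1, and may change; hostnames and ports are placeholders), the sink/source pair would be configured roughly like this:

# Sketch only: Avro sink with deflate compression as proposed in the patch.
agent.sinks.k1.type = avro
agent.sinks.k1.channel = c1
agent.sinks.k1.hostname = collector.example.com
agent.sinks.k1.port = 4545
agent.sinks.k1.compression-type = deflate
agent.sinks.k1.compression-level = 6

# The receiving Avro source must use a matching setting.
collector.sources.r1.type = avro
collector.sources.r1.channels = c1
collector.sources.r1.bind = 0.0.0.0
collector.sources.r1.port = 4545
collector.sources.r1.compression-type = deflate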
>
>
> On Sat, Feb 23, 2013 at 9:17 AM, Langston, Jim <[EMAIL PROTECTED]> wrote:
>
>>  Hi all,
>>
>>  A question on sending compressed files from a remote source
>> to HDFS.
>>
>>  I have been working with .94 of Flume and gzip a file before sending
>> it from a remote location to an HDFS cluster. Works great.
>>
>>  Now, I'm looking to move to 1.2 or 1.3.1 (CDH4 installs 1.2 by
>> default through the management tool), but I don't see the
>> equivalent in 1.2 or 1.3.1. I found the reference to utilize the
>> new source in 1.3.1, spoolDir, but when I try to pick up a compressed
>> file in the spool directory I'm getting an error:
>>
>>  13/02/22 19:32:41 ERROR source.SpoolDirectorySource: Uncaught exception
>> in Runnable
>> org.apache.flume.ChannelException: Unable to put batch on required
>> channel: org.apache.flume.channel.MemoryChannel{name: c1}
>> at
>> org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:200)
>> at
>> org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:143)
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> at
>> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> at java.lang.Thread.run(Thread.java:722)
>> Caused by: org.apache.flume.ChannelException: Space for commit to queue
>> couldn't be acquired Sinks are likely not keeping up with sources, or the
>> buffer size is too tight
>> at
>> org.apache.flume.channel.MemoryChannel$MemoryTransaction.doCommit(MemoryChannel.java:126)
>> at
>> org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151)
>> at
>> org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:192)
>> ... 9 more
>>
>>
>>  I have tried increasing the buffer size, but it did not change the
>> error. Here is my current configuration, which generated the error:
>>
>>  # Compressed Source
>> agent_compressed.sources = r1
>> agent_compressed.channels = c1
>> agent_compressed.channels.c1.type = memory
>> agent_compressed.sources.r1.type = spooldir
>> agent_compressed.sources.r1.bufferMaxLineLength = 50000
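
For reference, the "Space for commit to queue couldn't be acquired" error above usually means the memory channel filled up faster than any sink drained it. A fuller sketch of the same agent (the spool directory, HDFS path, sink wiring, and capacity values below are placeholders, not from the original message) would raise the channel's capacity and transactionCapacity and make sure a sink is attached:

# Sketch only: same agent with an explicit sink and larger channel limits.
agent_compressed.sources = r1
agent_compressed.channels = c1
agent_compressed.sinks = k1

agent_compressed.channels.c1.type = memory
agent_compressed.channels.c1.capacity = 100000
agent_compressed.channels.c1.transactionCapacity = 10000

agent_compressed.sources.r1.type = spooldir
agent_compressed.sources.r1.spoolDir = /var/spool/flume
agent_compressed.sources.r1.bufferMaxLineLength = 50000
agent_compressed.sources.r1.channels = c1

agent_compressed.sinks.k1.type = hdfs
agent_compressed.sinks.k1.hdfs.path = hdfs://namenode/flume/events
agent_compressed.sinks.k1.channel = c1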
