Re: How to use LZO in Flume-ng
'com.hadoop.compression.lzo.LzoCodec' is an implementation of
'org.apache.hadoop.io.compress.CompressionCodec'.
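
To make that concrete: a minimal sketch (not from this thread; it assumes hadoop-common and the hadoop-lzo jar are on the classpath) that asks Hadoop's CompressionCodecFactory to resolve the codec by class name, which only succeeds if the class implements CompressionCodec:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;

    public class LzoCodecCheck {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Same value as the io.compression.codecs entry in core-site.xml.
            conf.set("io.compression.codecs", "com.hadoop.compression.lzo.LzoCodec");
            CompressionCodecFactory factory = new CompressionCodecFactory(conf);
            // Returns null when the class is not registered or cannot be loaded.
            CompressionCodec codec =
                    factory.getCodecByClassName("com.hadoop.compression.lzo.LzoCodec");
            System.out.println(codec == null
                    ? "LzoCodec is not registered"
                    : "Loaded codec: " + codec.getClass().getName());
        }
    }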

2012/8/28 Denny Ye <[EMAIL PROTECTED]>

> Hi Kevin,
>     I have applied LZO successfully. I will post my LZO configuration so
> you can compare the differences.
>
>     1. agent.sinks.hdfsSin1.hdfs.codeC = com.hadoop.compression.lzo.LzoCodec
>     2. Added this configuration to Hadoop's core-site.xml:
>        <property>
>             <name>io.compression.codecs</name>
>             <value>com.hadoop.compression.lzo.LzoCodec</value>
>        </property>
>
> -Regards
> Denny Ye
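
A quick way to verify that wiring outside Flume is to instantiate the codec by class name, as the HDFS sink does, and write a small compressed file. A minimal, untested sketch (it assumes hadoop-common, the hadoop-lzo jar, and the native LZO library are all installed):

    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.util.ReflectionUtils;

    public class LzoSmokeTest {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Instantiate the codec by class name, as the HDFS sink does.
            CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(
                    Class.forName("com.hadoop.compression.lzo.LzoCodec"), conf);
            // Writing through the codec fails fast if the native-lzo
            // library (loaded by GPLNativeCodeLoader) is missing.
            OutputStream out = codec.createOutputStream(
                    new FileOutputStream("/tmp/lzo-smoke" + codec.getDefaultExtension()));
            out.write("hello lzo\n".getBytes("UTF-8"));
            out.close();
            System.out.println("Wrote /tmp/lzo-smoke" + codec.getDefaultExtension());
        }
    }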
>
>
> 2012/8/28 Kevin Lee <[EMAIL PROTECTED]>
>
>> Folks,
>>
>> I was following this link: Hadoop at Twitter (part 1): Splittable LZO Compression
>> <http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/>
>> to integrate LZO into Hadoop 2.0, but it seems Flume-ng LZO compression does not work.
>>
>> My flume-ng configuration file is:
>>
>> cat > /tmp/flume-lzo.conf <<EOF
>> agent.sources = lzo-avro-collect
>> agent.channels = lzo-memory-channel
>> agent.sinks = lzo-hdfs-write
>>
>> agent.sources.lzo-avro-collect.type = avro
>> agent.sources.lzo-avro-collect.bind = 0.0.0.0
>> agent.sources.lzo-avro-collect.port = 12345
>> agent.sources.lzo-avro-collect.channels = lzo-memory-channel
>> agent.channels.lzo-memory-channel.type = memory
>> agent.channels.lzo-memory-channel.capacity = 1000000
>> agent.channels.lzo-memory-channel.transactionCapacity = 10000
>> agent.channels.lzo-memory-channel.stay-alive = 3
>> agent.sinks.lzo-hdfs-write.type = hdfs
>> agent.sinks.lzo-hdfs-write.hdfs.path = hdfs://10.34.4.55:8020/tmp/
>> agent.sinks.lzo-hdfs-write.hdfs.filePrefix = test%Y
>> agent.sinks.lzo-hdfs-write.channel = lzo-memory-channel
>> agent.sinks.lzo-hdfs-write.hdfs.rollInterval = 3600
>> agent.sinks.lzo-hdfs-write.hdfs.rollSize = 209715200
>> agent.sinks.lzo-hdfs-write.hdfs.rollCount = 0
>> agent.sinks.lzo-hdfs-write.hdfs.batchSize = 1000
>> agent.sinks.lzo-hdfs-write.hdfs.codeC = lzo
>> agent.sinks.lzo-hdfs-write.hdfs.fileType = CompressedStream
>> EOF
>>
>> Then I start the flume-ng agent in the foreground:
>>
>> sudo -u flume flume-ng agent -n agent -f /tmp/flume-lzo.conf
>>
>> Next, I use avro-client to ship the event:
>>
>> echo aaaaaaaaaaaaaaaaa > /tmp/events
>> sudo -u flume flume-ng avro-client -H localhost -p 12345 -F /tmp/events
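
For reference, the same event can also be shipped programmatically through the Flume Avro RPC client. A minimal sketch (not from the thread; it assumes flume-ng-sdk is on the classpath):

    import org.apache.flume.Event;
    import org.apache.flume.api.RpcClient;
    import org.apache.flume.api.RpcClientFactory;
    import org.apache.flume.event.EventBuilder;

    public class ShipEvent {
        public static void main(String[] args) throws Exception {
            // Connect to the avro source configured above (localhost:12345).
            RpcClient client = RpcClientFactory.getDefaultInstance("localhost", 12345);
            try {
                Event event = EventBuilder.withBody("aaaaaaaaaaaaaaaaa".getBytes("UTF-8"));
                client.append(event); // throws EventDeliveryException on failure
            } finally {
                client.close();
            }
        }
    }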
>>
>> The flume-ng agent collector log is as follows:
>>
>> 12/08/28 06:33:53 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
>> 12/08/28 06:33:53 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 6bb1b7f8b9044d8df9b4d2b6641db7658aab3cf8]
>> 12/08/28 06:33:54 INFO hdfs.HDFSEventSink: Hadoop Security enabled: false
>> 12/08/28 06:33:54 INFO nodemanager.DefaultLogicalNodeManager: Starting new configuration:{ sourceRunners:{lzo-avro-collect=EventDrivenSourceRunner: { source:AvroSource: { bindAddress:0.0.0.0 port:12345 } }} sinkRunners:{lzo-hdfs-write=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@39e57e8f counterGroup:{ name:null counters:{} } }} channels:{lzo-memory-channel=org.apache.flume.channel.MemoryChannel@9d7fbfb} }
>> 12/08/28 06:33:54 INFO nodemanager.DefaultLogicalNodeManager: Starting Channel lzo-memory-channel
>> 12/08/28 06:33:54 INFO nodemanager.DefaultLogicalNodeManager: Starting Sink lzo-hdfs-write
>> 12/08/28 06:33:54 INFO nodemanager.DefaultLogicalNodeManager: Starting Source lzo-avro-collect
>> 12/08/28 06:33:54 INFO source.AvroSource: Avro source starting:AvroSource: { bindAddress:0.0.0.0 port:12345 }
>> 12/08/28 06:34:02 INFO ipc.NettyServer: [id: 0x651db6bb, /127.0.0.1:48085 => /127.0.0.1:12345] OPEN
>> 12/08/28 06:34:02 INFO ipc.NettyServer: [id: 0x651db6bb, /127.0.0.1:48085 => /127.0.0.1:12345] BOUND: /127.0.0.1:12345
>> 12/08/28 06:34:02 INFO ipc.NettyServer: [id: 0x651db6bb, /127.0.0.1:48085 => /127.0.0.1:12345] CONNECTED: /127.0.0.1:48085
>> 12/08/28 06:34:02 INFO ipc.NettyServer: [id: 0x651db6bb, /127.0.0.1:48085 :> /127.0.0.1:12345] DISCONNECTED
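
The log above shows the native LZO library loading correctly, so a sensible next step is to read back whatever the sink rolled under /tmp/ and confirm it decompresses. A minimal read-back sketch (untested; it assumes the written file carries the codec's default extension, which CompressionCodecFactory uses to pick the codec):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;

    public class ReadBack {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("io.compression.codecs", "com.hadoop.compression.lzo.LzoCodec");
            Path path = new Path(args[0]); // a file written by the sink under /tmp/
            FileSystem fs = path.getFileSystem(conf);
            // Picks the codec from the file extension; null means no match.
            CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(path);
            if (codec == null) {
                throw new IllegalStateException("no codec matches " + path);
            }
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    codec.createInputStream(fs.open(path)), "UTF-8"));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
            in.close();
        }
    }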