Re: How to use LZO in Flume-ng
Denny Ye 2012-08-28, 07:28
'com.hadoop.compression.lzo.LzoCodec' is one of the implementations of
'org.apache.hadoop.io.compress.CompressionCodec'.

2012/8/28 Denny Ye <[EMAIL PROTECTED]>

> hi Kevin,
> I got LZO working successfully. I will post my LZO configuration so you
> can compare the differences.
>
> 1. agent.sinks.hdfsSin1.hdfs.codeC = com.hadoop.compression.lzo.LzoCodec
> 2. Added this configuration to Hadoop core-site.xml:
> <property>
> <name>io.compression.codecs</name>
> <value>com.hadoop.compression.lzo.LzoCodec</value>
> </property>
>
> -Regards
> Denny Ye
>
>
> 2012/8/28 Kevin Lee <[EMAIL PROTECTED]>
>
>> Folks,
>>
>> I followed this link, Hadoop at Twitter (part 1): Splittable LZO
>> Compression<http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/%5D>, to
>> integrate LZO into Hadoop2.0, but Flume-ng LZO compression does not seem to work.
>>
>> My flume-ng configuration file is:
>>
>> cat > /tmp/flume-lzo.conf <<EOF
>> agent.sources = lzo-avro-collect
>> agent.channels = lzo-memory-channel
>> agent.sinks = lzo-hdfs-write
>>
>> agent.sources.lzo-avro-collect.type = avro
>> agent.sources.lzo-avro-collect.bind = 0.0.0.0
>> agent.sources.lzo-avro-collect.port = 12345
>> agent.sources.lzo-avro-collect.channels = lzo-memory-channel
>> agent.channels.lzo-memory-channel.type = memory
>> agent.channels.lzo-memory-channel.capacity = 1000000
>> agent.channels.lzo-memory-channel.transactionCapacity = 10000
>> agent.channels.lzo-memory-channel.stay-alive = 3
>> agent.sinks.lzo-hdfs-write.type = hdfs
>> agent.sinks.lzo-hdfs-write.hdfs.path = hdfs://10.34.4.55:8020/tmp/
>> agent.sinks.lzo-hdfs-write.hdfs.filePrefix = test%Y
>> agent.sinks.lzo-hdfs-write.channel = lzo-memory-channel
>> agent.sinks.lzo-hdfs-write.hdfs.rollInterval = 3600
>> agent.sinks.lzo-hdfs-write.hdfs.rollSize = 209715200
>> agent.sinks.lzo-hdfs-write.hdfs.rollCount = 0
>> agent.sinks.lzo-hdfs-write.hdfs.batchSize = 1000
>> agent.sinks.lzo-hdfs-write.hdfs.codeC = lzo
>> agent.sinks.lzo-hdfs-write.hdfs.fileType = CompressedStream
>> EOF
>>
>> and I start flume-ng-agent in the foreground:
>>
>> sudo -u flume flume-ng agent -n agent -f /tmp/flume-lzo.conf
>>
>> using avro-client to ship the event:
>>
>> echo aaaaaaaaaaaaaaaaa > /tmp/events
>> sudo -u flume flume-ng avro-client -H localhost -p 12345 -F /tmp/events
>>
>> The flume-ng-agent collector log is as follows:
>>
>> 12/08/28 06:33:53 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
>> 12/08/28 06:33:53 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 6bb1b7f8b9044d8df9b4d2b6641db7658aab3cf8]
>> 12/08/28 06:33:54 INFO hdfs.HDFSEventSink: Hadoop Security enabled: false
>> 12/08/28 06:33:54 INFO nodemanager.DefaultLogicalNodeManager: Starting new configuration:{ sourceRunners:{lzo-avro-collect=EventDrivenSourceRunner: { source:AvroSource: { bindAddress:0.0.0.0 port:12345 } }} sinkRunners:{lzo-hdfs-write=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@39e57e8f counterGroup:{ name:null counters:{} } }} channels:{lzo-memory-channel=org.apache.flume.channel.MemoryChannel@9d7fbfb} }
>> 12/08/28 06:33:54 INFO nodemanager.DefaultLogicalNodeManager: Starting Channel lzo-memory-channel
>> 12/08/28 06:33:54 INFO nodemanager.DefaultLogicalNodeManager: Starting Sink lzo-hdfs-write
>> 12/08/28 06:33:54 INFO nodemanager.DefaultLogicalNodeManager: Starting Source lzo-avro-collect
>> 12/08/28 06:33:54 INFO source.AvroSource: Avro source starting:AvroSource: { bindAddress:0.0.0.0 port:12345 }
>> 12/08/28 06:34:02 INFO ipc.NettyServer: [id: 0x651db6bb, /127.0.0.1:48085 => /127.0.0.1:12345] OPEN
>> 12/08/28 06:34:02 INFO ipc.NettyServer: [id: 0x651db6bb, /127.0.0.1:48085 => /127.0.0.1:12345] BOUND: /127.0.0.1:12345
>> 12/08/28 06:34:02 INFO ipc.NettyServer: [id: 0x651db6bb, /127.0.0.1:48085 => /127.0.0.1:12345] CONNECTED: /127.0.0.1:48085
>> 12/08/28 06:34:02 INFO ipc.NettyServer: [id: 0x651db6bb, /127.0.0.1:48085 => /127.0.0.1:12345] DISCONNECTED
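
For comparing two agent configurations side by side, as suggested in the reply, a small script can pull out the compression-related settings of each HDFS sink. This is only a sketch under stated assumptions: the helper names `parse_properties` and `hdfs_sink_settings` are made up for illustration, the agent is assumed to be named `agent` as in the configuration quoted in this thread, and only plain `key = value` lines are handled, not the full Java properties syntax.

```python
# Minimal sketch: extract hdfs.codeC and hdfs.fileType for each HDFS sink
# from a Flume-ng properties file, so two configurations can be compared.
# Helper names are hypothetical; this is not part of Flume.

def parse_properties(text):
    """Parse simple `key = value` lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def hdfs_sink_settings(props, agent="agent"):
    """Return {sink_name: {"codeC": ..., "fileType": ...}} for sinks of type hdfs."""
    prefix = agent + ".sinks."
    result = {}
    for key, value in props.items():
        # A sink is declared as <agent>.sinks.<name>.type = hdfs
        if key.startswith(prefix) and key.endswith(".type") and value == "hdfs":
            sink = key[len(prefix):-len(".type")]
            result[sink] = {
                "codeC": props.get(prefix + sink + ".hdfs.codeC"),
                "fileType": props.get(prefix + sink + ".hdfs.fileType"),
            }
    return result


# Excerpt of the sink settings from the configuration quoted in this thread.
SAMPLE_CONF = """
agent.sinks = lzo-hdfs-write
agent.sinks.lzo-hdfs-write.type = hdfs
agent.sinks.lzo-hdfs-write.hdfs.codeC = lzo
agent.sinks.lzo-hdfs-write.hdfs.fileType = CompressedStream
"""

settings = hdfs_sink_settings(parse_properties(SAMPLE_CONF))
for sink, values in settings.items():
    print(sink, values)
```

Running the same function over a second configuration file and comparing the two dictionaries shows at a glance whether one side uses the short name `lzo` and the other the full class name `com.hadoop.compression.lzo.LzoCodec`.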