streaming Avro to HDFS
Alan Miller 2013-02-06, 17:58
Hi,

I'm just getting started with Flume and trying to understand the flow of things.
I have Avro binary data files being generated on remote nodes, and I want to use Flume (1.2.0) to stream them to my HDFS cluster at a central location. It seems I can stream the data, but the resulting files on HDFS seem corrupt.

Here's what I did:

For my "master" (on the NameNode of my Hadoop cluster) I started this:

    flume-ng agent -f agent.conf -Dflume.root.logger=DEBUG,console -n agent

With this config:

    agent.channels = memory-channel
    agent.sources = avro-source
    agent.sinks = hdfs-sink

    agent.channels.memory-channel.type = memory
    agent.channels.memory-channel.capacity = 1000
    agent.channels.memory-channel.transactionCapacity = 100

    agent.sources.avro-source.channels = memory-channel
    agent.sources.avro-source.type = avro
    agent.sources.avro-source.bind = 10.10.10.10
    agent.sources.avro-source.port = 41414

    agent.sinks.hdfs-sink.type = hdfs
    agent.sinks.hdfs-sink.channel = memory-channel
    agent.sinks.hdfs-sink.hdfs.path = hdfs://namenode1:9000/flume

On a remote node I streamed a test file like this:

    flume-ng avro-client -H 10.10.10.10 -p 41414 -F /tmp/test.avro

I can see the master is writing to HDFS:

    ......
    13/02/06 09:37:55 INFO hdfs.BucketWriter: Creating hdfs://namenode1:9000/flume/FlumeData.1360172273684.tmp
    13/02/06 09:38:25 INFO hdfs.BucketWriter: Renaming hdfs://namenode1:9000/flume/FlumeData.1360172273684.tmp to hdfs://namenode1:9000/flume/FlumeData.1360172273684

But the data doesn't seem right.
The original file is 4551 bytes, but the file written to HDFS is only 219 bytes:

    [localhost] $ ls -l FlumeData.1360172273684 /tmp/test.avro
    -rwxr-xr-x 1 amiller amiller  219 Feb 6 18:51 FlumeData.1360172273684
    -rwxr-xr-x 1 amiller amiller 4551 Feb 6 12:00 /tmp/test.avro

    [localhost] $ avro cat /tmp/test.avro
    {"system_model": null, "nfsv4": null, "ip": null, "site": null, "nfsv3": null, "export": null, "ifnet": [{"send_bps": 1234, "recv_bps": 5678, "name": "eth0"}, {"send_bps": 100, "recv_bps": 200, "name": "eth1"}, {"send_bps": 0, "recv_bps": 0, "name": "eth2"}], "disk": null, "hostname": "localhost", "total_mem": null, "ontapi_version": null, "serial_number": null, "cifs": null, "cpu_model": null, "volume": null, "time_stamp": 1357639723, "aggregate": null, "num_cpu": null, "cpu_speed_mhz": null, "hostid": null, "kernel_version": null, "qtree": null, "processor": null}

    [localhost] $ hadoop fs -copyToLocal /flume/FlumeData.1360172273684 .
    [localhost] $ avro cat FlumeData.1360172273684
    panic: ord() expected a character, but string of length 0 found

Alan
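A quick way to narrow down what actually landed on HDFS is to look at the first few bytes of each file. The sketch below is only a diagnostic aid, not part of the original post. It relies on two format facts: an Avro object container file begins with the four bytes `Obj` followed by 0x01, and a Hadoop SequenceFile begins with the bytes `SEQ`. The function names `sniff_container` and `sniff_file` are made up for illustration, and whether the HDFS copy here really is a SequenceFile is an assumption to be checked, not something the post confirms.

```python
# Diagnostic sketch: classify a file by its leading magic bytes.
# Function names are hypothetical; only the magic values come from
# the Avro and Hadoop file format definitions.

AVRO_MAGIC = b"Obj\x01"   # Avro object container file header
SEQFILE_MAGIC = b"SEQ"    # Hadoop SequenceFile header (a version byte follows)


def sniff_container(header: bytes) -> str:
    """Return 'avro', 'sequencefile', or 'unknown' for the given leading bytes."""
    if header.startswith(AVRO_MAGIC):
        return "avro"
    if header.startswith(SEQFILE_MAGIC):
        return "sequencefile"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first four bytes of a local file and classify them."""
    with open(path, "rb") as f:
        return sniff_container(f.read(4))
```

Running `sniff_file("/tmp/test.avro")` and `sniff_file("FlumeData.1360172273684")` on the two local copies would show whether the file fetched from HDFS still carries the Avro header. If it does not, `avro cat` cannot be expected to read it, regardless of what the payload contains.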
Hari Shreedharan 2013-02-06, 18:15
Alan Miller 2013-02-06, 18:20
Hari Shreedharan 2013-02-06, 18:58
Alan Miller 2013-02-07, 13:44