streaming Avro to HDFS
Hi, I'm just getting started with Flume and trying to understand the flow of things.

I have Avro binary data files being generated on remote nodes, and I want to use
Flume (1.2.0) to stream them to my HDFS cluster at a central location. It seems I can
stream the data, but the resulting files on HDFS appear to be corrupt. Here's what I did:

For my "master" (on the NameNode of my Hadoop cluster)  I started this:
flume-ng agent -f agent.conf  -Dflume.root.logger=DEBUG,console -n agent
With this config:
agent.channels = memory-channel
agent.sources = avro-source
agent.sinks = hdfs-sink

agent.channels.memory-channel.type = memory
agent.channels.memory-channel.capacity = 1000
agent.channels.memory-channel.transactionCapacity = 100

agent.sources.avro-source.channels = memory-channel
agent.sources.avro-source.type = avro
agent.sources.avro-source.bind = 10.10.10.10
agent.sources.avro-source.port = 41414

agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.channel = memory-channel
agent.sinks.hdfs-sink.hdfs.path = hdfs://namenode1:9000/flume
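
Note: I left the sink's format settings at their defaults. If I'm reading the docs
right, hdfs.fileType defaults to SequenceFile, which wraps each event in SequenceFile
records rather than writing raw bytes, so I suspect (untested guess on my part) that
keeping the raw Avro bytes would need something like:

agent.sinks.hdfs-sink.hdfs.fileType = DataStream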

On a remote node I streamed a test file like this:
flume-ng avro-client -H 10.10.10.10 -p 41414 -F /tmp/test.avro
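
As I understand it, avro-client reads the input file line by line and sends each line
as a separate event, so a binary Avro container would get split on whatever 0x0a bytes
happen to occur in it. Counting the "lines" the client would presumably see:

  [localhost] $ wc -l /tmp/test.avro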

I can see the "master" is writing to HDFS:
  ......
  13/02/06 09:37:55 INFO hdfs.BucketWriter: Creating hdfs://namenode1:9000/flume/FlumeData.1360172273684.tmp
  13/02/06 09:38:25 INFO hdfs.BucketWriter: Renaming hdfs://namenode1:9000/flume/FlumeData.1360172273684.tmp
  to hdfs://namenode1:9000/flume/FlumeData.1360172273684
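
The 30-second gap between the create and the rename looks like the default
hdfs.rollInterval (30 seconds, I believe). If so, time-based rolling could presumably
be disabled with:

agent.sinks.hdfs-sink.hdfs.rollInterval = 0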

But the data doesn't look right. The original file is 4551 bytes, yet the file written to
HDFS is only 219 bytes:
  [localhost] $ ls -l FlumeData.1360172273684 /tmp/test.avro
  -rwxr-xr-x 1 amiller amiller  219 Feb  6 18:51 FlumeData.1360172273684
  -rwxr-xr-x 1 amiller amiller 4551 Feb  6 12:00 /tmp/test.avro

  [localhost] $ avro cat /tmp/test.avro
  {"system_model": null, "nfsv4": null, "ip": null, "site": null, "nfsv3": null, "export": null, "ifnet": [{"send_bps": 1234, "recv_bps": 5678, "name": "eth0"}, {"send_bps": 100, "recv_bps": 200, "name": "eth1"}, {"send_bps": 0, "recv_bps": 0, "name": "eth2"}], "disk": null, "hostname": "localhost", "total_mem": null, "ontapi_version": null, "serial_number": null, "cifs": null, "cpu_model": null, "volume": null, "time_stamp": 1357639723, "aggregate": null, "num_cpu": null, "cpu_speed_mhz": null, "hostid": null, "kernel_version": null, "qtree": null, "processor": null}

  [localhost] $ hadoop fs -copyToLocal /flume/FlumeData.1360172273684 .
  [localhost] $ avro cat FlumeData.1360172273684
  panic: ord() expected a character, but string of length 0 found
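
Comparing the first bytes of the two files should show what format actually landed on
HDFS; an Avro container file starts with the magic bytes "Obj", while a Hadoop
SequenceFile (the HDFS sink's default output, if my guess above is right) starts
with "SEQ":

  [localhost] $ od -c /tmp/test.avro | head -1
  [localhost] $ od -c FlumeData.1360172273684 | head -1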

Alan
Replies:
  Hari Shreedharan 2013-02-06, 18:15
  Alan Miller 2013-02-06, 18:20
  Hari Shreedharan 2013-02-06, 18:58
  Alan Miller 2013-02-07, 13:44