streaming Avro to HDFS
Hi, I'm just getting started with Flume and trying to understand the flow of things.

I have Avro binary data files being generated on remote nodes, and I want to use
Flume (1.2.0) to stream them to my HDFS cluster at a central location. I can
stream the data, but the resulting files on HDFS appear to be corrupt. Here's what I did:

For my "master" (on the NameNode of my Hadoop cluster) I started this:
flume-ng agent -f agent.conf  -Dflume.root.logger=DEBUG,console -n agent
With this config:
agent.channels = memory-channel
agent.sources = avro-source
agent.sinks = hdfs-sink

agent.channels.memory-channel.type = memory
agent.channels.memory-channel.capacity = 1000
agent.channels.memory-channel.transactionCapacity = 100

agent.sources.avro-source.channels = memory-channel
agent.sources.avro-source.type = avro
agent.sources.avro-source.bind = 10.10.10.10
agent.sources.avro-source.port = 41414

agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.channel = memory-channel
agent.sinks.hdfs-sink.hdfs.path = hdfs://namenode1:9000/flume
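
Everything else on the sink is left at its defaults. If I'm reading the docs right,
hdfs.fileType defaults to SequenceFile, so the sink would be wrapping each event
rather than writing raw bytes. If I wanted the raw event bodies written out as-is,
I'd presumably add something like this (untested guess on my part):

agent.sinks.hdfs-sink.hdfs.fileType = DataStream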

On a remote node I streamed a test file like this:
flume-ng avro-client -H 10.10.10.10 -p 41414 -F /tmp/test.avro
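
As an aside, my understanding (which may be wrong) is that avro-client reads the
input file as newline-delimited text and turns each line into a separate event,
which would mangle a binary Avro container. If so, I'd expect something like the
sketch below, using the Flume client SDK, to send the whole file as the body of a
single event instead (the class name and file path are just my placeholders):

  import java.nio.file.Files;
  import java.nio.file.Paths;

  import org.apache.flume.Event;
  import org.apache.flume.api.RpcClient;
  import org.apache.flume.api.RpcClientFactory;
  import org.apache.flume.event.EventBuilder;

  public class WholeFileSender {
      public static void main(String[] args) throws Exception {
          // Read the whole Avro container file as raw bytes so its
          // binary framing is not split on newlines.
          byte[] body = Files.readAllBytes(Paths.get("/tmp/test.avro"));

          // Connect to the avro source configured above.
          RpcClient client = RpcClientFactory.getDefaultInstance("10.10.10.10", 41414);
          try {
              // One event whose body is the entire file.
              client.append(EventBuilder.withBody(body));
          } finally {
              client.close();
          }
      }
  }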

I can see the master writing to HDFS:
  ......
  13/02/06 09:37:55 INFO hdfs.BucketWriter: Creating hdfs://namenode1:9000/flume/FlumeData.1360172273684.tmp
  13/02/06 09:38:25 INFO hdfs.BucketWriter: Renaming hdfs://namenode1:9000/flume/FlumeData.1360172273684.tmp
  to hdfs://namenode1:9000/flume/FlumeData.1360172273684
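
The 30-second gap between the create and the rename matches what I believe is the
sink's default roll interval (hdfs.rollInterval = 30). If I wanted fewer, larger
files, I'd presumably raise it with something like this, though I don't think it's
related to my problem:

agent.sinks.hdfs-sink.hdfs.rollInterval = 300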

But the data doesn't look right. The original file is 4551 bytes, while the file
written to HDFS is only 219 bytes:
  [localhost] $ ls -l FlumeData.1360172273684 /tmp/test.avro
  -rwxr-xr-x 1 amiller amiller  219 Feb  6 18:51 FlumeData.1360172273684
  -rwxr-xr-x 1 amiller amiller 4551 Feb 6 12:00 /tmp/test.avro

  [localhost] $ avro cat /tmp/test.avro
  {"system_model": null, "nfsv4": null, "ip": null, "site": null, "nfsv3": null, "export": null, "ifnet": [{"send_bps": 1234, "recv_bps": 5678, "name": "eth0"}, {"send_bps": 100, "recv_bps": 200, "name": "eth1"}, {"send_bps": 0, "recv_bps": 0, "name": "eth2"}], "disk": null, "hostname": "localhost", "total_mem": null, "ontapi_version": null, "serial_number": null, "cifs": null, "cpu_model": null, "volume": null, "time_stamp": 1357639723, "aggregate": null, "num_cpu": null, "cpu_speed_mhz": null, "hostid": null, "kernel_version": null, "qtree": null, "processor": null}

  [localhost] $ hadoop fs -copyToLocal /flume/FlumeData.1360172273684 .
  [localhost] $ avro cat FlumeData.1360172273684
  panic: ord() expected a character, but string of length 0 found
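
To check whether the HDFS copy is really an Avro container or the SequenceFile
wrapping I guessed at above, I was planning to compare the magic bytes (untested):

  [localhost] $ head -c 3 FlumeData.1360172273684

An Avro container should start with "Obj"; if this prints "SEQ" instead, the sink
wrapped my events in a SequenceFile, which would explain the avro cat panic.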

Alan