Avro >> mail # user >> using get/setMeta() & seek


using get/setMeta() & seek
Hi,

Here's my scenario.

One Hadoop job collects incoming Flume data and keeps appending
records to Avro files, so the files grow every 30 minutes. Another
Hadoop job runs every hour and reads those files. When this job
finishes, I want to record the offset in the file where it left off, so
that the next iteration can seek to that position immediately.

Can I use the DataFileWriter's *setMeta*(String key, long value)
method to update a meta field with the position, and then use the
DataFileReader's *getMetaLong*(String key) and *seek*(long position)
methods to implement this?
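
To make the idea concrete, here is a minimal sketch of the "remember where the last run stopped" pattern in plain Java. It is a stand-in, not Avro code: it uses java.io.RandomAccessFile on a raw byte file and keeps the offset in a separate checkpoint file rather than in the data file's metadata. The class name OffsetCheckpoint, its method names, and the sidecar-file layout are all hypothetical. With Avro itself, the saved position would presumably have to be a sync point (for example, one reported by DataFileReader.previousSync()), since DataFileReader.seek() expects a known sync point.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: tracks how far a growing file has been read,
// storing the offset in a separate checkpoint file (an assumption of
// this sketch, not something the Avro API provides).
public class OffsetCheckpoint {

    // Returns the saved offset, or 0 if no checkpoint exists yet.
    static long loadOffset(Path checkpoint) throws IOException {
        if (!Files.exists(checkpoint)) {
            return 0L;
        }
        String text = new String(Files.readAllBytes(checkpoint), StandardCharsets.UTF_8);
        return Long.parseLong(text.trim());
    }

    // Overwrites the checkpoint file with the new offset.
    static void saveOffset(Path checkpoint, long offset) throws IOException {
        Files.write(checkpoint, Long.toString(offset).getBytes(StandardCharsets.UTF_8));
    }

    // Reads only the bytes appended since the last run, then advances
    // the checkpoint to the current end of the file.
    // Simplification: the new data is read into a single in-memory buffer.
    static String readNewData(Path data, Path checkpoint) throws IOException {
        long start = loadOffset(checkpoint);
        try (RandomAccessFile in = new RandomAccessFile(data.toFile(), "r")) {
            long end = in.length();
            if (end <= start) {
                return ""; // nothing new since the last run
            }
            byte[] buf = new byte[(int) (end - start)];
            in.seek(start);      // jump straight to where the last run stopped
            in.readFully(buf);
            saveOffset(checkpoint, end);
            return new String(buf, StandardCharsets.UTF_8);
        }
    }
}
```

Each hourly run would call readNewData(dataFile, checkpointFile) once. Keeping the offset outside the data file means the reader never has to modify a file that another job is still appending to.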

Is that reasonable? Currently I'm only using the Java API.
Are these methods implemented in the Ruby API too?

Thanks,
Alan
Doug Cutting 2013-09-30, 21:16