Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Avro >> mail # user >> Output from AVRO mapper


Copy link to this message
-
Re: Output from AVRO mapper
I don't mean to harp, but this is a few lines in Pig:

/* Load Avro jars and define shortcut */
register /me/pig/build/ivy/lib/Pig/avro-1.5.3.jar
register /me/pig/build/ivy/lib/Pig/json-simple-1.1.jar
register /me/pig/contrib/piggybank/java/piggybank.jar
define AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();

/* Load Avros */
input = load 'my.avro' using AvroStorage();

/* Verify input */
describe input;
Illustrate input;

/* Convert Avros to JSON */
store input into 'my.json' using
com.twitter.elephantbird.pig.store.JsonStorage();
store input into 'my.json.lzo' using
com.twitter.elephantbird.pig.store.LzoJsonStorage();

/* Convert simple Avros to TSV */
store input into 'my.tsv';

/* Convert Avros to SequenceFiles */
REGISTER '/path/to/elephant-bird.jar';
 store  input into 'my.seq' using
com.twitter.elephantbird.pig.store.SequenceFileStorage(
    /* example: */
    '-c com.twitter.elephantbird.pig.util.IntWritableConverter',
    '-c com.twitter.elephantbird.pig.util.TextConverter'
);

/* Convert Avros to Protobufs */
store input into 'input.protobuf’ using
com.twitter.elephantbird.examples.proto.pig.store.ProfileProtobufB64LinePigStorage();

/* Convert Avros to a Lucene Index */
store input into 'input.lucene' using
LuceneIndexStorage('com.example.MyPigLuceneIndexOutputFormat');

There are also drivers for most NoSQLish databases...

Russell Jurney http://datasyndrome.com

On Dec 20, 2012, at 9:33 AM, Terry Healy <[EMAIL PROTECTED]> wrote:

I'm just getting started using AVRO within Map/Reduce and trying to
convert some existing non-AVRO code to use AVRO input. So far the data
that previously was stored in tab delimited files has been converted to
.avro successfully as checked with avro-tools.

Where I'm getting hung up extending beyond my book-based examples is in
attempting to read from AVRO (using generic records) where the mapper
output is NOT in AVRO format. I can't seem to reconcile extending
AvroMapper and NOT using AvroCollector.

Here are snippets of code that show my non-AVRO M/R code and my
[failing] attempts to make this change. If anyone can help me along it
would be very much appreciated.

-Terry

<code>
Pre-Avro version: (Works fine with .tsv input format)

   public static class HdFlowMapper extends MapReduceBase
           implements Mapper<Text, HdFlowWritable, LongPair,
HdFlowWritable> {
       @Override
       public void map(Text key, HdFlowWritable value,
               OutputCollector<LongPair, HdFlowWritable> output,
               Reporter reporter) throws IOException {

       ...//
               outKey = new LongPair(value.getSrcIp(), value.getFirst());

               HdFlowWritable outValue = value; // pass it all through
               output.collect(outKey, outValue);
   }

AVRO attempt:
       conf.setOutputFormat(TextOutputFormat.class);
       conf.setOutputKeyClass(LongPair.class);
       conf.setOutputValueClass(AvroFlowWritable.class);

       SCHEMA = new Schema.Parser().parse(NetflowSchema);
       AvroJob.setInputSchema(conf, SCHEMA);
       //AvroJob.setOutputSchema(conf, SCHEMA);
       AvroJob.setMapperClass(conf, AvroFlowMapper.class);
       AvroJob.setReducerClass(conf, AvroFlowReducer.class);

....

        public static class AvroFlowMapper<K> extends AvroMapper<K,
OutputCollector> {
       @Override
   ** IDE: "Method does not override or implement a method from a supertype"

       public void map(K datum, OutputCollector<LongPair,
AvroFlowWritable> collector, Reporter reporter) throws IOException {
           GenericRecord record = (GenericRecord) datum;
           afw = new AvroFlowWritable(record);
       // ...
           collector.collect(outKey, afw);
}

</code>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB