Avro user mailing list: Output from AVRO mapper


Terry Healy 2012-12-20, 17:32
Re: Output from AVRO mapper
Russell Jurney 2012-12-22, 00:42
I don't mean to harp, but this is a few lines in Pig:

/* Load Avro jars and define shortcut */
register /me/pig/build/ivy/lib/Pig/avro-1.5.3.jar;
register /me/pig/build/ivy/lib/Pig/json-simple-1.1.jar;
register /me/pig/contrib/piggybank/java/piggybank.jar;
register /path/to/elephant-bird.jar;
define AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();

/* Load Avros ('input' is a reserved keyword in Pig Latin, so use another alias) */
avros = load 'my.avro' using AvroStorage();

/* Verify input */
describe avros;
illustrate avros;

/* Convert Avros to JSON */
store avros into 'my.json' using
    com.twitter.elephantbird.pig.store.JsonStorage();
store avros into 'my.json.lzo' using
    com.twitter.elephantbird.pig.store.LzoJsonStorage();

/* Convert simple Avros to TSV */
store avros into 'my.tsv';

/* Convert Avros to SequenceFiles */
store avros into 'my.seq' using
    com.twitter.elephantbird.pig.store.SequenceFileStorage(
        /* example: */
        '-c com.twitter.elephantbird.pig.util.IntWritableConverter',
        '-c com.twitter.elephantbird.pig.util.TextConverter'
    );

/* Convert Avros to Protobufs */
store avros into 'input.protobuf' using
    com.twitter.elephantbird.examples.proto.pig.store.ProfileProtobufB64LinePigStorage();

/* Convert Avros to a Lucene Index */
store avros into 'input.lucene' using
    LuceneIndexStorage('com.example.MyPigLuceneIndexOutputFormat');

There are also drivers for most NoSQLish databases...

Russell Jurney http://datasyndrome.com

On Dec 20, 2012, at 9:33 AM, Terry Healy <[EMAIL PROTECTED]> wrote:

I'm just getting started using Avro within MapReduce and am trying to
convert some existing non-Avro code to use Avro input. So far, the data
that was previously stored in tab-delimited files has been converted to
.avro successfully, as verified with avro-tools.
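
For reference, a conversion along those lines can be written with the Avro
generic API roughly as follows (an untested sketch; the schema file and the
field names are made up here, not the actual schema):

<code>
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class TsvToAvro {
    public static void main(String[] args) throws IOException {
        // Hypothetical schema file; adjust to the real one.
        Schema schema = new Schema.Parser().parse(new File("netflow.avsc"));

        DataFileWriter<GenericRecord> writer =
                new DataFileWriter<GenericRecord>(
                        new GenericDatumWriter<GenericRecord>(schema));
        writer.create(schema, new File("flows.avro"));

        BufferedReader in = new BufferedReader(new FileReader("flows.tsv"));
        String line;
        while ((line = in.readLine()) != null) {
            String[] fields = line.split("\t");
            GenericRecord rec = new GenericData.Record(schema);
            rec.put("srcIp", Long.parseLong(fields[0]));  // assumed field names
            rec.put("first", Long.parseLong(fields[1]));
            writer.append(rec);
        }
        in.close();
        writer.close();
    }
}
</code>

Running avro-tools tojson on the result is a quick way to spot-check the
records.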

Where I'm getting hung up, going beyond my book-based examples, is in
reading Avro input (using generic records) when the mapper output is NOT
in Avro format. I can't seem to reconcile extending AvroMapper with NOT
using AvroCollector.

Here are snippets of code that show my non-Avro M/R code and my [failing]
attempts to make this change. If anyone can help me along, it would be
very much appreciated.

-Terry

<code>
Pre-Avro version: (Works fine with .tsv input format)

    public static class HdFlowMapper extends MapReduceBase
            implements Mapper<Text, HdFlowWritable, LongPair, HdFlowWritable> {

        @Override
        public void map(Text key, HdFlowWritable value,
                OutputCollector<LongPair, HdFlowWritable> output,
                Reporter reporter) throws IOException {

            // ...
            outKey = new LongPair(value.getSrcIp(), value.getFirst());

            HdFlowWritable outValue = value; // pass it all through
            output.collect(outKey, outValue);
        }
    }

AVRO attempt:
       conf.setOutputFormat(TextOutputFormat.class);
       conf.setOutputKeyClass(LongPair.class);
       conf.setOutputValueClass(AvroFlowWritable.class);

       SCHEMA = new Schema.Parser().parse(NetflowSchema);
       AvroJob.setInputSchema(conf, SCHEMA);
       //AvroJob.setOutputSchema(conf, SCHEMA);
       AvroJob.setMapperClass(conf, AvroFlowMapper.class);
       AvroJob.setReducerClass(conf, AvroFlowReducer.class);

....

    public static class AvroFlowMapper<K> extends AvroMapper<K, OutputCollector> {

        @Override
        // ** IDE: "Method does not override or implement a method from a supertype"
        public void map(K datum, OutputCollector<LongPair, AvroFlowWritable> collector,
                Reporter reporter) throws IOException {
            GenericRecord record = (GenericRecord) datum;
            afw = new AvroFlowWritable(record);
            // ...
            collector.collect(outKey, afw);
        }
    }

</code>
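
For contrast, here is a minimal, untested sketch of one way to keep only the
input side in Avro: skip AvroMapper and AvroCollector entirely and use a
plain old-API Mapper, since the mapred AvroInputFormat presents each record
as an AvroWrapper<GenericRecord> key with a NullWritable value. LongPair and
AvroFlowWritable are the classes from the snippets above, and their
getSrcIp()/getFirst() getters are assumed.

<code>
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroInputFormat;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroWrapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;

public class AvroInputNonAvroOutput {

    // Plain old-API mapper: the Avro record arrives wrapped in the key,
    // the input value is always NullWritable.
    public static class AvroFlowMapper extends MapReduceBase
            implements Mapper<AvroWrapper<GenericRecord>, NullWritable,
                              LongPair, AvroFlowWritable> {
        @Override
        public void map(AvroWrapper<GenericRecord> wrapper, NullWritable ignore,
                OutputCollector<LongPair, AvroFlowWritable> output,
                Reporter reporter) throws IOException {
            GenericRecord record = wrapper.datum();
            AvroFlowWritable afw = new AvroFlowWritable(record);
            // getSrcIp()/getFirst() are assumed accessors on AvroFlowWritable.
            LongPair outKey = new LongPair(afw.getSrcIp(), afw.getFirst());
            output.collect(outKey, afw);
        }
    }

    static void configure(JobConf conf, String netflowSchema) {
        // Only the input side is Avro.
        AvroJob.setInputSchema(conf, new Schema.Parser().parse(netflowSchema));
        conf.setInputFormat(AvroInputFormat.class);

        // Mapper, shuffle and output are ordinary Writables, configured the
        // usual way -- no AvroJob.setMapperClass()/setReducerClass() and no
        // output schema.
        conf.setMapperClass(AvroFlowMapper.class);
        conf.setMapOutputKeyClass(LongPair.class);
        conf.setMapOutputValueClass(AvroFlowWritable.class);
        conf.setOutputFormat(TextOutputFormat.class);
        conf.setOutputKeyClass(LongPair.class);
        conf.setOutputValueClass(AvroFlowWritable.class);
    }
}
</code>

Nothing after the map is Avro-specific in this arrangement, so the reducer
and output format can stay plain Hadoop classes.
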
Terry Healy 2012-12-22, 18:33