Terry Healy 2012-12-20, 17:32
Russell Jurney 2012-12-22, 00:42
Re: Output from AVRO mapper Terry Healy 2012-12-22, 18:33
<html>
<head> <meta content="text/html; charset=windows-1252" http-equiv="Content-Type"> </head> <body bgcolor="#FFFFFF" text="#000000"> Thanks, Russell. This looks like a much easier solution, which I will look at more carefully in the near future. But at this point I don't want to walk away from the Java M/R solution just because I can't work it out. I know it works - I'm just missing something basic in my understanding.<br> <br> -Terry<br> <br> P.S. Cool font too.<br> <br> <div class="moz-cite-prefix">On 12/21/12 7:42 PM, Russell Jurney wrote:<br> </div> <blockquote cite="mid:-5487704440750579907@unknownmsgid" type="cite"> <meta http-equiv="Content-Type" content="text/html; charset=windows-1252"> <div>I don't mean to harp, but this is a few lines in Pig:</div> <pre class="pig" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;padding-top:10px;padding-right:10px;padding-bottom:10px;padding-left:10px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;font:normal normal normal 1em/normal 'andale mono','lucida console',monospace;vertical-align:baseline;display:block;width:auto;clear:none;overflow-x:visible;overflow-y:visible"><span class="Apple-style-span" style="font-family:Noteworthy;font-size:18px;font-weight:bold;line-height:24px;white-space:normal"><div>/* Load Avro jars and define shortcut */</div><div>register /me/pig/build/ivy/lib/Pig/avro-1.5.3.jar</div> <div>register /me/pig/build/ivy/lib/Pig/json-simple-1.1.jar</div><div>register /me/pig/contrib/piggybank/java/piggybank.jar</div><div>define AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();</div><div> </div><div>/* Load Avros */</div><div>input = load 'my.avro' using AvroStorage();</div><div> </div><div>/* Verify input */</div><div>describe input;</div><div>Illustrate input;</div><div> </div><div>/* Convert Avros to JSON */</div> <div>store input into 'my.json' using 
com.twitter.elephantbird.pig.store.JsonStorage();</div><div>store input into 'my.json.lzo' using com.twitter.elephantbird.pig.store.LzoJsonStorage();</div><div> </div> <div>/* Convert simple Avros to TSV */</div><div>store input into 'my.tsv';</div><div> </div><div>/* Convert Avros to SequenceFiles */</div>REGISTER '/path/to/elephant-bird.jar';<div>store input into 'my.seq' using com.twitter.elephantbird.pig.store.SequenceFileStorage(</div> <div>&nbsp; &nbsp; /* example: */</div><div>&nbsp; &nbsp; '-c com.twitter.elephantbird.pig.util.IntWritableConverter',</div><div>&nbsp; &nbsp; '-c com.twitter.elephantbird.pig.util.TextConverter'</div><div>)<span class="Apple-style-span" style="">;</span></div> <div> </div><div>/* Convert Avros to Protobufs */</div><div>store input into 'input.protobuf' using com.twitter.elephantbird.examples.proto.pig.store.ProfileProtobufB64LinePigStorage();</div><div> </div><div>/* Convert Avros to a Lucene Index */</div> store input into 'input.lucene' using LuceneIndexStorage('com.example.MyPigLuceneIndexOutputFormat');</span><font class="Apple-style-span" face="Helvetica"><span class="Apple-style-span" style="white-space:normal"> </span></font></pre> <div><span class="Apple-style-span" style="font-family:Noteworthy;font-size:18px;font-weight:bold;line-height:24px;white-space:normal">There are also drivers for most NoSQLish databases...</span></div> <div><span class="Apple-style-span" style="font-family:Noteworthy;font-size:18px;font-weight:bold;line-height:24px;white-space:normal"><br> </span></div> <div>Russell Jurney <a moz-do-not-send="true" href="http://datasyndrome.com">http://datasyndrome.com</a></div> <div><br> On Dec 20, 2012, at 9:33 AM, Terry Healy &lt;<a moz-do-not-send="true" href="mailto:[EMAIL PROTECTED]">[EMAIL PROTECTED]</a>&gt; wrote:<br> <br> </div> <blockquote type="cite"> <div><span>I'm just getting started using AVRO within Map/Reduce and trying to</span><br> <span>convert some existing non-AVRO code to use AVRO input. 
So far the data</span><br> <span>that was previously stored in tab-delimited files has been converted to</span><br> <span>.avro successfully, as checked with avro-tools.</span><br> <span></span><br> <span>Where I'm getting hung up extending beyond my book-based examples is in</span><br> <span>attempting to read from AVRO (using generic records) where the mapper</span><br> <span>output is NOT in AVRO format. I can't seem to reconcile extending</span><br> <span>AvroMapper and NOT using AvroCollector.</span><br> <span></span><br> <span>Here are snippets of code that show my non-AVRO M/R code and my</span><br> <span>[failing] attempts to make this change. If anyone can help me along, it</span><br> <span>would be very much appreciated.</span><br> <span></span><br> <span>-Terry</span><br> <span></span><br> <span>&lt;code&gt;</span><br> <span>Pre-Avro version: (Works fine with .tsv input format)</span><br> <span></span><br> <span> &nbsp;&nbsp;&nbsp;public static class HdFlowMapper extends MapReduceBase</span><br> <span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;implements Mapper&lt;Text, HdFlowWritable, LongPair,</span><br> <span>HdFlowWritable&gt; {</span><br> <span></span><br> <span></span><br> <span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;@Override</span><br> <span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;public void map(Text key, HdFlowWritable value,</span><br> <span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;OutputCollector&lt;LongPair, HdFlowWritable&gt; outpu