Avro >> mail # user >> Output from AVRO mapper


Re: Output from AVRO mapper
<html>
  <head>
    <meta content="text/html; charset=windows-1252"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    Thanks, Russell. This looks like a much easier solution, and I will
    look at it more carefully in the near future. But at this point I don't
    want to walk away from the Java M/R solution just because I can't
    work it out. I know it works; I am just missing something basic in
    my understanding.<br>
    <br>
    -Terry<br>
    <br>
    P.S. Cool font too.<br>
    <br>
    <div class="moz-cite-prefix">On 12/21/12 7:42 PM, Russell Jurney
      wrote:<br>
    </div>
    <blockquote cite="mid:-5487704440750579907@unknownmsgid" type="cite">
      <meta http-equiv="Content-Type" content="text/html;
        charset=windows-1252">
      <div>I don't mean to harp, but this is a few lines in Pig:</div>
      <pre class="pig" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;padding-top:10px;padding-right:10px;padding-bottom:10px;padding-left:10px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;font:normal normal normal 1em/normal 'andale mono','lucida console',monospace;vertical-align:baseline;display:block;width:auto;clear:none;overflow-x:visible;overflow-y:visible"><span class="Apple-style-span" style="font-family:Noteworthy;font-size:18px;font-weight:bold;line-height:24px;white-space:normal"><div>/* Load Avro jars and define shortcut */</div><div>register /me/pig/build/ivy/lib/Pig/avro-1.5.3.jar</div>
<div>register /me/pig/build/ivy/lib/Pig/json-simple-1.1.jar</div><div>register /me/pig/contrib/piggybank/java/piggybank.jar</div><div>define AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();</div><div>

</div><div>/* Load Avros */</div><div>input = load 'my.avro' using AvroStorage();</div><div>
</div><div>/* Verify input */</div><div>describe input;</div><div>Illustrate input;</div><div>
</div><div>/* Convert Avros to JSON */</div>
<div>store input into 'my.json' using com.twitter.elephantbird.pig.store.JsonStorage();</div><div>store input into 'my.json.lzo' using com.twitter.elephantbird.pig.store.LzoJsonStorage();</div><div>
</div>
<div>/* Convert simple Avros to TSV */</div><div>store input into 'my.tsv';</div><div>
</div><div>/* Convert Avros to SequenceFiles */</div>REGISTER '/path/to/elephant-bird.jar';<div>store input into 'my.seq' using com.twitter.elephantbird.pig.store.SequenceFileStorage(</div>
<div>    /* example: */</div><div>    '-c com.twitter.elephantbird.pig.util.IntWritableConverter',</div><div>    '-c com.twitter.elephantbird.pig.util.TextConverter'</div><div>)<span class="Apple-style-span" style="">;</span></div>
<div>
</div><div>/* Convert Avros to Protobufs */</div><div>store input into 'input.protobuf' using com.twitter.elephantbird.examples.proto.pig.store.ProfileProtobufB64LinePigStorage();</div><div>
</div><div>/* Convert Avros to a Lucene Index */</div>
store input into 'input.lucene' using LuceneIndexStorage('com.example.MyPigLuceneIndexOutputFormat');</span><font class="Apple-style-span" face="Helvetica"><span class="Apple-style-span" style="white-space:normal">

</span></font></pre>
      <div><span class="Apple-style-span"
style="font-family:Noteworthy;font-size:18px;font-weight:bold;line-height:24px;white-space:normal">There
          are also drivers for most NoSQLish databases...</span></div>
      <div><span class="Apple-style-span"
style="font-family:Noteworthy;font-size:18px;font-weight:bold;line-height:24px;white-space:normal"><br>
        </span></div>
      <div>Russell Jurney <a moz-do-not-send="true"
          href="http://datasyndrome.com">http://datasyndrome.com</a></div>
      <div><br>
        On Dec 20, 2012, at 9:33 AM, Terry Healy <<a
          moz-do-not-send="true" href="mailto:[EMAIL PROTECTED]">[EMAIL PROTECTED]</a>>
        wrote:<br>
        <br>
      </div>
      <blockquote type="cite">
        <div><span>I'm just getting started using AVRO within Map/Reduce
            and trying to</span><br>
          <span>convert some existing non-AVRO code to use AVRO input.
            So far the data</span><br>
          <span>that previously was stored in tab delimited files has
            been converted to</span><br>
          <span>.avro successfully as checked with avro-tools.</span><br>
          <span></span><br>
          <span>Where I'm getting hung up extending beyond my book-based
            examples is in</span><br>
          <span>attempting to read from AVRO (using generic records)
            where the mapper</span><br>
          <span>output is NOT in AVRO format. I can't seem to reconcile
            extending</span><br>
          <span>AvroMapper and NOT using AvroCollector.</span><br>
          <span></span><br>
          <span>Here are snippets of code that show my non-AVRO M/R code
            and my</span><br>
          <span>[failing] attempts to make this change. If anyone can
            help me along it</span><br>
          <span>would be very much appreciated.</span><br>
          <span></span><br>
          <span>-Terry</span><br>
          <span></span><br>
          <span><code></span><br>
          <span>Pre-Avro version: (Works fine with .tsv input format)</span><br>
          <span></span><br>
          <span>    public static class HdFlowMapper extends
            MapReduceBase</span><br>
          <span>            implements Mapper<Text, HdFlowWritable,
            LongPair,</span><br>
          <span>HdFlowWritable> {</span><br>
          <span></span><br>
          <span></span><br>
          <span>        @Override</span><br>
          <span>        public void map(Text key, HdFlowWritable value,</span><br>
          <span>                OutputCollector<LongPair,
            HdFlowWritable> outpu