Avro >> mail # user >> Output from AVRO mapper


Re: Output from AVRO mapper
<html>
  <head>
    <meta content="text/html; charset=windows-1252"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    Thanks Russell. This looks like a much easier solution, which I will
    look at more carefully in the near future. But at this point I don't
    want to walk away from the Java M/R solution just because I can't
    work it out. I know it works; I am just missing something basic in
    my understanding.<br>
    <br>
    -Terry<br>
    <br>
    P.S. Cool font too.<br>
    <br>
    <div class="moz-cite-prefix">On 12/21/12 7:42 PM, Russell Jurney
      wrote:<br>
    </div>
    <blockquote cite="mid:-5487704440750579907@unknownmsgid" type="cite">
      <meta http-equiv="Content-Type" content="text/html;
        charset=windows-1252">
      <div>I don't mean to harp, but this is a few lines in Pig:</div>
      <pre class="pig" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;padding-top:10px;padding-right:10px;padding-bottom:10px;padding-left:10px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;font:normal normal normal 1em/normal 'andale mono','lucida console',monospace;vertical-align:baseline;display:block;width:auto;clear:none;overflow-x:visible;overflow-y:visible"><span class="Apple-style-span" style="font-family:Noteworthy;font-size:18px;font-weight:bold;line-height:24px;white-space:normal"><div>/* Load Avro jars and define shortcut */</div><div>register /me/pig/build/ivy/lib/Pig/avro-1.5.3.jar</div>
<div>register /me/pig/build/ivy/lib/Pig/json-simple-1.1.jar</div><div>register /me/pig/contrib/piggybank/java/piggybank.jar</div><div>define AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();</div><div>

</div><div>/* Load Avros */</div><div>input = load 'my.avro' using AvroStorage();</div><div>
</div><div>/* Verify input */</div><div>describe input;</div><div>Illustrate input;</div><div>
</div><div>/* Convert Avros to JSON */</div>
<div>store input into 'my.json' using com.twitter.elephantbird.pig.store.JsonStorage();</div><div>store input into 'my.json.lzo' using com.twitter.elephantbird.pig.store.LzoJsonStorage();</div><div>
</div>
<div>/* Convert simple Avros to TSV */</div><div>store input into 'my.tsv';</div><div>
</div><div>/* Convert Avros to SequenceFiles */</div>REGISTER '/path/to/elephant-bird.jar';<div>store input into 'my.seq' using com.twitter.elephantbird.pig.store.SequenceFileStorage(</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;/* example: */</div><div>&nbsp;&nbsp;&nbsp;&nbsp;'-c com.twitter.elephantbird.pig.util.IntWritableConverter',</div><div>&nbsp;&nbsp;&nbsp;&nbsp;'-c com.twitter.elephantbird.pig.util.TextConverter'</div><div>)<span class="Apple-style-span" style="">;</span></div>
<div>
</div><div>/* Convert Avros to Protobufs */</div><div>store input into 'input.protobuf' using com.twitter.elephantbird.examples.proto.pig.store.ProfileProtobufB64LinePigStorage();</div><div>
</div><div>/* Convert Avros to a Lucene Index */</div>
store input into 'input.lucene' using LuceneIndexStorage('com.example.MyPigLuceneIndexOutputFormat');</span><font class="Apple-style-span" face="Helvetica"><span class="Apple-style-span" style="white-space:normal">

</span></font></pre>
      <div><span class="Apple-style-span"
style="font-family:Noteworthy;font-size:18px;font-weight:bold;line-height:24px;white-space:normal">There
          are also drivers for most NoSQLish databases...</span></div>
      <div><span class="Apple-style-span"
style="font-family:Noteworthy;font-size:18px;font-weight:bold;line-height:24px;white-space:normal"><br>
        </span></div>
      <div>Russell Jurney <a moz-do-not-send="true"
          href="http://datasyndrome.com">http://datasyndrome.com</a></div>
      <div><br>
        On Dec 20, 2012, at 9:33 AM, Terry Healy &lt;<a
          moz-do-not-send="true" href="mailto:[EMAIL PROTECTED]">[EMAIL PROTECTED]</a>&gt;
        wrote:<br>
        <br>
      </div>
      <blockquote type="cite">
        <div><span>I'm just getting started using AVRO within Map/Reduce
            and trying to</span><br>
          <span>convert some existing non-AVRO code to use AVRO input.
            So far the data</span><br>
          <span>that previously was stored in tab-delimited files has
            been converted to</span><br>
          <span>.avro successfully as checked with avro-tools.</span><br>
          <span></span><br>
          <span>Where I'm getting hung up extending beyond my book-based
            examples is in</span><br>
          <span>attempting to read from AVRO (using generic records)
            where the mapper</span><br>
          <span>output is NOT in AVRO format. I can't seem to reconcile
            extending</span><br>
          <span>AvroMapper and NOT using AvroCollector.</span><br>
          <span></span><br>
          <span>Here are snippets of code that show my non-AVRO M/R code
            and my</span><br>
          <span>[failing] attempts to make this change. If anyone can
            help me along it</span><br>
          <span>would be very much appreciated.</span><br>
          <span></span><br>
          <span>-Terry</span><br>
          <span></span><br>
          <span>&lt;code&gt;</span><br>
          <span>Pre-Avro version: (Works fine with .tsv input format)</span><br>
          <span></span><br>
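          <span>&nbsp;&nbsp;&nbsp;&nbsp;public static class HdFlowMapper extends
            MapReduceBase</span><br>
          <span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;implements Mapper&lt;Text, HdFlowWritable,
            LongPair,</span><br>
          <span>HdFlowWritable&gt; {</span><br>
          <span></span><br>
          <span></span><br>
          <span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;@Override</span><br>
          <span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;public void map(Text key, HdFlowWritable value,</span><br>
          <span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;OutputCollector&lt;LongPair,
            HdFlowWritable&gt; outpu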