Avro >> mail # user >> Avro MapReduce (MR1): Prevent Key from being output by reducer when using Pair schema


Re: Avro MapReduce (MR1): Prevent Key from being output by reducer when using Pair schema
Thanks Ed! Can you also file an improvement JIRA under
https://issues.apache.org/jira/browse/AVRO with a patch that rewords the
AvroJob.setOutputSchema JavaDoc so that it makes more sense?

On Thu, Jan 16, 2014 at 5:14 PM, ed <[EMAIL PROTECTED]> wrote:
> Hi Harsh,
>
> Thank you for your response, which was invaluable in helping me figure out
> my issue.  The JavaDoc is in fact incorrect when it states that
> AvroJob.setOutputSchema cannot accept non-Pair schemas; it turns out it
> can.  What was throwing me off is that if you use AvroJob.setOutputSchema to
> set a non-Pair schema, then you also need to call AvroJob.setMapOutputSchema
> (which does require the use of Pair).  Otherwise, by default, the map output
> schema gets set to whatever you set in setOutputSchema, and if that is
> non-Pair you'll get an error at runtime.
>
> Maybe the JavaDoc should say something along the lines of:
>
>> Configure a job's output schema. If this is not a Pair schema, then you
>> must explicitly set the job's map output schema using setMapOutputSchema.
>
>
> Thank you!
>
> Best Regards,
>
> Ed
>
>
>
>
> On Thu, Jan 16, 2014 at 6:47 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>> Hello Ed,
>>
>> The AvroReducer per
>>
>> http://avro.apache.org/docs/1.7.4/api/java/org/apache/avro/mapred/AvroReducer.html
>> has a simple spec of <K,V,OUT>, where OUT can be any record type and
>> not necessarily a Pair<KO,VO> type.
>>
>> AvroJob.setOutputSchema(…) should accept non-pair configs. I think its
>> java-doc is incorrect though. I wrote a test case yesterday at
>> http://issues.apache.org/jira/browse/AVRO-1439, in which I set a
>> non-Pair schema via the same call without any trouble. We could get
>> the java-doc fixed, if it is indeed wrong.
>>
>> On Thu, Jan 16, 2014 at 2:14 PM, ed <[EMAIL PROTECTED]> wrote:
>> > Hello,
>> >
>> > I am currently reading in lots of small avro files and then writing them
>> > out into one large avro file using Map Reduce MR1.  I'm trying to do this
>> > using the AvroMapper and AvroReducer and it's almost working how I want.
>> >
>> > The problem right now is that it looks like I have to use
>> > "org.apache.avro.mapred.Pair" if I use "AvroJob.setOutputSchema".  Is
>> > there a way to output a Pair schema from AvroReducer and have the "key" in
>> > that schema be ignored (i.e., not included in the output from the reducer)?
>> > Right now when I check the Reducer output there is an added field in
>> > each record called "key" which I'd like to not have there.
>> >
>> > Essentially I'm looking for something like NullWritable where the key
>> > will just be ignored in the final output.
>> >
>> > Thank you for any assistance or guidance you can provide!
>> >
>> > Best Regards,
>> >
>> > Ed
>>
>>
>>
>> --
>> Harsh J
>
>

--
Harsh J
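
A minimal sketch of the setup Ed describes, assuming the Avro 1.7.x
org.apache.avro.mapred API: the map output schema is a Pair, the job output
schema is the plain record, and the reducer emits the record without its key.
The Event schema, its "id" field, and the class names MergeAvroFiles,
MergeMapper and MergeReducer are hypothetical, chosen only for illustration.
Input/output paths and job submission are omitted.

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.Pair;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Reporter;

public class MergeAvroFiles {

  // Hypothetical record schema, used only for illustration.
  public static final Schema RECORD_SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"string\"}]}");

  public static final Schema KEY_SCHEMA = Schema.create(Schema.Type.STRING);

  // Intermediate (map output) schema: this one has to be a Pair.
  public static final Schema MAP_OUTPUT_SCHEMA =
      Pair.getPairSchema(KEY_SCHEMA, RECORD_SCHEMA);

  /** Wraps each input record in a Pair keyed by its (hypothetical) id field. */
  public static class MergeMapper
      extends AvroMapper<GenericRecord, Pair<CharSequence, GenericRecord>> {
    @Override
    public void map(GenericRecord datum,
                    AvroCollector<Pair<CharSequence, GenericRecord>> collector,
                    Reporter reporter) throws IOException {
      CharSequence key = datum.get("id").toString();
      collector.collect(new Pair<CharSequence, GenericRecord>(
          key, KEY_SCHEMA, datum, RECORD_SCHEMA));
    }
  }

  /** OUT is the plain record type, not a Pair, so no "key" field is written. */
  public static class MergeReducer
      extends AvroReducer<CharSequence, GenericRecord, GenericRecord> {
    @Override
    public void reduce(CharSequence key, Iterable<GenericRecord> values,
                       AvroCollector<GenericRecord> collector,
                       Reporter reporter) throws IOException {
      for (GenericRecord value : values) {
        collector.collect(value);  // emit the value only; the key is dropped
      }
    }
  }

  /** Applies the schema and class settings discussed in this thread. */
  public static JobConf configure(JobConf job) {
    AvroJob.setInputSchema(job, RECORD_SCHEMA);
    // Must be set explicitly because the output schema below is not a Pair.
    AvroJob.setMapOutputSchema(job, MAP_OUTPUT_SCHEMA);
    AvroJob.setOutputSchema(job, RECORD_SCHEMA);
    AvroJob.setMapperClass(job, MergeMapper.class);
    AvroJob.setReducerClass(job, MergeReducer.class);
    return job;
  }
}
```

With this configuration the reducer output should contain only Event records,
without the extra "key" field. If the setMapOutputSchema call is left out, the
map output schema falls back to the non-Pair output schema, which leads to the
runtime error described above.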