Avro, mail # user - Reduce-side joins in Avro M/R


Andrew Kenworthy 2011-12-06, 16:43
Scott Carey 2011-12-07, 18:40
Andrew Kenworthy 2011-12-13, 16:46
Re: Reduce-side joins in Avro M/R
Scott Carey 2012-01-05, 23:20
The overhead of checking the union is not that high, but it would be useful
to be able to specify a map of different Avro schemas to source paths for a
variety of use cases.  I am not sure to what extent that is possible with
the current Avro mapreduce API.
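
To make the "map of schemas to source paths" idea concrete, here is a minimal sketch in plain Python. It is not the Avro mapreduce API and does not claim that API supports this; the paths, the schema names, and the helper `schema_for_path` are all hypothetical. It only shows the lookup itself: each input file is resolved to the schema registered for the deepest directory that contains it, so no per-record check would be needed to know the record type.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from input directories to the schema of the records stored there.
SCHEMA_BY_PATH = {
    "/data/users": "User",
    "/data/orders": "Order",
}


def schema_for_path(input_file, schema_by_path=SCHEMA_BY_PATH):
    """Return the schema registered for the deepest directory containing input_file."""
    file_path = PurePosixPath(input_file)
    best = None  # (registered directory, schema) of the deepest match so far
    for root, schema in schema_by_path.items():
        root_path = PurePosixPath(root)
        if root_path in file_path.parents:
            if best is None or len(root_path.parts) > len(best[0].parts):
                best = (root_path, schema)
    if best is None:
        raise KeyError(f"no schema registered for {input_file}")
    return best[1]
```

With such a lookup, the schema would be chosen once per input split rather than once per record.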

There are some folks working on improved Avro mapreduce/mapred APIs with
the intention of eventually contributing them back to Avro.  You might
get some good ideas from there:
https://issues.apache.org/jira/browse/AVRO-593
https://github.com/wibidata/odiago-avro
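
For reference, the reduce-side join discussed in this thread can be sketched as a small in-memory simulation. This is plain Python rather than Hadoop or Avro code, and the field names (`user_id`, `name`, `item`) and function names are made up for illustration. Each mapper tags its records with their source, the shuffle groups values by join key, and the reducer splits each group by tag before pairing the two sides (an inner join). In an Avro job the tagged value would correspond to a union of the two record schemas, with the tag playing the role of the union branch.

```python
from collections import defaultdict


def map_users(record):
    # Tag each record with its source so the reducer can tell the inputs apart.
    return record["user_id"], ("user", record)


def map_orders(record):
    return record["user_id"], ("order", record)


def shuffle(pairs):
    # Group all mapper outputs by join key, as the framework would between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(key, tagged_values):
    # Separate the two sides by tag, then emit one output row per matching pair.
    users = [rec for tag, rec in tagged_values if tag == "user"]
    orders = [rec for tag, rec in tagged_values if tag == "order"]
    return [
        {"user_id": key, "name": u["name"], "item": o["item"]}
        for u in users
        for o in orders
    ]


def run_join(users, orders):
    pairs = [map_users(u) for u in users] + [map_orders(o) for o in orders]
    joined = []
    for key, values in sorted(shuffle(pairs).items()):
        joined.extend(reduce_join(key, values))
    return joined
```

Keys that appear in only one input produce no output rows here; an outer join would emit them with the missing side left empty.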

On 12/13/11 8:46 AM, "Andrew Kenworthy" <[EMAIL PROTECTED]> wrote:

> I'm currently using a union schema to map two different types of data (read
> from two different input paths) in my reducer to a common record. This works
> fine, but if I have understood the mechanism correctly, it means that Avro
> has to check each and every record against my union schema. With a
> "normal" reduce-side join, I could use MultipleInputs to specify a mapper for
> each input, thus letting them run independently (since each mapper knows its
> input) with presumably less overhead.
>
> Is it possible with Avro to avoid the overhead of checking each input row
> against the union schema?
>
> Thanks,
>
> Andrew
>
>>
>>   From: Scott Carey <[EMAIL PROTECTED]>
>>  To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; Andrew Kenworthy
>> <[EMAIL PROTECTED]>
>>  Sent: Wednesday, December 7, 2011 7:40 PM
>>  Subject: Re: Reduce-side joins in Avro M/R
>>  
>> This should be conceptually the same as a normal map-reduce join of the same
>> type.  Avro handles the serialization, but not the map-reduce algorithm or
>> strategy.  
>>
>> On 12/6/11 8:43 AM, "Andrew Kenworthy" <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>> I'd like to use reduce-side joins in an Avro M/R job, and am not sure how to
>>> do it: are there any best-practice tips or outlines of what one would have
>>> to implement in order to make this possible?
>>>
>>> Thanks,
>>>
>>> Andrew Kenworthy