Andrew Kenworthy 2011-12-06, 16:43
Scott Carey 2011-12-07, 18:40
Andrew Kenworthy 2011-12-13, 16:46
Re: Reduce-side joins in Avro M/R
Scott Carey 2012-01-05, 23:20
The overhead of checking the union is not that high, but it would be useful to be able to specify a map of different Avro schemas to source paths for a variety of use cases. I am not sure to what extent that is possible with the current Avro mapreduce API.

There are some folks working on improved Avro mapreduce/mapred APIs with the intention of eventually contributing them back to Avro. You might get some good ideas from there:
https://issues.apache.org/jira/browse/AVRO-593
https://github.com/wibidata/odiago-avro

On 12/13/11 8:46 AM, "Andrew Kenworthy" <[EMAIL PROTECTED]> wrote:

> I'm currently using a UNION-schema to map two different types of data (read
> from two different input paths) in my reducer to a common record. This works
> fine, but - if I have understood the mechanism correctly - it would mean that
> Avro is having to check each and every record against my UNION schema. With a
> "normal" reduce-side join, I could use MultipleInputs to specify a mapper for
> each input, thus letting them run independently (since each mapper knows its
> input) with presumably less overhead.
>
> Is it possible with Avro to avoid the overhead of checking each input row
> against the union schema?
>
> Thanks,
>
> Andrew
>
>> From: Scott Carey <[EMAIL PROTECTED]>
>> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; Andrew Kenworthy
>> <[EMAIL PROTECTED]>
>> Sent: Wednesday, December 7, 2011 7:40 PM
>> Subject: Re: Reduce-side joins in Avro M/R
>>
>> This should be conceptually the same as a normal map-reduce join of the same
>> type. Avro handles the serialization, but not the map-reduce algorithm or
>> strategy.
>>
>> On 12/6/11 8:43 AM, "Andrew Kenworthy" <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>> I'd like to use reduce-side joins in an Avro M/R job, and am not sure how to
>>> do it: are there any best-practice tips or outlines of what one would have
>>> to implement in order to make this possible?
>>>
>>> Thanks,
>>>
>>> Andrew Kenworthy
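The reduce-side join discussed in this thread is ordinary map-reduce logic (Avro only handles the serialization), so its shape can be sketched in plain Java. This is a minimal simulation under stated assumptions, not Avro's or Hadoop's API: map, shuffle and reduce are modelled with JDK collections, and the input paths, record layouts and the names `ReduceSideJoinSketch`, `RecordMapper`, `mapAndShuffle`, `reduce` and `runExample` are all hypothetical. Each mapper is looked up once per input path (roughly what Andrew describes doing with MultipleInputs), and every emitted value is tagged with its source path so the reducer can separate the two join sides by tag.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {

    // A value tagged with the input path it came from, so the reducer can tell the join sides apart.
    static final class Tagged {
        final String source;
        final String payload;

        Tagged(String source, String payload) {
            this.source = source;
            this.payload = payload;
        }
    }

    // Hypothetical per-input mapper: parses one raw record into (joinKey, payload).
    interface RecordMapper {
        Map.Entry<String, String> map(String record);
    }

    // Map + shuffle: the mapper is looked up once per input path rather than chosen per record;
    // outputs are grouped by join key, as the shuffle phase would do.
    static Map<String, List<Tagged>> mapAndShuffle(Map<String, List<String>> inputs,
                                                   Map<String, RecordMapper> mappersByPath) {
        Map<String, List<Tagged>> grouped = new TreeMap<>();
        for (Map.Entry<String, List<String>> input : inputs.entrySet()) {
            String path = input.getKey();
            RecordMapper mapper = mappersByPath.get(path); // resolved once per input
            for (String record : input.getValue()) {
                Map.Entry<String, String> kv = mapper.map(record);
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(new Tagged(path, kv.getValue()));
            }
        }
        return grouped;
    }

    // Reduce: per key, split the values by source tag and emit their cross product (an inner join).
    static List<String> reduce(Map<String, List<Tagged>> grouped, String leftPath, String rightPath) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> group : grouped.entrySet()) {
            List<String> left = new ArrayList<>();
            List<String> right = new ArrayList<>();
            for (Tagged t : group.getValue()) {
                if (t.source.equals(leftPath)) {
                    left.add(t.payload);
                } else if (t.source.equals(rightPath)) {
                    right.add(t.payload);
                }
            }
            for (String l : left) {
                for (String r : right) {
                    out.add(group.getKey() + "\t" + l + "\t" + r);
                }
            }
        }
        return out;
    }

    // Hypothetical sample data: users(id,name) joined with orders(orderId,userId,item) on the user id.
    static List<String> runExample() {
        Map<String, List<String>> inputs = new LinkedHashMap<>();
        inputs.put("/data/users", List.of("1,alice", "2,bob"));
        inputs.put("/data/orders", List.of("o1,1,book", "o2,1,pen", "o3,3,cup"));

        Map<String, RecordMapper> mappersByPath = new HashMap<>();
        mappersByPath.put("/data/users", r -> { String[] f = r.split(","); return Map.entry(f[0], f[1]); });
        mappersByPath.put("/data/orders", r -> { String[] f = r.split(","); return Map.entry(f[1], f[2]); });

        return reduce(mapAndShuffle(inputs, mappersByPath), "/data/users", "/data/orders");
    }

    public static void main(String[] args) {
        runExample().forEach(System.out::println);
    }
}
```

With this sample data only user 1 produces output (one line per matching order); bob has no orders and order o3 has no matching user, so both drop out of the inner join. In a real Avro job the tagged payloads would be Avro records rather than strings, and whether per-path mappers actually save measurable work over a union schema is something to benchmark rather than assume.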