Accumulo, mail # dev - MultipleInputs with AccumuloInputFormat

Re: MultipleInputs with AccumuloInputFormat
Christopher 2013-11-07, 21:58
Are there any other analogous InputFormats that use multiple static
methods in a stateless way to configure a job?

Christopher L Tubbs II
On Tue, Nov 5, 2013 at 12:15 PM, Josh Elser <[EMAIL PROTECTED]> wrote:
> Heh, ok.
> I'm currently working through a bit of a prototype to see how it works.
> I'm not a mapred/mapreduce expert, but I *think* I have an approach that
> will work. Keep an eye out for a Jira -- would love feedback.
> On 11/5/13, 12:13 PM, Kevin Faro wrote:
>> I recently looked into that and came to the same realization.
>> I ended up writing a new input format that computed the Cartesian product
>> of two tables. But to do that, I had to store values for the left
>> configuration and the right configuration, and then copy over whichever
>> config settings I wanted to use for the AIF, depending on which split I
>> needed in the RecordReader.
>> It would have been awesome if I could have just used the MultipleInputs
>> ...
>> --Kevin
>> On Tue, Nov 5, 2013 at 10:24 AM, Josh Elser <[EMAIL PROTECTED]> wrote:
>>> In executing some MapReduce over Accumulo with the AccumuloInputFormat, I
>>> came to the realization that AIF fundamentally doesn't work with concepts
>>> like MultipleInputs in Hadoop
>>> (http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html).
>>> Given that you can only write one set of configuration for AIF into a
>>> Configuration object, there's no mechanism to support more than one.
>>> This appears to be the case across all versions.
>>> Is this correct? Have I overlooked something?
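A minimal, Hadoop-free sketch of the limitation Josh describes and of the left/right workaround Kevin mentions. It assumes a job Configuration can be modeled as a plain `Map<String, String>`; the key `sketch.input.table` and the methods `setInputTable` and `viewFor` are hypothetical illustrations, not part of the Accumulo or Hadoop APIs. A static setter that always writes the same fixed key lets a second input silently overwrite the first, while prefixing keys per input and copying one group back onto the unprefixed keys (as a RecordReader would need for its split) keeps both.

```java
import java.util.HashMap;
import java.util.Map;

public class AifConfigSketch {

    // Hypothetical key name; the real AccumuloInputFormat uses its own internal keys.
    static final String TABLE_KEY = "sketch.input.table";

    // Single-slot style: every call writes the same key, so a second call
    // replaces the first input's setting.
    static void setInputTable(Map<String, String> conf, String table) {
        conf.put(TABLE_KEY, table);
    }

    // Namespaced style (hypothetical): each logical input, e.g. "left" or
    // "right", gets its own key prefix.
    static void setInputTable(Map<String, String> conf, String input, String table) {
        conf.put(input + "." + TABLE_KEY, table);
    }

    // Builds the single-input view a reader would consume by copying the
    // settings stored under one prefix back onto the unprefixed keys.
    static Map<String, String> viewFor(Map<String, String> conf, String input) {
        String prefix = input + ".";
        Map<String, String> view = new HashMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                view.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return view;
    }

    public static void main(String[] args) {
        Map<String, String> single = new HashMap<>();
        setInputTable(single, "left_table");
        setInputTable(single, "right_table");
        System.out.println(single.get(TABLE_KEY)); // prints "right_table": the left input is lost

        Map<String, String> namespaced = new HashMap<>();
        setInputTable(namespaced, "left", "left_table");
        setInputTable(namespaced, "right", "right_table");
        System.out.println(viewFor(namespaced, "left").get(TABLE_KEY));  // prints "left_table"
        System.out.println(viewFor(namespaced, "right").get(TABLE_KEY)); // prints "right_table"
    }
}
```

The prefixing approach is only one way to namespace settings; it trades the simplicity of stateless static setters for an extra copy step at read time.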