Accumulo >> mail # dev >> MultipleInputs with AccumuloInputFormat


Re: MultipleInputs with AccumuloInputFormat
Are there any other analogous InputFormats that use multiple static
methods in a stateless way to configure a job?

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii
On Tue, Nov 5, 2013 at 12:15 PM, Josh Elser <[EMAIL PROTECTED]> wrote:
> Heh, ok.
>
> I'm currently working through a bit of a prototype to see how it works.
>
> I'm not a mapred/mapreduce expert, but I *think* I have an approach that
> will work. Keep an eye out for a Jira -- would love feedback.
>
>
> On 11/5/13, 12:13 PM, Kevin Faro wrote:
>>
>> I recently looked into that and came to the same realization.
>>
>> I ended up writing a new input format that did the Cartesian product of
>> two tables.  But to do that I had to store values for the left configuration
>> and right configuration and then copy over whichever config settings I
>> wanted to use for the AIF depending on which split I needed in the
>> RecordReader.
>>
>> It would have been awesome if I could have just used the MultipleInputs
>> ...
>>
>> --Kevin
>>
>>
>> On Tue, Nov 5, 2013 at 10:24 AM, Josh Elser <[EMAIL PROTECTED]> wrote:
>>
>>> In executing some MapReduce over Accumulo with the AccumuloInputFormat, I
>>> came to the realization that AIF fundamentally doesn't work with concepts
>>> like MultipleInputs in Hadoop
>>> (http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html).
>>> Given that you can only write one set of configuration for AIF into a
>>> Configuration object, there's not a mechanism to support multiple. This
>>> appears to be the case across all versions.
>>>
>>> Is this correct? Have I overlooked something?
>>>
>>
>
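
For readers following the thread, here is a minimal sketch of the problem Josh describes and of a workaround in the spirit of Kevin's. It does not use the real Accumulo or Hadoop classes: a plain Map stands in for Hadoop's Configuration, and the key name, the "left"/"right" prefixes, and every method name below are hypothetical, chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Assumption: a plain Map stands in for Hadoop's Configuration so the
// sketch has no dependencies. Nothing here is the actual Accumulo API.
class SharedConfDemo {
    // Hypothetical key; not a real Accumulo property name.
    static final String TABLE_KEY = "demo.inputformat.table";

    // Mimics a static, stateless setter: one fixed key per setting,
    // so a second call replaces whatever the first call stored.
    static void setInputTable(Map<String, String> conf, String table) {
        conf.put(TABLE_KEY, table);
    }

    static String getInputTable(Map<String, String> conf) {
        return conf.get(TABLE_KEY);
    }

    // Workaround sketch: store each side's settings under its own prefix.
    static void setInputTable(Map<String, String> conf, String side, String table) {
        conf.put(side + "." + TABLE_KEY, table);
    }

    // Build a copy of conf in which the chosen side's prefixed settings
    // are copied onto the plain keys that the input format reads.
    static Map<String, String> viewFor(Map<String, String> conf, String side) {
        Map<String, String> view = new HashMap<>(conf);
        String prefix = side + ".";
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                view.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return view;
    }

    public static void main(String[] args) {
        // One set of keys: the second table silently overwrites the first.
        Map<String, String> shared = new HashMap<>();
        setInputTable(shared, "tableA");
        setInputTable(shared, "tableB");
        System.out.println(getInputTable(shared)); // prints "tableB"

        // Prefixed keys: both inputs survive and are selected per split.
        Map<String, String> perSide = new HashMap<>();
        setInputTable(perSide, "left", "tableA");
        setInputTable(perSide, "right", "tableB");
        System.out.println(getInputTable(viewFor(perSide, "left")));  // prints "tableA"
        System.out.println(getInputTable(viewFor(perSide, "right"))); // prints "tableB"
    }
}
```

In a real job the per-side view would presumably be built in the RecordReader once it knows which split it was handed, as Kevin describes; whether that maps cleanly onto MultipleInputs is the open question in this thread.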