Re: Using Hadoop's MultipleInputs with AccumuloInputFormat in an MR job
Aaron 2013-09-17, 02:10
Sorry about that, I should have been clearer. My original question did
involve scanning one table. In our use case we ingest a number of text
files into one table (not to say we couldn't use several, we just went
with one for now). After the ingest runs, we run some MR jobs on that
table. One idea was to use multiple Mappers (to do some simple joins
between rows) on this table for later processing. As part of that MR job,
we wanted to add some Iterators to the scans to cut down on the records
returned before the reduce phase.
I need to look into how AccumuloInputFormat works; I haven't done that
yet, so take everything I say as a stream of thoughts. I wonder if one way
to look at this is to have AccumuloInputFormat "hold multiple scanners,"
somehow linking RecordReaders to Scanners. I need to think that through
more, but the idea is to mimic Hadoop's MultipleInputs, something like a
MultipleAccumuloInputs. I need to look at the patches in
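
A rough sketch of the "multiple scanners" idea, assuming scan settings could be stored per named input rather than once per job. This is a plain-Java model only: the class, the key layout, and the input names are hypothetical and are not part of Accumulo or Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerInputScanConfig {
    // Hypothetical key layout: one iterator list per named input.
    static String key(String inputName) {
        return "example.multiinput." + inputName + ".iterators";
    }

    // Store the iterator names for a single named input.
    static void setIterators(Map<String, String> jobConf, String inputName, String... iterators) {
        jobConf.put(key(inputName), String.join(",", iterators));
    }

    // A record reader created for inputName would configure its scanner from this list only.
    static List<String> getIterators(Map<String, String> jobConf, String inputName) {
        String value = jobConf.get(key(inputName));
        if (value == null || value.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(value.split(",")));
    }

    public static void main(String[] args) {
        // The map stands in for the job configuration shared by all inputs.
        Map<String, String> jobConf = new HashMap<>();
        setIterators(jobConf, "left", "filterA");
        setIterators(jobConf, "right", "filterB");
        System.out.println(getIterators(jobConf, "left"));   // prints [filterA]
        System.out.println(getIterators(jobConf, "right"));  // prints [filterB]
    }
}
```

Keying the settings by input name is what would let each RecordReader build a Scanner with its own iterators while all inputs still share one job configuration.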
On Mon, Sep 16, 2013 at 9:06 PM, Corey Nolet <[EMAIL PROTECTED]> wrote:
> Adding to my previous response: when you say you are setting different
> iterators on a scan, are you referring to a single table with different
> iterators? Are the sets of iterators tied to different ranges? The changes
> we are making to the current InputFormat will still not allow different
> iterators on a single table, but the use case sounds interesting.
> On Mon, Sep 16, 2013 at 3:55 PM, Corey Nolet <[EMAIL PROTECTED]> wrote:
>> We are currently re-working the AccumuloInputFormat for Accumulo 1.6 to
>> provide inputs from multiple tables (each with their own set of configured
>> iterators, ranges, columns). Check out ACCUMULO-391.
>> On Mon, Sep 16, 2013 at 11:41 AM, Aaron <[EMAIL PROTECTED]> wrote:
>>> I was curious whether this is possible (I am thinking it isn't) from the
>>> Java API, with Accumulo 1.5 and Hadoop 1.2.1.
>>> I want to set 2 different iterators on a scan and send those results to 2
>>> different Mappers.
>>> The way I'd do this with files as inputs is to use the MultipleInputs
>>> class with 2 different Paths and 2 different Mapper classes, maybe with
>>> the same InputFormat (e.g. Text or Sequence).
>>> Since I'm using AccumuloInputFormat, I would think I'd be OK, maybe with
>>> a null Path in MultipleInputs.addInputPath(), but the static
>>> addIterator() on AccumuloInputFormat is where I think I lose.
>>> Can I have 2 different AccumuloInputFormats with different iterators?
>>> I think the answer is no, and from briefly looking at the source I believe
>>> that to be correct, but I was curious whether others have done something
>>> like this.
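
One way to picture the static addIterator() concern: the settings land in the job's single configuration, and every input registered for the job reads that same configuration back. The sketch below is a plain-Java model of that behavior, not Accumulo or Hadoop code; the map standing in for the job configuration, the key name, and both helper methods are assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SharedConfigCollision {
    // Hypothetical job-wide key; a real input format would use its own key.
    static final String ITERATORS_KEY = "example.inputformat.iterators";

    // Models a static addIterator(job, ...) helper: appends to one job-wide list.
    static void addIterator(Map<String, String> jobConf, String iteratorName) {
        String existing = jobConf.get(ITERATORS_KEY);
        jobConf.put(ITERATORS_KEY, existing == null ? iteratorName : existing + "," + iteratorName);
    }

    // What any input format instance in the same job would read back.
    static List<String> iteratorsFor(Map<String, String> jobConf) {
        String value = jobConf.get(ITERATORS_KEY);
        if (value == null || value.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(value.split(",")));
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        addIterator(jobConf, "filterA"); // intended for the first Mapper only
        addIterator(jobConf, "filterB"); // intended for the second Mapper only
        // Both inputs share jobConf, so both would see both iterators.
        System.out.println(iteratorsFor(jobConf)); // prints [filterA, filterB]
    }
}
```

In this model there is no way to tell which iterator was meant for which Mapper, which is the reason two AccumuloInputFormat entries in one job would not get different iterators without per-input configuration.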