-
Multiple Input for Avro jobs
Serge Blazhievsky 2012-02-08, 20:24
Hi all,
I am trying to assign different mapper to different folders.
Is there an equivalent of MultipleInputs for Avro? Something like:
MultipleInputs.addInputPath(job, new Path(input), AvroInputFormat<GenericRecord>.class, MapImpl.class);
Thanks,
Serge
-
Re: Multiple Input for Avro jobs
Scott Carey 2012-02-08, 21:26
If you are after only multiple paths, path globs work.
For example, to read both /logs/2012_01 and /logs/2012_02, use the glob path:
/logs/2012_0{1,2}
And to read the four paths /logs/2011_01, /logs/2011_02, /logs/2012_01, and /logs/2012_02:
/logs/201{1,2}_0{1,2}
'*' is a wildcard as well, e.g. /logs/2011_*/
If you need a mapper instance per directory or different split assignment, there would be more work involved.
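To make the brace patterns above concrete, here is a minimal sketch in plain Java that expands a pattern such as /logs/201{1,2}_0{1,2} into the paths it stands for. It is an illustration only, not Hadoop's glob implementation: the class name BraceGlobSketch and the method expand are hypothetical, nested braces are not handled, and '*' is left untouched because resolving it requires listing the file system.

```java
import java.util.ArrayList;
import java.util.List;

public class BraceGlobSketch {

    /**
     * Expands every {a,b,...} group in the pattern into concrete strings.
     * Hypothetical helper for illustration; nested braces are not supported.
     */
    public static List<String> expand(String pattern) {
        List<String> results = new ArrayList<>();
        int open = pattern.indexOf('{');
        int close = open < 0 ? -1 : pattern.indexOf('}', open);
        if (open < 0 || close < 0) {
            // No complete brace group left: the pattern is already concrete.
            results.add(pattern);
            return results;
        }
        String prefix = pattern.substring(0, open);
        String suffix = pattern.substring(close + 1);
        // Substitute each alternative, then expand any later groups recursively.
        for (String option : pattern.substring(open + 1, close).split(",")) {
            results.addAll(expand(prefix + option + suffix));
        }
        return results;
    }

    public static void main(String[] args) {
        for (String path : expand("/logs/201{1,2}_0{1,2}")) {
            System.out.println(path);
        }
    }
}
```

Running main prints the four directories named in the message, one per line, with the leftmost brace group varying slowest.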
-
Re: Multiple Input for Avro jobs
Serge Blazhievsky 2012-02-08, 22:45
Thanks for the reply, Scott!
I need to assign a different mapper to each directory.
Something similar to MultipleInputs.addInputPath.
Any suggestions?
Thanks,
Serge
-
Re: Multiple Input for Avro jobs
Scott Carey 2012-02-08, 22:53
Unfortunately, I am not familiar with how MultipleInputs works. You may be able to compose it with AvroInputFormat within your own InputFormat to get the required results, but someone with more Hadoop input experience would know more.
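One way to picture the per-directory requirement, independent of any particular InputFormat, is a single dispatcher that looks at the path of the file a record came from and delegates to the handler registered for that directory. The sketch below is plain Java with no Hadoop or Avro classes, so it shows only the dispatch logic; the names PerDirectoryDispatchSketch, addInputPath, and map are hypothetical, records are stand-in strings, and how a real mapper learns the path of its current input file is an assumption left out here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class PerDirectoryDispatchSketch {

    // Directory prefix -> handler that plays the role of a per-directory mapper.
    private final Map<String, Function<String, String>> handlers = new LinkedHashMap<>();

    /** Registers a handler for every file under the given directory. */
    public void addInputPath(String directory, Function<String, String> handler) {
        // Normalize to a trailing slash so /logs/2011_01 does not match /logs/2011_010.
        String normalized = directory.endsWith("/") ? directory : directory + "/";
        handlers.put(normalized, handler);
    }

    /** Routes one record to the handler whose directory contains inputFile. */
    public String map(String inputFile, String record) {
        for (Map.Entry<String, Function<String, String>> entry : handlers.entrySet()) {
            if (inputFile.startsWith(entry.getKey())) {
                return entry.getValue().apply(record);
            }
        }
        throw new IllegalArgumentException("No handler registered for " + inputFile);
    }

    public static void main(String[] args) {
        PerDirectoryDispatchSketch dispatcher = new PerDirectoryDispatchSketch();
        dispatcher.addInputPath("/logs/2011_01", record -> "A:" + record);
        dispatcher.addInputPath("/logs/2012_01", record -> "B:" + record);
        System.out.println(dispatcher.map("/logs/2011_01/part-00000.avro", "rec1"));
        System.out.println(dispatcher.map("/logs/2012_01/part-00000.avro", "rec2"));
    }
}
```

The trade-off of this design is that every map task carries all handlers and pays a path lookup per record (or once per split, if the lookup result is cached), whereas a MultipleInputs-style setup binds the mapper class to the split up front.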
-
Re: Multiple Input for Avro jobs
Serge Blazhievsky 2012-02-08, 23:01
Yes, I have been trying to look into AvroInputFormat.
Can you point me to some examples of AvroInputFormat usage?
Thanks,
Serge