-
Quick Question: LineSplit or BlockSplit
maha 2011-02-07, 23:38
Hi,
I would appreciate your thoughts on whether efficiency is affected if:
1) Mappers were per line in a document or
2) Mappers were per block of lines in a document.
The obvious difference I can see is that (1) has more mappers. Does that mean (1) will be slower because of scheduling time?
Thank you, Maha
-
Re: Quick Question: LineSplit or BlockSplit
Ted Dunning 2011-02-08, 02:10
Option (1) isn't the way that things normally work. Besides, a mapper's map() method is called many times for each mapper object that is constructed.
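To illustrate the point about construction versus calls, here is a minimal sketch in plain Java. It does not use the Hadoop API: `LineMapper` and `runJob` are hypothetical stand-ins, and a "split" is assumed to be just a list of lines. It only shows the shape of the lifecycle: one mapper object per split, one map() call per line in that split.

```java
import java.util.List;

// Sketch only: hypothetical stand-ins, not the real org.apache.hadoop classes.
class MapperLifecycleSketch {

    // Stand-in for a mapper: counts how many times map() is called on it.
    static class LineMapper {
        int mapCalls = 0;

        void map(String line) {
            mapCalls++;
        }
    }

    // Builds one mapper per split and calls map() once per line of the split.
    // Returns {mappers constructed, total map() calls}.
    static int[] runJob(List<List<String>> splits) {
        int constructed = 0;
        int totalCalls = 0;
        for (List<String> split : splits) {
            LineMapper mapper = new LineMapper();
            constructed++;
            for (String line : split) {
                mapper.map(line);
            }
            totalCalls += mapper.mapCalls;
        }
        return new int[] {constructed, totalCalls};
    }

    public static void main(String[] args) {
        int[] result = runJob(List.of(List.of("a", "b", "c"), List.of("d", "e")));
        // prints "2 mappers, 5 map() calls"
        System.out.println(result[0] + " mappers, " + result[1] + " map() calls");
    }
}
```

So the per-line cost is a method call, while the per-mapper cost (task scheduling and setup) is paid once per split.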
-
Re: Quick Question: LineSplit or BlockSplit
Mark Kerzner 2011-02-08, 02:14
Ted,
I am also interested in this answer.
I put the name of a zip file on a line in an input file, and I want one mapper to read this line, and start working on it (since it now knows the path in HDFS). Are you saying it's not doable?
Thank you, Mark
-
Re: Quick Question: LineSplit or BlockSplit
Ted Dunning 2011-02-08, 02:28
That is quite doable. One way to do it is to make the max split size quite small.
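As a rough sketch of why a small max split size produces more mappers: to my recollection, the newer-API `FileInputFormat` picks a split size of max(minSize, min(maxSize, blockSize)). The plain-Java code below reproduces only that arithmetic. The class and method names are hypothetical, the split count is a simple ceiling that ignores Hadoop's handling of a short final split, and splits are measured in bytes, so a small value approximates one line per mapper rather than guaranteeing it.

```java
// Sketch only: mirrors the split-size arithmetic, not the Hadoop classes themselves.
class SplitSizeSketch {

    // Split size is the block size, capped by maxSize and floored by minSize.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Approximate number of splits (and so mappers) for a file: ceiling division.
    static long estimateSplits(long fileLength, long splitSize) {
        return (fileLength + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed 64 MB HDFS block
        long fileLength = 1000;             // assumed small file of zip-file paths

        long defaultSplit = computeSplitSize(blockSize, 1, Long.MAX_VALUE);
        long smallSplit = computeSplitSize(blockSize, 1, 100);

        // prints "1 split(s) by default, 10 with a 100-byte max"
        System.out.println(estimateSplits(fileLength, defaultSplit)
                + " split(s) by default, "
                + estimateSplits(fileLength, smallSplit)
                + " with a 100-byte max");
    }
}
```

The exact configuration key for the max split size depends on the Hadoop version, so check the documentation for the release you are running.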
-
Re: Quick Question: LineSplit or BlockSplit
Mark Kerzner 2011-02-08, 02:32
Thanks! Mark
-
Re: Quick Question: LineSplit or BlockSplit
maha 2011-02-08, 05:20
Thanks Ted. Then I have to write my own InputFormat to read a block of lines per mapper. NLineInputFormat didn't work for me; any working example of it would be appreciated.
Thanks again,
Maha
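For reference, the grouping that a block-of-lines InputFormat has to perform can be sketched in plain Java as below. This is not a Hadoop InputFormat and not a fix for the NLineInputFormat problem described above: `groupLines` is a hypothetical helper that only shows how N consecutive lines would be assigned to each split, with the last split taking whatever remains. With N = 1 it degenerates to the one-line-per-mapper case discussed earlier in the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: illustrates N-lines-per-split grouping without any Hadoop classes.
class NLineGroupingSketch {

    // Partitions lines into consecutive groups of at most linesPerSplit lines.
    static List<List<String>> groupLines(List<String> lines, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerSplit) {
            int end = Math.min(start + linesPerSplit, lines.size());
            // Copy the sublist so each split is independent of the input list.
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Assumed input: one (made-up) zip file name per line.
        List<String> lines = List.of("a.zip", "b.zip", "c.zip", "d.zip", "e.zip");
        // prints "[[a.zip, b.zip], [c.zip, d.zip], [e.zip]]"
        System.out.println(groupLines(lines, 2));
    }
}
```

In a real job each inner list would correspond to one split, and so to one mapper, which then receives one map() call per line in its group.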