Hadoop >> mail # user >> Quick Question: LineSplit or BlockSplit

Re: Quick Question: LineSplit or BlockSplit
That is quite doable.  One way to do it is to make the max split size quite
small, so that each split covers roughly one line of the input file.
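
To illustrate why a small max split size gives you more mappers, here is a
rough standalone sketch (no Hadoop dependency) of the split arithmetic. It
mirrors, as far as I recall, the formula in the new-API FileInputFormat
(split size = max(minSize, min(maxSize, blockSize)), with a 10% slop on the
last split). The class name, the 1000-byte file, and the 100-byte lines are
made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of FileInputFormat-style split arithmetic.
// SplitSketch and the numbers in main() are hypothetical, for illustration only.
class SplitSketch {
    // The last split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // Split size is the block size clamped between minSize and maxSize.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Returns the byte length of each split for a file of fileLength bytes.
    static List<Long> splitLengths(long fileLength, long splitSize) {
        List<Long> lengths = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            lengths.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining > 0) {
            lengths.add(remaining);
        }
        return lengths;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // 64 MB HDFS block
        long fileLength = 1000;             // assumed: 10 lines of about 100 bytes

        long defaultSplit = computeSplitSize(blockSize, 1, Long.MAX_VALUE);
        long smallSplit = computeSplitSize(blockSize, 1, 100);

        System.out.println("default max: " + splitLengths(fileLength, defaultSplit).size() + " split(s)");
        System.out.println("max = 100:   " + splitLengths(fileLength, smallSplit).size() + " split(s)");
    }
}
```

With these assumed numbers the small file yields 1 split by default and 10
splits once the max is capped at 100 bytes. Note that splits are measured in
bytes, not lines, so with variable-length lines a split may hold zero lines or
more than one. If you need exactly one line per mapper, NLineInputFormat is
probably the more direct route.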

On Mon, Feb 7, 2011 at 6:14 PM, Mark Kerzner <[EMAIL PROTECTED]> wrote:

> Ted,
>
> I am also interested in this answer.
>
> I put the name of a zip file on a line in an input file, and I want one
> mapper to read this line, and start working on it (since it now knows the
> path in HDFS). Are you saying it's not doable?
>
> Thank you,
> Mark
>
> On Mon, Feb 7, 2011 at 8:10 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
> > Option (1) isn't the way that things normally work.  Besides, the map
> > function is called many times for each mapper that is constructed.
> >
> > On Mon, Feb 7, 2011 at 3:38 PM, maha <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > >
> > >  I would appreciate your thoughts on whether efficiency is affected
> > > if:
> > >
> > >  1) Mappers were per line in a document
> > >
> > >  or
> > >
> > >  2) Mappers were per block of lines in a document.
> > >
> > >
> > >  The obvious difference I can see is that (1) has more mappers.
> > > Does that mean (1) will be slower because of scheduling time?
> > >
> > > Thank you,
> > > Maha
> > >
> >
>