Hadoop >> mail # user >> Quick Question: LineSplit or BlockSplit


Re: Quick Question: LineSplit or BlockSplit
That is quite doable.  One way to do it is to make the max split size quite
small.
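
To make the effect of a small max split size concrete, here is a minimal, self-contained sketch of the split-size arithmetic. It assumes the rule used by Hadoop's FileInputFormat, max(minSize, min(maxSize, blockSize)), and a simplified split count that ignores the slop factor the real code applies to the last split. The class name, method names, and file sizes are hypothetical and chosen only for illustration.

```java
public class SplitSizeSketch {

    // Split size rule assumed from FileInputFormat:
    // the block size, capped by maxSize and floored by minSize.
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Simplified split count: ceiling of fileLength / splitSize.
    // The real implementation also lets the final split run slightly long.
    public static long countSplits(long fileLength, long splitSize) {
        if (fileLength == 0) {
            return 0;
        }
        return (fileLength + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // hypothetical 64 MB HDFS block
        long fileLength = 10_000;           // hypothetical small file listing zip paths
        long minSize = 1;

        // No cap on the split size: the whole file lands in one split.
        long defaultSplit = computeSplitSize(blockSize, minSize, Long.MAX_VALUE);
        System.out.println("default: " + countSplits(fileLength, defaultSplit) + " split(s)");

        // Cap the split at 100 bytes: the same file now yields many splits,
        // and each split is handed to its own map task.
        long smallSplit = computeSplitSize(blockSize, minSize, 100);
        System.out.println("capped:  " + countSplits(fileLength, smallSplit) + " split(s)");
    }
}
```

In a real job the cap would be set through the job configuration, for example with FileInputFormat.setMaxInputSplitSize(job, bytes) in the new API. If the cap is no larger than the shortest line, each split should contain the start of at most one line, so each mapper sees at most one record, though some mappers may see none.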

On Mon, Feb 7, 2011 at 6:14 PM, Mark Kerzner <[EMAIL PROTECTED]> wrote:

> Ted,
>
> I am also interested in this answer.
>
> I put the name of a zip file on a line in an input file, and I want one
> mapper to read this line, and start working on it (since it now knows the
> path in HDFS). Are you saying it's not doable?
>
> Thank you,
> Mark
>
> On Mon, Feb 7, 2011 at 8:10 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
> > Option (1) isn't the way that things normally work.  Besides, the map
> > function is called many times for each construction of a mapper.
> >
> > On Mon, Feb 7, 2011 at 3:38 PM, maha <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > >
> > >  I would appreciate your thoughts on whether there is an
> > > effect on efficiency if:
> > >
> > >  1) Mappers were per line in a document
> > >
> > >  or
> > >
> > >  2) Mappers were per block of lines in a document.
> > >
> > >
> > >  The obvious difference I can see is that (1) has more mappers. Does
> > > that mean (1) will be slower because of scheduling time?
> > >
> > > Thank you,
> > > Maha
> > >
> >
>
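
Ted's remark that the map function is called many times per mapper can be pictured with a small stand-alone sketch. It does not use the Hadoop API; it only imitates the lifecycle of a map task as I understand it (set up once, call map once per record in the split, clean up once). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class MapperLifecycleSketch {

    // Imitates one map task processing one split.
    // Returns {setupCalls, mapCalls, cleanupCalls} so the counts can be inspected.
    public static int[] runOnSplit(List<String> linesInSplit) {
        int setupCalls = 0;
        int mapCalls = 0;
        int cleanupCalls = 0;

        setupCalls++;                       // the mapper is constructed and set up once
        for (String line : linesInSplit) {
            mapCalls++;                     // map is invoked once per record (line)
        }
        cleanupCalls++;                     // cleanup runs once at the end of the split

        return new int[] {setupCalls, mapCalls, cleanupCalls};
    }

    public static void main(String[] args) {
        // Option (2): one task handles a block of lines.
        int[] block = runOnSplit(Arrays.asList("a.zip", "b.zip", "c.zip"));
        System.out.println("block split: setup=" + block[0] + " map=" + block[1]);

        // Option (1): one task per line, so the per-task cost is paid for every line.
        int[] single = runOnSplit(Arrays.asList("a.zip"));
        System.out.println("line split:  setup=" + single[0] + " map=" + single[1]);
    }
}
```

Under option (1) the fixed per-task cost (scheduling, startup, setup) is paid once per line instead of once per block, which is where the scheduling overhead Maha asks about would come from.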