HBase, mail # user - Hbase import Tsv performance (slow import)


Re: Hbase import Tsv performance (slow import)
anil gupta 2012-10-24, 16:30
Hi Nick,

How many hard drives does each of your slaves have, and what RPM are they?
How many mappers run concurrently on a node? Did you turn off speculative
execution? Have a look at disk I/O to see whether it is a bottleneck.

MR jobs are typically disk I/O bound, so if you only have one disk per slave
and you are running 5 mappers concurrently, the job will slow down.
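
As a rough sanity check on the figures in Nick's message quoted below, here is a small Python sketch of the arithmetic. It assumes that the "total tasks" numbers he reports (84 and 6) are concurrent map slots, that the 80 map tasks run in equal-length waves, and that scheduling overhead is negligible. The function names are only illustrative and are not part of Hadoop or HBase.

```python
import math


def waves(total_tasks, slots):
    """Sequential rounds needed to run total_tasks on the given number of slots."""
    return math.ceil(total_tasks / slots)


def avg_task_minutes(job_minutes, total_tasks, slots):
    """Average per-task duration implied by the job's wall-clock time."""
    return job_minutes / waves(total_tasks, slots)


# Figures taken from the thread: 80 map tasks in both runs.
# "slots" is an assumed reading of the reported "total tasks on my cluster".
TOTAL_TASKS = 80
runs = [
    {"slots": 84, "job_minutes": 75},  # map.tasks.maximum = 28, about 1h15
    {"slots": 6, "job_minutes": 80},   # map.tasks.maximum = 3, about 1h20
]

for run in runs:
    n_waves = waves(TOTAL_TASKS, run["slots"])
    per_task = avg_task_minutes(run["job_minutes"], TOTAL_TASKS, run["slots"])
    throughput = TOTAL_TASKS / run["job_minutes"]
    print(
        f"slots={run['slots']:>2}  waves={n_waves:>2}  "
        f"avg task={per_task:5.1f} min  throughput={throughput:.2f} tasks/min"
    )
```

Under these assumptions both runs finish roughly one map task per minute overall, whatever the slot count. That pattern is what you would expect if the mappers are all waiting on one shared resource, such as the disks, rather than on CPU or on the number of slots.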

Thanks,
Anil

On Wed, Oct 24, 2012 at 9:18 AM, Kevin O'dell <[EMAIL PROTECTED]> wrote:

> Nick,
>
>   What versions are you using:
>
> HDFS
> HBase
> OS
> On Oct 24, 2012 10:36 AM, "Nick maillard" <[EMAIL PROTECTED]> wrote:
>
> > Hello everyone
> >
> > Still looking into the issue.
> > I have tried different tests and the results are surprising.
> > If I set mapred.tasktracker.map.tasks.maximum to 28,
> > I get a total of 84 tasks on my cluster and the process takes about
> > 1h15, with each task taking about 1h10. The whole file is split into
> > 80 tasks.
> >
> > If I set mapred.tasktracker.map.tasks.maximum to 3,
> > I get a total of 6 tasks on my cluster and the process takes about the
> > same amount of time, 1h20, still splitting the whole file into 80 tasks,
> > but now of course each individual task only takes a couple of minutes.
> >
> > It's as if the overall ImportTsv run must take an hour or so, and the
> > duration of the map tasks varies accordingly.
> >
> > There is definitely something I am doing wrong.
> >
> >
> >
> >
>

--
Thanks & Regards,
Anil Gupta