HBase >> mail # user >> Hbase import Tsv performance (slow import)


Re: Hbase import Tsv performance (slow import)
Hi Nick,

How many hard drives does each slave have, and what is their RPM? How many
mappers run concurrently on a node? Did you turn off speculative execution?
Have a look at disk I/O to see whether it is the bottleneck.

MR is disk I/O bound, so if you have only one disk per slave and you are
running 5 mappers concurrently on it, the job will slow down.
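
A rough way to see why adding mappers stops helping once the disk is
saturated is the back-of-the-envelope sketch below. Every number in it is an
assumption made up for illustration (node count, disk speed, per-mapper
rate, input size), not a measurement from Nick's cluster, and the function
name is hypothetical. It also ignores seek contention, which on a single
spindle usually makes many concurrent mappers worse rather than merely no
better.

```python
def estimate_wall_minutes(total_mb, nodes, disks_per_node, disk_mb_per_s,
                          mappers_per_node, mapper_mb_per_s):
    """Rough wall-clock estimate for a map-only job, in minutes."""
    # What the mappers on one node could consume if the disks were never the limit.
    mapper_limit = mappers_per_node * mapper_mb_per_s
    # What the disks on one node can actually deliver.
    disk_limit = disks_per_node * disk_mb_per_s
    # A node runs at whichever limit is lower.
    node_rate = min(mapper_limit, disk_limit)
    return total_mb / (node_rate * nodes) / 60.0


# Hypothetical cluster: 3 nodes, 1 disk each at 50 MB/s,
# each mapper able to consume 20 MB/s, 500 GB of input.
for mappers in (1, 3, 5, 28):
    minutes = estimate_wall_minutes(
        total_mb=500_000, nodes=3, disks_per_node=1, disk_mb_per_s=50,
        mappers_per_node=mappers, mapper_mb_per_s=20)
    print(mappers, "mappers/node ->", round(minutes, 1), "min")
```

Under these assumed numbers the estimate stops improving as soon as the
mappers on a node can consume more than its single disk delivers, which is
one possible explanation for a job taking the same time with 3 or 28 slots.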

Thanks,
Anil

On Wed, Oct 24, 2012 at 9:18 AM, Kevin O'dell <[EMAIL PROTECTED]> wrote:

> Nick,
>
>   What versions are you using?
>
> HDFS
> HBase
> OS
> On Oct 24, 2012 10:36 AM, "Nick maillard" <[EMAIL PROTECTED]> wrote:
>
> > Hello everyone,
> >
> > Still looking into the issue.
> > I have tried different tests and the results are surprising.
> > If I set mapred.tasktracker.map.tasks.maximum: 28
> > I get a total of 84 tasks on my cluster and the process takes about 1h15,
> > each task taking 1h10. The whole file is split into 80 tasks.
> >
> > If I set mapred.tasktracker.map.tasks.maximum: 3
> > I get a total of 6 tasks on my cluster and the process takes about the same
> > amount of time, 1h20, still splitting the whole file into 80 tasks, but now
> > of course each individual task only takes a couple of minutes.
> >
> > It's like the overall ImportTsv must take 1h something and the duration of
> > the map tasks varies accordingly.
> >
> > There is definitely something I am doing wrong.
> >
> >
> >
> >
>

--
Thanks & Regards,
Anil Gupta