HBase, mail # user - Tune MapReduce over HBase to insert data


Re: Tune MapReduce over HBase to insert data
Asaf Mesika 2013-01-08, 19:01
Start by testing HDFS throughput by doing a simple copyFromLocal using the
Hadoop command line shell (bin/hadoop fs -copyFromLocal pathTo8GBFile
/tmp/dummyFile1). If you have a 1000 Mbit/sec network between the computers,
you should get around 75 MB/sec.
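
A minimal sketch of how the copyFromLocal measurement above could be scripted and turned into an MB/sec figure. The helper names are hypothetical, it assumes a `hadoop` binary on the PATH, and it counts 1 MB as 10**6 bytes; treat it as an illustration rather than a benchmark tool.

```python
import os
import subprocess
import time


def throughput_mb_s(num_bytes, seconds):
    """Convert a byte count and elapsed time into MB/s (1 MB = 10**6 bytes)."""
    return num_bytes / 1e6 / seconds


def link_limit_mb_s(mbit_per_s):
    """Theoretical ceiling of a network link in MB/s (8 bits per byte)."""
    return mbit_per_s / 8


def timed_copy_from_local(local_path, hdfs_path, hadoop_cmd="hadoop"):
    """Run `hadoop fs -copyFromLocal` and return the observed MB/s.

    hadoop_cmd is an assumption: point it at bin/hadoop if it is not on PATH.
    """
    size = os.path.getsize(local_path)
    start = time.monotonic()
    subprocess.run(
        [hadoop_cmd, "fs", "-copyFromLocal", local_path, hdfs_path],
        check=True,  # raise if the copy fails, so a failed run is never timed
    )
    return throughput_mb_s(size, time.monotonic() - start)


# Example call (needs a running HDFS, so it is left commented out):
# print(timed_copy_from_local("pathTo8GBFile", "/tmp/dummyFile1"))
```

For comparison, `link_limit_mb_s(1000)` gives the raw ceiling of a 1000 Mbit/sec link, and `throughput_mb_s(6e9, 80)` gives the rate needed to write 6 GB in 80 seconds.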

On Tuesday, January 8, 2013, Bing Jiang wrote:

> In our experience, MapReduce inserts can be sped up by:
> 1. increasing the number of regionserver flush threads
> 2. increasing the memstore share of the JVM heap (memstore/jvm_heap)
> 3. pre-splitting the table into regions before running MapReduce
> 4. increasing the number of large and small compaction threads.
>
> Please correct me if I'm wrong, or if there are any better ideas.
> On Jan 8, 2013 4:02 PM, "lars hofhansl" <[EMAIL PROTECTED]> wrote:
>
> > What type of disks and how many?
> > With the default replication factor your 2 (or 6) GB are actually
> > replicated 3 times.
> > 6GB/80s = 75MB/s, twice that if you do not disable the WAL, which a
> > reasonable machine should be able to absorb.
> > The fact that deferred log flush does not help you seems to indicate that
> > you're IO bound.
> >
> >
> > What's your memstore flush size? Potentially the data is written many
> > times during compactions.
> >
> >
> > In your case you could dial down the HDFS replication, since you only have
> > two physical machines anyway.
> > (Set it to 2. If you do not specify any failure zones, you might as well
> > set it to 1... You will lose data if one of your server machines dies
> > anyway).
> >
> > It does not really make that much sense to deploy HBase and HDFS on
> > virtual nodes like this.
> > -- Lars
> >
> >
> >
> > ________________________________
> > From: Farrokh Shahriari <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]
> > Sent: Monday, January 7, 2013 9:38 PM
> > Subject: Re: Tune MapReduce over HBase to insert data
> >
> > Hi again,
> > I'm using HBase 0.92.1-cdh4.0.0.
> > I have two server machines with 48 GB RAM, 12 physical cores & 24 logical
> > cores, which host 12 nodes (6 nodes on each server). Each node has 8 GB RAM
> > & 2 VCPUs.
> > I've set some parameters that gave better results, like setting WAL=off on
> > put, but some parameters like heap size and deferred log flush don't help.
> > Besides that, I have another question: why do I get a different run time
> > each time I run the MapReduce job, when the config & hardware are the same
> > and have not changed?
> >
> > Tnx you guys
> >
> > On Tue, Jan 8, 2013 at 8:42 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> > > Have you read through http://hbase.apache.org/book.html#performance ?
> > >
> > > What version of HBase are you using ?
> > >
> > > Cheers
> > >
> > > On Mon, Jan 7, 2013 at 9:05 PM, Farrokh Shahriari <
> > > [EMAIL PROTECTED]> wrote:
> > >
> > > > Hi there
> > > > I have a cluster with 12 nodes, each of which has 2 CPU cores. Now I
> > > > want to insert a large amount of data, about 2 GB in 80 sec (or 6 GB
> > > > in 240 sec). I've used MapReduce over HBase, but I can't achieve the
> > > > desired result.
> > > > I'd be glad if you could tell me what I can do to get a better result,
> > > > or which parameters I should configure or tune to improve
> > > > MapReduce/HBase performance.
> > > >
> > > > Tnx
> > > >
> > >
>