HBase, mail # user - Re: Tune MapReduce over HBase to insert data


Re: Tune MapReduce over HBase to insert data
Bing Jiang 2013-01-08, 15:28
In our experience, MapReduce insert throughput can be improved by:
1. Increasing the number of region server flush threads
2. Increasing the share of the JVM heap given to the memstore
3. Pre-splitting the table into regions before running the MapReduce job
4. Increasing the number of large and small compaction threads

Please correct me if I'm wrong, or if there are other, better ideas.
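
To illustrate item 3 (pre-splitting), here is a minimal sketch of how split points could be computed. It assumes row keys start with a uniformly distributed hex prefix (for example a hash), which the thread does not state. The function name `hex_split_keys` and the choice of 12 regions (one per node) are hypothetical. The result would still have to be passed to whatever table-creation call you use; this sketch does not touch the HBase API.

```python
def hex_split_keys(num_regions, width=8):
    """Return num_regions - 1 evenly spaced split points as hex strings.

    Assumption: row keys begin with a uniformly distributed hex prefix
    of `width` characters, so evenly spaced boundaries give regions of
    roughly equal size.
    """
    if num_regions < 2:
        # A single region needs no split points.
        return []
    keyspace = 16 ** width
    return [
        format(i * keyspace // num_regions, "0{}x".format(width))
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    # Illustrative choice: one region per node in a 12-node cluster.
    for key in hex_split_keys(12):
        print(key)
```

With split points in place before the job starts, the writes are spread over all region servers from the first put instead of all landing on a single initial region.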
On Jan 8, 2013 4:02 PM, "lars hofhansl" <[EMAIL PROTECTED]> wrote:

> What type of disks and how many?
> With the default replication factor your 2 (or 6) GB are actually
> replicated 3 times.
> 6GB/80s = 75MB/s, twice that if you do not disable the WAL, which a
> reasonable machine should be able to absorb.
> The fact that deferred log flush does not help you seems to indicate that
> you're IO bound.
>
>
> What's your memstore flush size? Potentially the data is written many
> times during compactions.
>
>
> In your case you could dial down the HDFS replication, since you only have two
> physical machines anyway.
> (Set it to 2. If you do not specify any failure zones, you might as well
> set it to 1... You will lose data if one of your server machines dies
> anyway).
>
> It does not really make that much sense to deploy HBase and HDFS on
> virtual nodes like this.
> -- Lars
>
>
>
> ________________________________
>  From: Farrokh Shahriari <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Monday, January 7, 2013 9:38 PM
> Subject: Re: Tune MapReduce over HBase to insert data
>
> Hi again,
> I'm using HBase 0.92.1-cdh4.0.0.
> I have two server machines with 48 GB RAM, 12 physical cores and 24 logical
> cores, hosting 12 nodes (6 nodes on each server). Each node has 8 GB RAM and
> 2 vCPUs.
> I've set some parameters that gave better results, such as turning the WAL
> off on puts, but others, such as heap size and deferred log flush, don't help.
> Besides that, I have another question: why do I get a different run time
> each time I run the MapReduce job, when the config and hardware are
> unchanged?
>
> Thanks, guys
>
> On Tue, Jan 8, 2013 at 8:42 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > Have you read through http://hbase.apache.org/book.html#performance ?
> >
> > What version of HBase are you using ?
> >
> > Cheers
> >
> > On Mon, Jan 7, 2013 at 9:05 PM, Farrokh Shahriari <
> > [EMAIL PROTECTED]> wrote:
> >
> > > Hi there,
> > > I have a cluster with 12 nodes, each with 2 CPU cores. I want to
> > > insert a large amount of data, about 2 GB in 80 sec (or 6 GB in 240 sec).
> > > I've used MapReduce over HBase, but I can't achieve that result.
> > > I'd be glad if you could tell me what I can do to get better results, or
> > > which parameters I should configure or tune to improve MapReduce/HBase
> > > performance.
> > >
> > > Thanks
> > >
> >
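
The back-of-the-envelope estimate quoted above (2 GB in 80 s, written 3 times because of HDFS replication, and doubled when the WAL is left on) can be reproduced with a short sketch. The function name and the 1 GB = 1000 MB convention are assumptions chosen to match the 75 MB/s figure in the thread. This is a rough model of the aggregate write rate and ignores the rewrites caused by compactions.

```python
def required_write_mb_per_s(payload_gb, seconds, replication=3, wal=True):
    """Rough aggregate disk write rate needed to absorb an insert job.

    Assumptions: 1 GB = 1000 MB, every byte is written once per HDFS
    replica, and the WAL doubles the volume (as in the quoted estimate).
    Compaction rewrites are not modelled.
    """
    rate = payload_gb * 1000 * replication / seconds
    return rate * 2 if wal else rate


if __name__ == "__main__":
    for replication in (3, 2, 1):
        for wal in (False, True):
            rate = required_write_mb_per_s(2, 80, replication, wal)
            print("replication=%d wal=%s: %.1f MB/s" % (replication, wal, rate))
```

Under these assumptions, lowering the replication factor reduces the required write rate in direct proportion, which is the reasoning behind the suggestion to dial replication down on a two-machine setup.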