HBase >> mail # user >> Re: Tune MapReduce over HBase to insert data
Re: Tune MapReduce over HBase to insert data
Thank you guys, let me change these configurations and test MapReduce again.

On Tue, Jan 8, 2013 at 10:31 PM, Asaf Mesika <[EMAIL PROTECTED]> wrote:

> Start by testing HDFS throughput with a simple copyFromLocal from the
> Hadoop command-line shell (bin/hadoop fs -copyFromLocal pathTo8GBFile
> /tmp/dummyFile1). If you have a 1000 Mbit/sec network between the computers,
> you should get around 75 MB/sec.
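
For anyone who wants to script the test above, here is a minimal sketch that times the same copyFromLocal command and converts the result to MB/sec. The function names and the `hadoop_bin` default are assumptions for illustration, and 1 MB is taken as 10^6 bytes.

```python
import subprocess
import time


def throughput_mb_per_s(num_bytes, seconds):
    """Return throughput in MB/sec, taking 1 MB = 10**6 bytes."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_bytes / 1e6 / seconds


def link_limit_mb_per_s(link_mbit_per_s):
    """Upper bound set by the network link: 8 bits per byte, no protocol overhead."""
    return link_mbit_per_s / 8


def time_copy_from_local(local_path, hdfs_path, hadoop_bin="bin/hadoop"):
    """Run `hadoop fs -copyFromLocal` and return the elapsed wall-clock seconds.

    hadoop_bin is an assumed path; adjust it to your installation.
    """
    start = time.monotonic()
    subprocess.run(
        [hadoop_bin, "fs", "-copyFromLocal", local_path, hdfs_path],
        check=True,
    )
    return time.monotonic() - start


# Example use on a machine with Hadoop installed (not executed here):
#   import os
#   elapsed = time_copy_from_local("pathTo8GBFile", "/tmp/dummyFile1")
#   print(throughput_mb_per_s(os.path.getsize("pathTo8GBFile"), elapsed))
```

A 1000 Mbit/sec link tops out at 125 MB/sec before any overhead, so the 75 MB/sec quoted above is about 60% of the wire rate.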
>
> On Tuesday, January 8, 2013, Bing Jiang wrote:
>
> > In our experience, MapReduce insert performance can be improved by:
> > 1. increasing the number of regionserver flush threads
> > 2. increasing memstore/jvm_heap
> > 3. pre-splitting the table's regions before running MapReduce
> > 4. increasing the number of large and small compaction threads.
> >
> > Please correct me if I'm wrong, or if there are better ideas.
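
As an illustration of the pre-splitting suggestion in the list above, here is a minimal sketch that computes evenly spaced split keys and prints an HBase shell `create` statement using them. It assumes row keys are fixed-width lowercase hex strings spread uniformly over the keyspace (for example, hashed keys); the table name `mytable`, the column family `cf`, and the choice of 12 regions are hypothetical.

```python
def hex_split_keys(num_regions, width=8):
    """Return num_regions - 1 evenly spaced split keys over a hex keyspace.

    Assumes row keys are `width`-character lowercase hex strings that are
    uniformly distributed; other key designs need different split points.
    """
    if num_regions < 2:
        return []
    keyspace = 16 ** width
    return [
        format(i * keyspace // num_regions, "0{}x".format(width))
        for i in range(1, num_regions)
    ]


def hbase_shell_create(table, family, split_keys):
    """Build an HBase shell `create` statement with a SPLITS option."""
    quoted = ", ".join("'{}'".format(key) for key in split_keys)
    return "create '{}', '{}', SPLITS => [{}]".format(table, family, quoted)


# Hypothetical table and family names; 12 regions is an assumed starting point.
print(hbase_shell_create("mytable", "cf", hex_split_keys(12)))
```

With the regions created up front, the writes from the MapReduce job can be spread across regionservers from the start, provided the keys really are evenly distributed.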
> > On Jan 8, 2013 4:02 PM, "lars hofhansl" <[EMAIL PROTECTED]> wrote:
> >
> > > What type of disks and how many?
> > > With the default replication factor your 2 (or 6) GB are actually
> > > replicated 3 times.
> > > 6GB/80s = 75MB/s, twice that if you do not disable the WAL, which a
> > > reasonable machine should be able to absorb.
> > > The fact that deferred log flush does not help you seems to indicate
> > > that you're IO bound.
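
The arithmetic above can be written out as a short sketch. It uses the figures from the thread (2 GB in 80 s, or 6 GB in 240 s, with replication 3), takes 1 GB as 1000 MB, and treats the "twice that" for the WAL as the rule of thumb stated above; the function name is an assumption, and the result is an aggregate rate, not a per-disk one.

```python
def required_write_mb_per_s(payload_gb, seconds, replication=3, wal_enabled=False):
    """Aggregate write rate in MB/sec needed to land the payload in time.

    Each byte is written `replication` times by HDFS; with the WAL enabled
    the volume is doubled, following the rule of thumb quoted in the thread.
    """
    total_mb = payload_gb * 1000 * replication
    rate = total_mb / seconds
    return rate * 2 if wal_enabled else rate


print(required_write_mb_per_s(2, 80))                    # replication 3, WAL off
print(required_write_mb_per_s(2, 80, wal_enabled=True))  # replication 3, WAL on
print(required_write_mb_per_s(2, 80, replication=1))     # replication dialed down to 1
```

The same function shows the effect of lowering the replication factor: the required rate scales linearly with it.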
> > >
> > >
> > > What's your memstore flush size? Potentially the data is written many
> > > times during compactions.
> > >
> > >
> > > In your case you could dial down the HDFS replication, since you only
> > > have two physical machines anyway.
> > > (Set it to 2. If you do not specify any failure zones, you might as
> > > well set it to 1... You will lose data if one of your server machines
> > > dies anyway).
> > >
> > > It does not really make that much sense to deploy HBase and HDFS on
> > > virtual nodes like this.
> > > -- Lars
> > >
> > >
> > >
> > > ________________________________
> > > From: Farrokh Shahriari <[EMAIL PROTECTED]>
> > > To: [EMAIL PROTECTED]
> > > Sent: Monday, January 7, 2013 9:38 PM
> > > Subject: Re: Tune MapReduce over HBase to insert data
> > >
> > > Hi again,
> > > I'm using HBase 0.92.1-cdh4.0.0.
> > > I have two server machines, each with 48 GB RAM, 12 physical cores and
> > > 24 logical cores, hosting 12 nodes in total (6 nodes on each server).
> > > Each node has 8 GB RAM and 2 vCPUs.
> > > Some settings gave better results, such as setting WAL=off on put,
> > > but others, such as heap size and deferred log flush, don't help.
> > > Besides that, I have another question: why do I get a different run
> > > time each time I run MapReduce, when the config and hardware are the
> > > same and have not changed?
> > >
> > > Thanks, guys
> > >
> > > On Tue, Jan 8, 2013 at 8:42 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> > >
> > > > Have you read through http://hbase.apache.org/book.html#performance?
> > > >
> > > > What version of HBase are you using ?
> > > >
> > > > Cheers
> > > >
> > > > On Mon, Jan 7, 2013 at 9:05 PM, Farrokh Shahriari <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi there,
> > > > > I have a cluster of 12 nodes, each with 2 CPU cores. I want to
> > > > > insert a large amount of data, about 2 GB in 80 sec (or 6 GB in
> > > > > 240 sec). I've used MapReduce over HBase, but I can't reach that
> > > > > rate.
> > > > > I'd be glad if you could tell me what I can do to get better
> > > > > results, or which parameters I should configure or tune to improve
> > > > > MapReduce/HBase performance.
> > > > >
> > > > > Thanks
> > > > >
> > > >
> >
>