HBase, mail # user - Re: Tune MapReduce over HBase to insert data


Re: Tune MapReduce over HBase to insert data
Adrien Mogenet 2013-02-04, 19:53
I couldn't find any documentation for these settings. Is it recommended to
raise them above the default value ("1") on modern servers, or is this
internal behavior that we should not tune ourselves?

On Tue, Jan 15, 2013 at 2:33 AM, Bing Jiang <[EMAIL PROTECTED]> wrote:

> Hi, mohandes.zebeleh
> You can adjust the parameters below (Major Compaction, Minor Compaction,
> Split).
> If you do not set them, they keep the default value (1).
>
> <property>
>   <name>hbase.regionserver.thread.compaction.large</name>
>   <value>5</value>
> </property>
> <property>
>   <name>hbase.regionserver.thread.compaction.small</name>
>   <value>10</value>
> </property>
> <property>
>   <name>hbase.regionserver.thread.split</name>
>   <value>5</value>
> </property>
>
> Regards!
>
> Bing
>
> 2013/1/14 Farrokh Shahriari <[EMAIL PROTECTED]>
>
> > Bing Jiang, what do you mean by adding compaction thread numbers? In
> > hbase-site.xml we have compactionqueuesize or compactionthreshold, but
> > not the parameters that you mentioned.
> >
> > Thank you if you can guide me.
> >
> > On Sun, Jan 13, 2013 at 7:00 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> > > Both HFileOutputFormat and LoadIncrementalHFiles are in the mapreduce
> > > package.
> > >
> > > Cheers
> > >
> > > On Sun, Jan 13, 2013 at 1:31 AM, Bing Jiang <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi, Anoop.
> > > > Why doesn't the HBase mapreduce package contain tools like this?
> > > >
> > > > Anoop John <[EMAIL PROTECTED]> wrote:
> > > >
> > > > >Hi
> > > > >Have you thought of using HFileOutputFormat? You are using
> > > > >TableOutputFormat now, which makes put calls to HTable. With
> > > > >HFileOutputFormat the MR job writes the HFiles directly [no flushes,
> > > > >no compactions]. Afterwards you need to load the HFiles into the
> > > > >regions using LoadIncrementalHFiles. It may help you.
> > > > >
> > > > >-Anoop-
> > > > >
> > > > >On Sun, Jan 13, 2013 at 10:59 AM, Farrokh Shahriari <
> > > > >[EMAIL PROTECTED]> wrote:
> > > > >
> > > > >> Thank you guys, let me change these configurations & test mapreduce
> > > > >> again.
> > > > >>
> > > > >> On Tue, Jan 8, 2013 at 10:31 PM, Asaf Mesika <[EMAIL PROTECTED]>
> > > > >> wrote:
> > > > >>
> > > > >> > Start by testing HDFS throughput by doing a simple copyFromLocal
> > > > >> > using the Hadoop command line shell (bin/hadoop fs -copyFromLocal
> > > > >> > pathTo8GBFile /tmp/dummyFile1). If you have a 1000 Mbit/sec
> > > > >> > network between the computers, you should get around 75 MB/sec.
> > > > >> >
> > > > >> > On Tuesday, January 8, 2013, Bing Jiang wrote:
> > > > >> >
> > > > >> > > In our experience, MapReduce inserts can be sped up by:
> > > > >> > > 1. increasing the number of regionserver flush threads
> > > > >> > > 2. increasing memstore/jvm_heap (the memstore share of the heap)
> > > > >> > > 3. pre-splitting the table's regions before the MapReduce job
> > > > >> > > 4. increasing the number of large and small compaction threads.
> > > > >> > >
> > > > >> > > Please correct me if I'm wrong, or suggest any better ideas.
> > > > >> > > On Jan 8, 2013 4:02 PM, "lars hofhansl" <[EMAIL PROTECTED]>
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > > > What type of disks and how many?
> > > > >> > > > With the default replication factor your 2 (or 6) GB are
> > > > >> > > > actually replicated 3 times.
> > > > >> > > > 6GB/80s = 75MB/s, twice that if you do not disable the WAL,
> > > > >> > > > which a reasonable machine should be able to absorb.
> > > > >> > > > The fact that deferred log flush does not help you seems to
> > > > >> > > > indicate that you're over IO bound.
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > What's your memstore flush size? Potentially the data is
> > > > >> > > > written many times during compactions.
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > In your case you could dial down the HDFS replication, since
> > > > >> > > > you only have two physical machines anyway.
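
For reference, the throughput arithmetic quoted in Lars's and Asaf's messages above can be rechecked with a small sketch. It assumes decimal units (1 GB = 1000 MB), reads "2 (or 6) GB" as 2 GB of data that becomes 6 GB after 3x replication, and treats the WAL as simply doubling the bytes written. All three are my reading of the thread rather than something it states. The function names are hypothetical.

```python
# Back-of-the-envelope check of the figures quoted in the thread.
# Assumptions (not stated in the thread): decimal units (1 GB = 1000 MB),
# and the WAL roughly doubles the number of bytes written.

def write_rate_mb_per_s(total_gb, seconds, wal_enabled=False):
    """Average write rate in MB/s for total_gb written over the given seconds."""
    rate = total_gb * 1000 / seconds
    return rate * 2 if wal_enabled else rate


def link_capacity_mb_per_s(mbit_per_s):
    """Theoretical capacity of a network link in MB/s (8 bits per byte)."""
    return mbit_per_s / 8


logical_gb = 2                           # data inserted by the job (assumed reading)
replication = 3                          # default HDFS replication factor
physical_gb = logical_gb * replication   # bytes actually written to disk, in GB

print(write_rate_mb_per_s(physical_gb, 80))                    # WAL disabled
print(write_rate_mb_per_s(physical_gb, 80, wal_enabled=True))  # WAL enabled
print(link_capacity_mb_per_s(1000))                            # 1000 Mbit/s link
```

With these assumptions the first figure matches the 75 MB/s in Lars's message, and the link capacity gives an upper bound against which Asaf's "around 75 MB/sec" copyFromLocal expectation can be compared.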

Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me