HBase >> mail # user >> Tune MapReduce over HBase to insert data


Re: Tune MapReduce over HBase to insert data
I couldn't find documentation for these settings. Is it recommended to set
them higher than the default value ("1") on modern servers, or is this
internal behavior that we should not tune ourselves?
On Tue, Jan 15, 2013 at 2:33 AM, Bing Jiang <[EMAIL PROTECTED]> wrote:

> Hi, mohandes.zebeleh
> You can adjust the parameters below (major compaction, minor compaction,
> and split thread counts).
> If you do not set them, each keeps its default value (1).
>
> <property>
>   <name>hbase.regionserver.thread.compaction.large</name>
>   <value>5</value>
> </property>
> <property>
>   <name>hbase.regionserver.thread.compaction.small</name>
>   <value>10</value>
> </property>
> <property>
>   <name>hbase.regionserver.thread.split</name>
>   <value>5</value>
> </property>
>
> Regards!
>
> Bing
>
> 2013/1/14 Farrokh Shahriari <[EMAIL PROTECTED]>
>
> > Bing Jiang, what do you mean by adding to the compaction thread number?
> > In hbase-site.xml we have compactionqueuesize or compactionthreshold, but
> > not the parameter you mentioned.
> >
> > Thank you for any guidance.
> >
> > On Sun, Jan 13, 2013 at 7:00 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> > > Both HFileOutputFormat and LoadIncrementalHFiles are in the mapreduce
> > > package.
> > >
> > > Cheers
> > >
> > > On Sun, Jan 13, 2013 at 1:31 AM, Bing Jiang <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi Anoop,
> > > > Why doesn't the HBase mapreduce package contain tools like this?
> > > >
> > > > Anoop John <[EMAIL PROTECTED]> wrote:
> > > >
> > > > >Hi
> > > > >Have you considered using HFileOutputFormat? You are using
> > > > >TableOutputFormat now, which issues put calls to HTable. With
> > > > >HFileOutputFormat, the MR job writes the HFiles directly (no flushes,
> > > > >no compactions). Afterwards you need to use LoadIncrementalHFiles to
> > > > >load the HFiles into the regions. This may help you.
> > > > >
> > > > >-Anoop-
> > > > >
> > > > >On Sun, Jan 13, 2013 at 10:59 AM, Farrokh Shahriari <
> > > > >[EMAIL PROTECTED]> wrote:
> > > > >
> > > > >> Thank you, guys. Let me change these configurations and test MapReduce
> > > > >> again.
> > > > >>
> > > > >> On Tue, Jan 8, 2013 at 10:31 PM, Asaf Mesika <[EMAIL PROTECTED]>
> > > > >> wrote:
> > > > >>
> > > > >> > Start by testing HDFS throughput with a simple copyFromLocal using
> > > > >> > the Hadoop command-line shell (bin/hadoop fs -copyFromLocal
> > > > >> > pathTo8GBFile /tmp/dummyFile1). If you have a 1000 Mbit/s network
> > > > >> > between the computers, you should get around 75 MB/s.
> > > > >> >
> > > > >> > On Tuesday, January 8, 2013, Bing Jiang wrote:
> > > > >> >
> > > > >> > > In our experience, MapReduce insert performance can be improved by:
> > > > >> > > 1. increasing the number of regionserver flush threads
> > > > >> > > 2. increasing the memstore share of the JVM heap
> > > > >> > > 3. pre-splitting the table's regions before running MapReduce
> > > > >> > > 4. increasing the number of large and small compaction threads
> > > > >> > >
> > > > >> > > Please correct me if I am wrong, or suggest any better ideas.
> > > > >> > > On Jan 8, 2013 4:02 PM, "lars hofhansl" <[EMAIL PROTECTED]>
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > > > What type of disks do you have, and how many?
> > > > >> > > > With the default replication factor, your 2 (or 6) GB are actually
> > > > >> > > > replicated 3 times.
> > > > >> > > > 6GB/80s = 75MB/s, twice that if you do not disable the WAL, which a
> > > > >> > > > reasonable machine should be able to absorb.
> > > > >> > > > The fact that deferred log flush does not help you seems to indicate
> > > > >> > > > that you're IO bound.
> > > > >> > > >
> > > > >> > > > What's your memstore flush size? Potentially the data is written many
> > > > >> > > > times during compactions.
> > > > >> > > >
> > > > >> > > > In your case you could dial down the HDFS replication, since you only
> > > > >> > > > have two physical machines anyway.
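
The arithmetic in Lars's message can be worked through as a rough sketch. The 6 GB and 80 s figures come from the thread, read here as 2 GB of input replicated 3 times. Treating 1 GB as 1000 MB and assuming the WAL exactly doubles the bytes written are simplifications, and the function name is hypothetical.

```python
def write_rate_mb_per_s(data_gb, seconds, wal_enabled=False):
    """Rough aggregate write rate in MB/s (1 GB treated as 1000 MB)."""
    rate = data_gb * 1000 / seconds
    # Per the thread: with the WAL left enabled, roughly twice as much is written.
    return rate * 2 if wal_enabled else rate

# Assumption: 2 GB of input at the default HDFS replication factor of 3.
replicated_gb = 2 * 3
print(write_rate_mb_per_s(replicated_gb, 80))        # 75.0
print(write_rate_mb_per_s(replicated_gb, 80, True))  # 150.0
```

Lowering the replication factor, as suggested for a two-machine cluster, shrinks `replicated_gb` and with it the rate the disks and network must sustain.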

Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me
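
Of the suggestions quoted above, pre-splitting the table depends most on the row-key layout, which the thread does not describe. As a sketch only, assuming row keys start with a fixed-width, evenly distributed hexadecimal prefix (a hypothetical layout, as is the function name), evenly spaced split points could be computed like this:

```python
def hex_split_points(num_regions, width=4):
    """Return num_regions - 1 evenly spaced hex split keys of the given width."""
    if num_regions < 2:
        return []  # a single region needs no split points
    key_space = 16 ** width
    return [format(i * key_space // num_regions, "0{}x".format(width))
            for i in range(1, num_regions)]

print(hex_split_points(4))  # ['4000', '8000', 'c000']
```

The resulting keys would be supplied as split points when the table is created, before the MapReduce job runs, so that writes are spread across regions from the start.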