HBase >> mail # user >> Tune MapReduce over HBase to insert data


Re: Tune MapReduce over HBase to insert data
Hi mohandes.zebeleh,
You can adjust the parameters below (major compaction, minor compaction, and
split threads).
If you do not set them, each keeps its default value (1).

<property>
  <name>hbase.regionserver.thread.compaction.large</name>
  <value>5</value>
</property>
<property>
  <name>hbase.regionserver.thread.compaction.small</name>
  <value>10</value>
</property>
<property>
  <name>hbase.regionserver.thread.split</name>
  <value>5</value>
</property>

Regards!

Bing

2013/1/14 Farrokh Shahriari <[EMAIL PROTECTED]>

> Bing Jiang, what do you mean by adding to the compaction thread number? In
> hbase-site.xml we have compactionqueuesize and compactionthreshold, but not
> the parameters that you mentioned.
>
> I would be grateful for your guidance.
>
> On Sun, Jan 13, 2013 at 7:00 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > Both HFileOutputFormat and LoadIncrementalHFiles are in the mapreduce
> > package.
> >
> > Cheers
> >
> > On Sun, Jan 13, 2013 at 1:31 AM, Bing Jiang <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Anoop,
> > > Why doesn't the HBase mapreduce package contain tools like this?
> > >
> > > Anoop John <[EMAIL PROTECTED]> wrote:
> > >
> > > >Hi,
> > > >Have you considered using HFileOutputFormat? You are using
> > > >TableOutputFormat now, which issues put calls to HTable. With
> > > >HFileOutputFormat, the MR job writes the HFiles directly (no flushes,
> > > >no compactions). Afterwards you need to use LoadIncrementalHFiles to
> > > >load the HFiles into the regions. This may help you.
> > > >
> > > >-Anoop-
> > > >
> > > >On Sun, Jan 13, 2013 at 10:59 AM, Farrokh Shahriari <[EMAIL PROTECTED]> wrote:
> > > >
> > > >> Thank you, guys. Let me change these configurations and test MapReduce
> > > >> again.
> > > >>
> > > >> On Tue, Jan 8, 2013 at 10:31 PM, Asaf Mesika <[EMAIL PROTECTED]> wrote:
> > > >>
> > > >> > Start by testing HDFS throughput with a simple copyFromLocal using
> > > >> > the Hadoop command-line shell (bin/hadoop fs -copyFromLocal
> > > >> > pathTo8GBFile /tmp/dummyFile1). If you have a 1000 Mbit/sec network
> > > >> > between the computers, you should get around 75 MB/sec.
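
To put the 75 MB/sec figure in context, here is a minimal sketch of the unit conversion behind it. The class and method names are hypothetical and only for illustration; the sketch assumes 8 bits per byte and ignores protocol overhead, so it gives the raw line rate rather than what HDFS will actually deliver.

```java
import java.util.Locale;

class LinkThroughput {
    // Converts a link speed in megabits per second to megabytes per second
    // (8 bits per byte, protocol overhead ignored).
    static double mbitToMBytePerSec(double mbitPerSec) {
        return mbitPerSec / 8.0;
    }

    // Fraction of the raw line rate that a measured throughput represents.
    static double fractionOfLineRate(double measuredMBPerSec, double mbitPerSec) {
        return measuredMBPerSec / mbitToMBytePerSec(mbitPerSec);
    }

    public static void main(String[] args) {
        double lineRate = mbitToMBytePerSec(1000.0);          // raw rate of a 1000 Mbit/sec link
        double fraction = fractionOfLineRate(75.0, 1000.0);   // 75 MB/sec is the figure quoted above
        System.out.println(String.format(Locale.ROOT,
                "line rate: %.1f MB/s, 75 MB/s is %.0f%% of it", lineRate, 100.0 * fraction));
    }
}
```

If the copyFromLocal test lands far below the quoted figure, the network or disks are probably a more likely bottleneck than the HBase settings.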
> > > >> >
> > > >> > On Tuesday, January 8, 2013, Bing Jiang wrote:
> > > >> >
> > > >> > > In our experience, MapReduce insert throughput can be improved by:
> > > >> > > 1. increasing the number of regionserver flush threads
> > > >> > > 2. increasing the memstore/jvm_heap ratio
> > > >> > > 3. pre-splitting the table into regions before running MapReduce
> > > >> > > 4. increasing the number of large and small compaction threads
> > > >> > >
> > > >> > > Please correct me if I am wrong, or suggest any better ideas.
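
Pre-splitting (item 3 above) means choosing region boundaries before the job runs. The following is a minimal sketch of one way to compute them. It assumes row keys are uniformly distributed fixed-width lowercase hex strings, which the thread does not state, and the class and method names are hypothetical. The resulting keys would still have to be converted to bytes and passed to table creation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class PreSplitKeys {
    // Returns (regions - 1) evenly spaced split points over a keyspace of
    // fixed-width lowercase hex strings, e.g. "0000".."ffff" for width 4.
    static List<String> hexSplitKeys(int regions, int width) {
        if (regions < 2 || width < 1 || width > 15) {
            throw new IllegalArgumentException("need regions >= 2 and 1 <= width <= 15");
        }
        long keySpace = 1L << (4 * width);   // number of distinct keys of this width
        long step = keySpace / regions;      // divide first so the multiplication cannot overflow
        List<String> keys = new ArrayList<>();
        for (int i = 1; i < regions; i++) {
            // Zero-pad to the key width so the split points sort like the row keys.
            keys.add(String.format(Locale.ROOT, "%0" + width + "x", step * i));
        }
        return keys;
    }

    public static void main(String[] args) {
        // Four regions over 4-digit hex keys need three split points.
        System.out.println(hexSplitKeys(4, 4));
    }
}
```

Evenly spaced boundaries only balance the load if the keys really are uniform; for skewed keys, the split points would need to come from a sample of the input instead.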
> > > >> > > On Jan 8, 2013 4:02 PM, "lars hofhansl" <[EMAIL PROTECTED]> wrote:
> > > >> > >
> > > >> > > > What type of disks do you have, and how many?
> > > >> > > > With the default replication factor your 2 (or 6) GB are actually
> > > >> > > > replicated 3 times.
> > > >> > > > 6GB/80s = 75MB/s, twice that if you do not disable the WAL, which a
> > > >> > > > reasonable machine should be able to absorb.
> > > >> > > > The fact that deferred log flush does not help you seems to indicate
> > > >> > > > that you're IO bound.
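
The estimate above can be written out as a short calculation. This is a sketch only: the class and method names are hypothetical, it uses decimal units (1 GB = 1000 MB) to match the 6GB/80s = 75MB/s figure, and it models the WAL as simply doubling the write rate, as the message does.

```java
import java.util.Locale;

class WriteVolume {
    // Write rate in MB/s that the cluster must absorb: the data size multiplied
    // by the HDFS replication factor, doubled when the WAL is enabled.
    static double requiredMBPerSec(double dataGB, int replication, double seconds, boolean walEnabled) {
        double totalMB = dataGB * 1000.0 * replication;  // decimal GB to MB
        double rate = totalMB / seconds;
        return walEnabled ? 2.0 * rate : rate;
    }

    public static void main(String[] args) {
        // 2 GB of input, replication factor 3, job duration 80 s (figures from the thread).
        System.out.println(String.format(Locale.ROOT, "without WAL: %.1f MB/s",
                requiredMBPerSec(2.0, 3, 80.0, false)));
        System.out.println(String.format(Locale.ROOT, "with WAL:    %.1f MB/s",
                requiredMBPerSec(2.0, 3, 80.0, true)));
        // Lowering replication to 2 reduces the volume proportionally.
        System.out.println(String.format(Locale.ROOT, "replication 2, no WAL: %.1f MB/s",
                requiredMBPerSec(2.0, 2, 80.0, false)));
    }
}
```

The calculation leaves out compaction rewrites, which the message goes on to raise as a further source of write volume.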
> > > >> > > >
> > > >> > > >
> > > >> > > > What's your memstore flush size? Potentially the data is written
> > > >> > > > many times during compactions.
> > > >> > > >
> > > >> > > >
> > > >> > > > In your case you can dial down the HDFS replication, since you only
> > > >> > > > have two physical machines anyway.
> > > >> > > > (Set it to 2. If you do not specify any failure zones, you might as
> > > >> > > > well set it to 1... You will lose data if one of your server machines
> > > >> > > > dies anyway).
> > > >> > > >
> > > >> > > > It does not really make that much sense to deploy HBase and HDFS on
> > > >> > > > virtual nodes like this.
> > > >> > > > -- Lars

Bing Jiang
Tel:(86)134-2619-1361
weibo: http://weibo.com/jiangbinglover
BLOG: http://blog.sina.com.cn/jiangbinglover
National Research Center for Intelligent Computing Systems
Institute of Computing technology
Graduate University of Chinese Academy of Science