HBase, mail # user - Tune MapReduce over HBase to insert data


Re: Tune MapReduce over HBase to insert data
Farrokh Shahriari 2013-01-16, 11:20
I've noticed that if I comment out the write command in the map function
(context.write(row, put)), the job takes just 40 seconds. The difference is
about 30 seconds, which seems odd to me. What do you think?

The parameters that have been useful so far:
hbase.hstore.blockingStoreFiles => 20
hbase.hregion.memstore.block.multiplier => 4
hbase.hregion.memstore.flush.size => 1073741824
speculative.execution => false
wal => false

Should I also change these two parameters: io.sort.mb & io.sort.factor?
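[If you do try those two knobs, they are MRv1 shuffle settings set in mapred-site.xml. A minimal sketch, with purely illustrative values (not recommendations):]

```xml
<property>
  <!-- Size in MB of the in-memory buffer used to sort map output (default 100) -->
  <name>io.sort.mb</name>
  <value>256</value>
</property>
<property>
  <!-- Number of spill streams merged at once during the sort (default 10) -->
  <name>io.sort.factor</name>
  <value>25</value>
</property>
```

[Note that a map-only insert job that writes Puts straight to HBase has no shuffle/sort phase, so these two parameters may have little or no effect in that case.]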

Mohandes

On Tue, Jan 15, 2013 at 5:03 AM, Bing Jiang <[EMAIL PROTECTED]> wrote:

> Hi, mohandes.zebeleh
> You can adjust the parameters below (large compaction, small compaction, and
> split thread pools). If you do not set them, they keep the default value (1):
>
> <property>
>   <name>hbase.regionserver.thread.compaction.large</name>
>   <value>5</value>
> </property>
> <property>
>   <name>hbase.regionserver.thread.compaction.small</name>
>   <value>10</value>
> </property>
> <property>
>   <name>hbase.regionserver.thread.split</name>
>   <value>5</value>
> </property>
>
> Regards!
>
> Bing
>
> 2013/1/14 Farrokh Shahriari <[EMAIL PROTECTED]>
>
>> Bing Jiang, what do you mean by adding to the compaction thread number? In
>> hbase-site.xml we have compactionqueuesize and compactionthreshold, but not
>> the parameter you mentioned.
>>
>> Thank you if you can guide me.
>>
>> On Sun, Jan 13, 2013 at 7:00 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>>
>> > Both HFileOutputFormat and LoadIncrementalHFiles are in the mapreduce
>> > package.
>> >
>> > Cheers
>> >
>> > On Sun, Jan 13, 2013 at 1:31 AM, Bing Jiang <[EMAIL PROTECTED]> wrote:
>> >
>> > > Hi, Anoop.
>> > > Why doesn't the hbase mapreduce package contain tools like this?
>> > >
>> > > Anoop John <[EMAIL PROTECTED]> wrote:
>> > >
>> > > >Hi
>> > > >             Can you think of using HFileOutputFormat? You are using
>> > > >TableOutputFormat now, which makes put calls to HTable. With
>> > > >HFileOutputFormat, the MR job writes the HFiles directly [no flushes,
>> > > >no compactions]. Afterwards you use LoadIncrementalHFiles to load the
>> > > >HFiles into the regions. It may help you.
>> > > >
>> > > >-Anoop-
>> > > >
>> > > >On Sun, Jan 13, 2013 at 10:59 AM, Farrokh Shahriari <
>> > > >[EMAIL PROTECTED]> wrote:
>> > > >
>> > > >> Thank you guys, let me change these configurations & test mapreduce
>> > > >> again.
>> > > >>
>> > > >> On Tue, Jan 8, 2013 at 10:31 PM, Asaf Mesika <[EMAIL PROTECTED]>
>> > > >> wrote:
>> > > >>
>> > > >> > Start by testing HDFS throughput with a simple copyFromLocal using
>> > > >> > the Hadoop command-line shell (bin/hadoop fs -copyFromLocal
>> > > >> > pathTo8GBFile /tmp/dummyFile1). If you have a 1000 Mbit/sec network
>> > > >> > between the computers, you should get around 75 MB/sec.
>> > > >> >
>> > > >> > On Tuesday, January 8, 2013, Bing Jiang wrote:
>> > > >> >
>> > > >> > > In our experience, mapreduce inserts can be sped up by:
>> > > >> > > 1. adding regionserver flush threads
>> > > >> > > 2. adding memstore / JVM heap
>> > > >> > > 3. pre-splitting the table's regions before the mapreduce job
>> > > >> > > 4. adding large and small compaction threads.
>> > > >> > >
>> > > >> > > Please correct me if I'm wrong, or suggest any better ideas.
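[For item 2 above (memstore/heap), a minimal hbase-site.xml sketch using the 0.9x-era global memstore parameters; the values are illustrative only, and the heap itself is raised via HBASE_HEAPSIZE in hbase-env.sh:]

```xml
<property>
  <!-- Fraction of heap all memstores may occupy before writes block (default 0.4) -->
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.45</value>
</property>
<property>
  <!-- Once the upper limit is hit, flushing continues down to this fraction (default 0.35) -->
  <name>hbase.regionserver.global.memstore.lowerLimit</name>
  <value>0.4</value>
</property>
```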
>> > > >> > > On Jan 8, 2013 4:02 PM, "lars hofhansl" <[EMAIL PROTECTED]>
>> > > >> > > wrote:
>> > > >> > >
>> > > >> > > > What type of disks, and how many?
>> > > >> > > > With the default replication factor, your 2 (or 6) GB are
>> > > >> > > > actually replicated 3 times.
>> > > >> > > > 6GB/80s = 75MB/s (twice that if you do not disable the WAL),
>> > > >> > > > which a reasonable machine should be able to absorb.
>> > > >> > > > The fact that deferred log flush does not help you seems to
>> > > >> > > > indicate that you're IO bound.
>> > > >> > > >
>> > > >> > > >
>> > > >> > > > What's your memstore flush size? Potentially the data is
>> > > >> > > > written