HBase >> mail # user >> Tune MapReduce over HBase to insert data


Re: Tune MapReduce over HBase to insert data
Both HFileOutputFormat and LoadIncrementalHFiles are in the mapreduce package.

Cheers

On Sun, Jan 13, 2013 at 1:31 AM, Bing Jiang <[EMAIL PROTECTED]> wrote:

> Hi Anoop,
> Why doesn't the HBase mapreduce package contain tools like this?
>
> Anoop John <[EMAIL PROTECTED]> wrote:
>
> >Hi,
> >Have you thought about using HFileOutputFormat? You are using
> >TableOutputFormat now, which issues put calls to HTable. With
> >HFileOutputFormat, the MR job writes the HFiles directly (no flushes,
> >no compactions). Afterwards, use LoadIncrementalHFiles to load the HFiles
> >into the regions. That may help you.
> >
> >-Anoop-
> >
> >On Sun, Jan 13, 2013 at 10:59 AM, Farrokh Shahriari <
> >[EMAIL PROTECTED]> wrote:
> >
> >> Thank you guys, let me change these configurations and test MapReduce again.
> >>
> >> On Tue, Jan 8, 2013 at 10:31 PM, Asaf Mesika <[EMAIL PROTECTED]>
> >> wrote:
> >>
> >> > Start by testing HDFS throughput by doing a simple copyFromLocal using the
> >> > Hadoop command-line shell (bin/hadoop fs -copyFromLocal pathTo8GBFile
> >> > /tmp/dummyFile1). If you have a 1000 Mbit/sec network between the
> >> > computers, you should get around 75 MB/sec.
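
[Editor's sketch] The 75 MB/sec figure above can be put next to the raw line rate of the link. The following is only unit arithmetic on the two numbers quoted in the message; the function name is hypothetical, and the resulting fraction is derived from the message's figures, not measured.

```python
def mbit_to_mbyte_per_sec(mbit_per_sec: float) -> float:
    """Convert a link speed in Mbit/s to MB/s (8 bits per byte)."""
    return mbit_per_sec / 8.0


LINK_MBIT = 1000   # link speed quoted in the message
EXPECTED_MB = 75   # throughput the message says to expect

line_rate = mbit_to_mbyte_per_sec(LINK_MBIT)
fraction = EXPECTED_MB / line_rate

print(f"line rate: {line_rate:.0f} MB/s")
print(f"expected copy rate as a fraction of line rate: {fraction:.0%}")
```

If the measured copyFromLocal rate falls well below the quoted figure, that would point at the network or HDFS rather than at HBase.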
> >> >
> >> > On Tuesday, January 8, 2013, Bing Jiang wrote:
> >> >
> >> > > In our experience, MapReduce insert throughput can be improved by:
> >> > > 1. increasing the number of region server flush threads
> >> > > 2. increasing the memstore share of the JVM heap
> >> > > 3. pre-splitting the table's regions before the MapReduce job
> >> > > 4. increasing the number of large and small compaction threads
> >> > >
> >> > > Please correct me if I'm wrong, or if there are better ideas.
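
[Editor's sketch] Pre-splitting (item 3 above) needs a list of split keys chosen before the job runs. The following is a minimal illustration, assuming row keys are fixed-width lowercase hex strings spread uniformly over the key space (for example, hashed keys). That assumption and the helper name `hex_split_keys` are not from the thread; a real table's key distribution has to drive the choice.

```python
from typing import List


def hex_split_keys(num_regions: int, key_width: int = 8) -> List[str]:
    """Return num_regions - 1 evenly spaced split keys over a hex key space.

    Assumes row keys are lowercase hex strings of length key_width,
    uniformly distributed (an assumption, not something the thread states).
    """
    if num_regions < 2:
        return []  # a single region needs no split points
    space = 16 ** key_width
    return [
        format(i * space // num_regions, f"0{key_width}x")
        for i in range(1, num_regions)
    ]


# Example: split points for 12 regions.
for key in hex_split_keys(12):
    print(key)
```

The resulting keys would be supplied as split points when the table is created, so that the map tasks write to all regions from the start instead of hammering a single one.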
> >> > > On Jan 8, 2013 4:02 PM, "lars hofhansl" <[EMAIL PROTECTED]> wrote:
> >> > >
> >> > > > What type of disks and how many?
> >> > > > With the default replication factor your 2 (or 6) GB are actually
> >> > > > replicated 3 times.
> >> > > > 6GB/80s = 75MB/s, twice that if you do not disable the WAL, which a
> >> > > > reasonable machine should be able to absorb.
> >> > > > The fact that deferred log flush does not help you seems to indicate
> >> > > > that you're IO bound.
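
[Editor's sketch] The arithmetic in the message above, written out. It uses only the figures quoted there (6 GB written in 80 s, doubled when the WAL is on) and decimal units (1 GB = 1000 MB), which is what makes the result come out at 75. The function name is hypothetical.

```python
def write_rate_mb_per_sec(written_gb: float, seconds: float,
                          wal_enabled: bool) -> float:
    """Aggregate write rate in MB/s for a given volume and duration.

    Uses 1 GB = 1000 MB to match the rounding in the message, and doubles
    the rate when the WAL is enabled, as the message states.
    """
    rate = written_gb * 1000 / seconds
    return rate * 2 if wal_enabled else rate


print(write_rate_mb_per_sec(6, 80, wal_enabled=False))
print(write_rate_mb_per_sec(6, 80, wal_enabled=True))
```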
> >> > > >
> >> > > >
> >> > > > What's your memstore flush size? Potentially the data is written
> >> > > > many times during compactions.
> >> > > >
> >> > > >
> >> > > > In your case you could dial down the HDFS replication, since you only
> >> > > > have two physical machines anyway.
> >> > > > (Set it to 2. If you do not specify any failure zones, you might as
> >> > > > well set it to 1... You will lose data if one of your server machines
> >> > > > dies anyway).
> >> > > >
> >> > > > It does not really make that much sense to deploy HBase and HDFS on
> >> > > > virtual nodes like this.
> >> > > > -- Lars
> >> > > >
> >> > > >
> >> > > >
> >> > > > ________________________________
> >> > > > From: Farrokh Shahriari <[EMAIL PROTECTED]>
> >> > > > To: [EMAIL PROTECTED]
> >> > > > Sent: Monday, January 7, 2013 9:38 PM
> >> > > > Subject: Re: Tune MapReduce over HBase to insert data
> >> > > >
> >> > > > Hi again,
> >> > > > I'm using HBase 0.92.1-cdh4.0.0.
> >> > > > I have two server machines with 48 GB RAM, 12 physical cores and 24
> >> > > > logical cores, hosting 12 nodes (6 nodes on each server). Each node
> >> > > > has 8 GB RAM and 2 vCPUs.
> >> > > > Some settings gave better results, such as turning the WAL off on
> >> > > > puts, but others, such as heap size and deferred log flush, did not
> >> > > > help.
> >> > > > I also have another question: why do I get a different run time each
> >> > > > time I run the MapReduce job, even though the configuration and
> >> > > > hardware stay the same?
> >> > > >
> >> > > > Thanks, guys
> >> > > >
> >> > > > On Tue, Jan 8, 2013 at 8:42 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >> > > >
> >> > > > > Have you read through http://hbase.apache.org/book.html#performance?
> >