HBase dev mailing list: HBase Map/Reduce Data Ingest Performance


Thread:
Upender K. Nimbekar 2012-12-17, 15:34
Ted Yu 2012-12-17, 17:45
Upender K. Nimbekar 2012-12-17, 19:11
Ted Yu 2012-12-18, 00:52
Upender K. Nimbekar 2012-12-18, 02:30
Ted Yu 2012-12-18, 03:28
Nick Dimiduk 2012-12-18, 17:31
Upender K. Nimbekar 2012-12-18, 19:06
Jean-Daniel Cryans 2012-12-18, 19:17
Nick Dimiduk 2012-12-18, 19:20
lars hofhansl 2012-12-19, 07:07
Re: HBase Map/Reduce Data Ingest Performance
Now of course I see that both Nick and J-D already replied saying something similar.
Apologies for repeating.
Anyway, please keep asking questions. That is how we all learn.

________________________________
 From: lars hofhansl <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Tuesday, December 18, 2012 11:07 PM
Subject: Re: HBase Map/Reduce Data Ingest Performance
 
Hi Upender,

I think you misinterpreted what Nick was saying.
Personally, if I start something with "Dumb question" what I mean is "please forgive me if you had already thought about this, just making sure in case you missed it". I think Nick meant it the same way.
We're pretty friendly folks here (mostly ;-) ).
-- Lars

________________________________
From: Upender K. Nimbekar <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Tuesday, December 18, 2012 11:06 AM
Subject: Re: HBase Map/Reduce Data Ingest Performance

I would like to request that you maintain respect for people asking questions
on this forum. Let's not start the thread in the wrong direction.
I wish it were a dumb question. I did chmod 777 prior to calling bulkLoad.
The call succeeded, but the bulkLoad call still threw an exception. However,
it does work if I do the chmod and bulkLoad() from the Hadoop driver after
the job is finished.
BTW, the HBase user needs WRITE permission, not just READ, because it creates
some _tmp directories.

Upen
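
For reference, here is a minimal sketch of the driver-side approach described
above, assuming the 0.90-era bulk load API; the HFile output path and table
name are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

    public class BulkLoadDriver {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        FileSystem fs = FileSystem.get(conf);
        Path hfileDir = new Path("/user/etl/hfile-output"); // hypothetical HFile output dir

        // Open up permissions so the HBase user can create its _tmp
        // directories under the output dir during the bulk load.
        FsPermission open = new FsPermission((short) 0777);
        fs.setPermission(hfileDir, open);
        for (FileStatus family : fs.listStatus(hfileDir)) {
          fs.setPermission(family.getPath(), open); // per-column-family subdirs
        }

        // Hand the HFiles to the region servers after the MR job has finished.
        HTable table = new HTable(conf, "my_table"); // hypothetical table name
        new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, table);
      }
    }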

On Tue, Dec 18, 2012 at 12:31 PM, Nick Dimiduk <[EMAIL PROTECTED]> wrote:

> Dumb question: what's the filesystem permissions of your generated HFiles?
> Can the HBase process read them? Maybe a simple chmod or chown will get you
> the rest of the way there.
>
> On Mon, Dec 17, 2012 at 6:30 PM, Upender K. Nimbekar <
>  [EMAIL PROTECTED]> wrote:
>
> > Thanks! I'm calling doBulkLoad() from the mapper's cleanup() method, but
> > I am running into permission issues when the HBase user tries to import
> > the HFiles into HBase. I am not sure if there is a way to change the
> > target HDFS file permissions via HFileOutputFormat.
> >
> >
> > On Mon, Dec 17, 2012 at 7:52 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> > > I think second approach is better.
> > >
> > > Cheers
> > >
> > > On Mon, Dec 17, 2012 at 11:11 AM, Upender K. Nimbekar <
> > > [EMAIL PROTECTED]> wrote:
> > >
> > > > Sure. I can try that. Just curious: out of these two strategies,
> > > > which one do you think is better? Do you have any experience trying
> > > > one or the other?
> > > >
> > > > Thanks
> > > > Upen
> > > >
> > > > On Mon, Dec 17, 2012 at 12:45 PM, Ted Yu <[EMAIL PROTECTED]>
> wrote:
> > > >
> > > > > Thanks for sharing your experiences.
> > > > >
> > > > > Have you considered upgrading to HBase 0.92 or 0.94?
> > > > > There have been several bug fixes / enhancements to the
> > > > > LoadIncrementalHFiles.doBulkLoad() API in newer HBase releases.
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Mon, Dec 17, 2012 at 7:34 AM, Upender K. Nimbekar <
> > > > > [EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Hi All,
> > > > > > I have question about improving the Map / Reduce job performance
> > > while
> > > > > > ingesting huge amount of data into Hbase using HFileOutputFormat.
> > > Here
> > > > is
> > > > > > what we are using:
> > > > > >
> > > > > > 1) *Cloudera hadoop-0.20.2-cdh3u*
> > > > > > 2) *hbase-0.90.4-cdh3u2*
> > > > > >
> > > > > > I've used 2 different strategies as described below:
> > > > > >
> > > > > > *Strategy#1:* Pre-split the table with 10 regions per region
> > > > > > server, and then kick off the Hadoop job with
> > > > > > HFileOutputFormat.configureIncrementalLoad. This mechanism creates
> > > > > > reduce tasks equal to the total number of regions (region servers
> > > > > > * 10). We used the "hash" of each record as the key of the map
> > > > > > output. This resulted in each mapper finishing in an acceptable
> > > > > > amount of time. But the reduce tasks take forever to
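
For reference, a minimal sketch of the Strategy#1 job setup described above,
assuming CDH3-era HBase/Hadoop APIs; the table name, column family, and the
hash-keyed mapper are hypothetical:

    import java.io.IOException;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class HFileIngestJob {

      // Hypothetical mapper: rows are keyed by an MD5 hash of each input record.
      public static class HashKeyMapper
          extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
            throws IOException, InterruptedException {
          byte[] data = Bytes.toBytes(line.toString());
          byte[] row;
          try {
            row = MessageDigest.getInstance("MD5").digest(data);
          } catch (NoSuchAlgorithmException e) {
            throw new IOException(e);
          }
          KeyValue kv = new KeyValue(row, Bytes.toBytes("cf"), Bytes.toBytes("raw"), data);
          ctx.write(new ImmutableBytesWritable(row), kv);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "hfile-ingest");
        job.setJarByClass(HFileIngestJob.class);
        job.setMapperClass(HashKeyMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(KeyValue.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Wires in TotalOrderPartitioner, sets the number of reduce tasks to
        // the number of regions in the (pre-split) table, and configures
        // HFileOutputFormat as the job's output format.
        HTable table = new HTable(conf, "my_table"); // hypothetical, pre-split table
        HFileOutputFormat.configureIncrementalLoad(job, table);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Note that configureIncrementalLoad reads the table's region boundaries at job
submit time, so the table must already be pre-split before the job launches.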