HBase >> mail # user >> HBase Table Row Count Optimization - A Solicitation For Help


Re: HBase Table Row Count Optimization - A Solicitation For Help
HBase is open source. You can check out the source code and read the
class and its javadoc directly.

$ svn info
Path: .
URL: http://svn.apache.org/repos/asf/hbase/branches/0.94
Repository Root: http://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 1525061

On Fri, Sep 20, 2013 at 6:46 PM, James Birchfield <[EMAIL PROTECTED]> wrote:

> Ted,
>
>         My apologies if I am being thick, but I am looking at the API docs
> here: http://hbase.apache.org/apidocs/index.html and I do not see that
> package.  And the coprocessor package only contains an exception.
>
>         Ok, weird.  Those classes do not show up through normal navigation
> from that link, however, the documentation does exist if I google for it
> directly.  Maybe the javadocs need to be regenerated???  Dunno, but I will
> check it out.
>
> Birch
>
> On Sep 20, 2013, at 6:32 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > Please take a look at the javadoc for
> > src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
> >
> > As long as the machine can reach your HBase cluster, you should be able to
> > run AggregationClient and utilize the AggregateImplementation endpoint in
> > the region servers.
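
To make the point above concrete: an endpoint coprocessor call fans out to the regions, each region counts its own rows, and only one number per region travels back to the client. That is why the client machine only needs network access to the cluster. The following is a minimal, self-contained Java sketch of that scatter-gather shape. It deliberately does not use any HBase classes; `Region`, `countRows`, and the `rowCount` helper are hypothetical stand-ins for illustration, not the AggregationClient API, whose actual signatures should be checked in the 0.94 source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class EndpointRowCountSketch {

    /** Hypothetical stand-in for one region: it holds its rows and counts them locally. */
    static final class Region {
        private final List<String> rowKeys;

        Region(List<String> rowKeys) {
            this.rowKeys = rowKeys;
        }

        /** Plays the role of the server-side endpoint: only a single long leaves the region. */
        long countRows() {
            return rowKeys.size();
        }
    }

    /** Client side: submit one call per region, then add up the partial counts. */
    static long rowCount(List<Region> regions) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (Region region : regions) {
                Callable<Long> call = region::countRows;
                partials.add(pool.submit(call));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get(); // blocks until that region has answered
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Region> regions = new ArrayList<>();
        regions.add(new Region(Arrays.asList("a", "b", "c")));
        regions.add(new Region(Arrays.asList("d", "e")));
        regions.add(new Region(Collections.<String>emptyList()));
        System.out.println(rowCount(regions)); // prints 5
    }
}
```

The client in this sketch never sees row data, only per-region totals, so its own CPU and bandwidth are not the limiting factor. A MapReduce job that falls back to a local runner is different: there the scan results themselves are pulled to the client process.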
> >
> > Cheers
> >
> >
> > On Fri, Sep 20, 2013 at 6:26 PM, James Birchfield <[EMAIL PROTECTED]> wrote:
> >
> >> Thanks Ted.
> >>
> >> That was the direction I have been working towards as I am learning today.
> >> Much appreciation to all the replies to this thread.
> >>
> >> Whether I keep the MapReduce job or utilize the Aggregation coprocessor
> >> (which, it turns out, should be possible for me here), I need to
> >> make sure I am running the client in an efficient manner.  Lars may have
> >> hit upon the core problem.  I am not running the MapReduce job on the
> >> cluster, but rather from a standalone remote Java client executing the job
> >> in process.  This may very well turn out to be the number one issue.  I
> >> would love it if this turns out to be true.  It would make this a great
> >> lesson for me as a relative newcomer to working with HBase, and
> >> potentially allow me to finish this initial task much quicker than I was
> >> thinking.
> >>
> >> So assuming the MapReduce jobs need to be run on the cluster instead of
> >> locally, does a coprocessor endpoint client need to be run the same way, or is
> >> it safe to run it on a remote machine since the work gets distributed out
> >> to the region servers?  Just wondering if I would run into the same issues
> >> if what I said above holds true.
> >>
> >> Thanks!
> >> Birch
> >> On Sep 20, 2013, at 6:17 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >>
> >>> In 0.94, we have AggregateImplementation, an endpoint coprocessor, which
> >>> implements getRowNum().
> >>>
> >>> Example is in AggregationClient.java
> >>>
> >>> Cheers
> >>>
> >>>
> >>> On Fri, Sep 20, 2013 at 6:09 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> >>>
> >>>> From your numbers below you have about 26k regions, thus each region is
> >>>> about 545 TB / 26k ~ 21 GB. Good.
> >>>>
> >>>> How many mappers are you running?
> >>>> And just to rule out the obvious, the M/R is running on the cluster and
> >>>> not locally, right? (It will default to a local runner when it cannot use
> >>>> the M/R cluster.)
> >>>>
> >>>> Some back-of-the-envelope calculations tell me that, assuming 1 GbE network
> >>>> cards, the best you can expect for 110 machines to map through this data is
> >>>> about 11h (so way faster than what you see).
> >>>> (545 TB / (110 * 1/8 GB/s) ~ 40 ks ~ 11h)
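
The estimate above can be reproduced in a few lines. This is a small sketch of the same arithmetic, assuming decimal units (1 TB = 10^12 bytes), a 1 Gbit/s link per machine that is fully saturated, and no other bottleneck; the class and method names are made up for illustration.

```java
import java.util.Locale;

public class ScanTimeEstimate {

    /** Best-case seconds to move totalBytes when every machine reads at full link speed. */
    static double bestCaseSeconds(double totalBytes, int machines, double bytesPerSecondPerMachine) {
        return totalBytes / (machines * bytesPerSecondPerMachine);
    }

    public static void main(String[] args) {
        double totalBytes = 545e12;          // 545 TB of table data, from the thread
        int machines = 110;                  // cluster size, from the thread
        double linkBytesPerSecond = 1e9 / 8; // 1 Gbit/s NIC is 1/8 GB/s

        double seconds = bestCaseSeconds(totalBytes, machines, linkBytesPerSecond);
        System.out.println(String.format(Locale.ROOT,
                "best case: %.0f s (%.1f h)", seconds, seconds / 3600));
    }
}
```

With these inputs the result is a little under 40,000 seconds, or about 11 hours, which matches the figure in parentheses above. It is a lower bound under idealized assumptions, not a prediction of real job time.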
> >>>>
> >>>>
> >>>> We should really add a rowcounting coprocessor to HBase and allow using it
> >>>> via M/R.
> >>>>
> >>>> -- Lars
> >>>>
> >>>>
> >>>>
> >>>> ________________________________
> >>>> From: James Birchfield <[EMAIL PROTECTED]>
> >>>> To: [EMAIL PROTECTED]
> >>>> Sent: Friday, September 20, 2013 5:09 PM
> >>>> Subject: Re: HBase Table Row Count Optimization - A Solicitation For