HBase >> mail # user >> HBase Table Row Count Optimization - A Solicitation For Help


Re: HBase Table Row Count Optimization - A Solicitation For Help
HBase is open source. You can check out the source code and look at the
implementation yourself.

$ svn info
Path: .
URL: http://svn.apache.org/repos/asf/hbase/branches/0.94
Repository Root: http://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 1525061
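
For reference, the AggregationClient-based row count discussed in this thread can be sketched roughly as follows against the 0.94 client API. This is a hedged illustration, not code from the thread: the table name "my_table" and column family "cf" are placeholders, and it assumes AggregateImplementation has been loaded on the region servers (e.g. via hbase.coprocessor.region.classes).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter;
import org.apache.hadoop.hbase.util.Bytes;

public class RowCountExample {
    public static void main(String[] args) throws Throwable {
        // Picks up hbase-site.xml from the classpath; the client only needs
        // network access to the cluster, since the counting itself runs in
        // the AggregateImplementation endpoint on each region server.
        Configuration conf = HBaseConfiguration.create();
        AggregationClient aggregationClient = new AggregationClient(conf);

        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("cf")); // placeholder column family

        long rowCount = aggregationClient.rowCount(
                Bytes.toBytes("my_table"),   // placeholder table name
                new LongColumnInterpreter(),
                scan);
        System.out.println("row count: " + rowCount);
    }
}
```

Because the scan fans out to all regions in parallel and only the per-region counts travel back to the client, this avoids shipping row data over the network the way a client-side scan would.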
On Fri, Sep 20, 2013 at 6:46 PM, James Birchfield <
[EMAIL PROTECTED]> wrote:

> Ted,
>
>         My apologies if I am being thick, but I am looking at the API docs
> here: http://hbase.apache.org/apidocs/index.html and I do not see that
> package.  And the coprocessor package only contains an exception.
>
>         Ok, weird.  Those classes do not show up through normal navigation
> from that link, however, the documentation does exist if I google for it
> directly.  Maybe the javadocs need to be regenerated???  Dunno, but I will
> check it out.
>
> Birch
>
> On Sep 20, 2013, at 6:32 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > Please take a look at the javadoc
> > for
> src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
> >
> > As long as the machine can reach your HBase cluster, you should be able
> to
> > run AggregationClient and utilize the AggregateImplementation endpoint in
> > the region servers.
> >
> > Cheers
> >
> >
> > On Fri, Sep 20, 2013 at 6:26 PM, James Birchfield <
> > [EMAIL PROTECTED]> wrote:
> >
> >> Thanks Ted.
> >>
> >> That was the direction I have been working towards as I am learning
> today.
> >> Much appreciation to all the replies to this thread.
> >>
> >> Whether I keep the MapReduce job or utilize the Aggregation coprocessor
> >> (which is turning out that it should be possible for me here), I need to
> >> make sure I am running the client in an efficient manner.  Lars may have
> >> hit upon the core problem.  I am not running the map reduce job on the
> >> cluster, but rather from a stand alone remote java client executing the
> job
> >> in process.  This may very well turn out to be the number one issue.  I
> >> would love it if this turns out to be true.  Would make this a great
> >> learning lesson for me as a relative newcomer to working with HBase, and
> >> potentially allow me to finish this initial task much quicker than I was
> >> thinking.
> >>
> >> So assuming the MapReduce jobs need to be run on the cluster instead of
> >> locally, does a coprocessor endpoint client need to be run the same, or
> is
> >> it safe to run it on a remote machine since the work gets distributed
> out
> >> to the region servers?  Just wondering if I would run into the same
> issues
> >> if what I said above holds true.
> >>
> >> Thanks!
> >> Birch
> >> On Sep 20, 2013, at 6:17 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >>
> >>> In 0.94, we have AggregateImplementation, an endpoint coprocessor,
> which
> >>> implements getRowNum().
> >>>
> >>> Example is in AggregationClient.java
> >>>
> >>> Cheers
> >>>
> >>>
> >>> On Fri, Sep 20, 2013 at 6:09 PM, lars hofhansl <[EMAIL PROTECTED]>
> wrote:
> >>>
> >>>> From your numbers below you have about 26k regions, thus each region
> is
> >>>> about 545tb/26k = 20gb. Good.
> >>>>
> >>>> How many mappers are you running?
> >>>> And just to rule out the obvious, the M/R is running on the cluster
> and
> >>>> not locally, right? (it will default to a local runner when it cannot
> >> use
> >>>> the M/R cluster).
> >>>>
> >>>> Some back of the envelope calculations tell me that assuming 1ge
> network
> >>>> cards, the best you can expect for 110 machines to map through this
> >> data is
> >>>> about 10h. (so way faster than what you see).
> >>>> (545tb/(110*1/8gb/s) ~ 40ks ~11h)
> >>>>
> >>>>
> >>>> We should really add a rowcounting coprocessor to HBase and allow
> using
> >> it
> >>>> via M/R.
> >>>>
> >>>> -- Lars
> >>>>
> >>>>
> >>>>
> >>>> ________________________________
> >>>> From: James Birchfield <[EMAIL PROTECTED]>
> >>>> To: [EMAIL PROTECTED]
> >>>> Sent: Friday, September 20, 2013 5:09 PM
> >>>> Subject: Re: HBase Table Row Count Optimization - A Solicitation For Help
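
Lars's point about the job silently defaulting to a local runner is worth illustrating. A minimal sketch of launching HBase's bundled RowCounter MapReduce job from a node whose client configuration points at the cluster ('my_table' is a placeholder):

```shell
# Launch from a machine with the cluster's HBase/Hadoop client config on
# the classpath; if no M/R cluster is configured, Hadoop falls back to the
# LocalJobRunner and every region is scanned through one local process.
hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'my_table'
```

The job's "ROWS" counter in the output holds the final count.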
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB