HBase >> mail # user >> HBase Table Row Count Optimization - A Solicitation For Help


Re: HBase Table Row Count Optimization - A Solicitation For Help
Thanks for the feedback.

I logged HBASE-9605 for relaxation of this requirement for row
count aggregate.
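
For anyone following the thread, "enabling the Aggregation coprocessor"
on 0.94 generally means registering the AggregateImplementation endpoint
on the region servers. A minimal sketch, assuming it is loaded
cluster-wide through hbase-site.xml (it can also be attached to
individual tables instead), would look something like this:

```xml
<!-- hbase-site.xml on each region server (assumed cluster-wide load) -->
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
</property>
```

The region servers have to be restarted to pick up the change. After
that, AggregationClient can be pointed at the table from any machine
that can reach the cluster.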

On Fri, Sep 20, 2013 at 8:46 PM, James Birchfield <
[EMAIL PROTECTED]> wrote:

> Thanks.  I have been taking a look this evening.  We enabled the
> Aggregation coprocessor and the Aggregation client works great.  I still
> have to execute it with the 'hadoop jar' command though, but can live with
> that.  When I try to run it in process, it just hangs.  I am not going to
> fight it though.
>
> The only thing I dislike about the AggregationClient is that it requires a
> column family.  I was hoping to do this in a completely generic way,
> without having any information about a table's column families to get a row
> count.  The provided implementation requires exactly one.  I was hoping
> maybe there was some sort of default column family always present on a
> table but it does not appear so.  I will look at the provided coprocessor
> implementation and see why it is required and see if it can be optional,
> and if so, what the performance penalty would be.  In the meantime, I am
> just using the first column family returned from a query to the admin
> client for a table.  Seems to work fine.
>
> Thanks!
> Birch
> On Sep 20, 2013, at 8:41 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > HBase is open source. You can check out the source code and look at it
> > directly.
> >
> > $ svn info
> > Path: .
> > URL: http://svn.apache.org/repos/asf/hbase/branches/0.94
> > Repository Root: http://svn.apache.org/repos/asf
> > Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
> > Revision: 1525061
> >
> >
> > On Fri, Sep 20, 2013 at 6:46 PM, James Birchfield <
> > [EMAIL PROTECTED]> wrote:
> >
> >> Ted,
> >>
> >>        My apologies if I am being thick, but I am looking at the API docs
> >> here: http://hbase.apache.org/apidocs/index.html and I do not see that
> >> package.  And the coprocessor package only contains an exception.
> >>
> >>        Ok, weird.  Those classes do not show up through normal navigation
> >> from that link, however, the documentation does exist if I google for it
> >> directly.  Maybe the javadocs need to be regenerated???  Dunno, but I will
> >> check it out.
> >>
> >> Birch
> >>
> >> On Sep 20, 2013, at 6:32 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >>
> >>> Please take a look at the javadoc for
> >>> src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
> >>>
> >>> As long as the machine can reach your HBase cluster, you should be able to
> >>> run AggregationClient and utilize the AggregateImplementation endpoint in
> >>> the region servers.
> >>>
> >>> Cheers
> >>>
> >>>
> >>> On Fri, Sep 20, 2013 at 6:26 PM, James Birchfield <
> >>> [EMAIL PROTECTED]> wrote:
> >>>
> >>>> Thanks Ted.
> >>>>
> >>>> That was the direction I have been working towards as I am learning today.
> >>>> Much appreciation to all the replies to this thread.
> >>>>
> >>>> Whether I keep the MapReduce job or utilize the Aggregation coprocessor
> >>>> (which, it turns out, should be possible for me here), I need to
> >>>> make sure I am running the client in an efficient manner.  Lars may have
> >>>> hit upon the core problem.  I am not running the MapReduce job on the
> >>>> cluster, but rather from a standalone remote Java client executing the job
> >>>> in process.  This may very well turn out to be the number one issue.  I
> >>>> would love it if this turns out to be true.  Would make this a great
> >>>> learning lesson for me as a relative newcomer to working with HBase, and
> >>>> potentially allow me to finish this initial task much quicker than I was
> >>>> thinking.
> >>>>
> >>>> So assuming the MapReduce jobs need to be run on the cluster instead of
> >>>> locally, does a coprocessor endpoint client need to be run the same, or is
> >>>> it safe to run it on a remote machine since the work gets distributed