HBase >> mail # user >> HBase Table Row Count Optimization - A Solicitation For Help


Re: HBase Table Row Count Optimization - A Solicitation For Help
Thanks for the feedback.

I logged HBASE-9605 for relaxation of this requirement for row
count aggregate.
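
The requirement being relaxed in HBASE-9605 shows up on the client side roughly like this: the scan passed to AggregationClient.rowCount must name a column family even for a whole-table count. A sketch against the 0.94-era API (table and family names are placeholders, and a reachable cluster with the coprocessor loaded is assumed):

```java
// Sketch only: assumes HBase 0.94-era client APIs, a reachable cluster,
// and placeholder table/family names.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter;
import org.apache.hadoop.hbase.util.Bytes;

public class RowCountExample {
    public static void main(String[] args) throws Throwable {
        Configuration conf = HBaseConfiguration.create();
        AggregationClient aggregationClient = new AggregationClient(conf);
        Scan scan = new Scan();
        // As discussed in this thread, rowCount currently insists on a
        // column family being set on the scan (the point of HBASE-9605).
        scan.addFamily(Bytes.toBytes("cf1"));
        long count = aggregationClient.rowCount(
                Bytes.toBytes("my_table"), new LongColumnInterpreter(), scan);
        System.out.println("row count: " + count);
    }
}
```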
On Fri, Sep 20, 2013 at 8:46 PM, James Birchfield <
[EMAIL PROTECTED]> wrote:

> Thanks.  I have been taking a look this evening.  We enabled the
> Aggregation coprocessor and the Aggregation client works great.  I still
> have to execute it with the 'hadoop jar' command though, but can live with
> that.  When I try to run it in process, it just hangs.  I am not going to
> fight it though.
>
> The only thing I dislike about the AggregationClient is that it requires a
> column family.  I was hoping to do this in a completely generic way,
> without having any information about a table's column families to get a row
> count.  The provided implementation requires exactly one.  I was hoping
> maybe there was some sort of default column family always present on a
> table, but it does not appear so.  I will look at the provided coprocessor
> implementation and see why it is required and see if it can be optional,
> and if so, what the performance penalty would be.  In the meantime, I am
> just using the first column family returned from a query to the admin
> client for a table.  Seems to work fine.
>
> Thanks!
> Birch
> On Sep 20, 2013, at 8:41 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > HBase is open source. You can check out the source code and take a look.
> >
> > $ svn info
> > Path: .
> > URL: http://svn.apache.org/repos/asf/hbase/branches/0.94
> > Repository Root: http://svn.apache.org/repos/asf
> > Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
> > Revision: 1525061
> >
> >
> > On Fri, Sep 20, 2013 at 6:46 PM, James Birchfield <
> > [EMAIL PROTECTED]> wrote:
> >
> >> Ted,
> >>
> >>        My apologies if I am being thick, but I am looking at the API
> >> docs here: http://hbase.apache.org/apidocs/index.html and I do not see
> >> that package.  And the coprocessor package only contains an exception.
> >>
> >>        Ok, weird.  Those classes do not show up through normal
> >> navigation from that link, however, the documentation does exist if I
> >> google for it directly.  Maybe the javadocs need to be regenerated???
> >> Dunno, but I will check it out.
> >>
> >> Birch
> >>
> >> On Sep 20, 2013, at 6:32 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >>
> >>> Please take a look at the javadoc for
> >>> src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
> >>>
> >>> As long as the machine can reach your HBase cluster, you should be
> >>> able to run AggregationClient and utilize the AggregateImplementation
> >>> endpoint in the region servers.
> >>>
> >>> Cheers
> >>>
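
The advice above presupposes that the AggregateImplementation endpoint is loaded on the region servers (the thread later confirms it was enabled). One common way to do that in the 0.94 line is cluster-wide via hbase-site.xml; this fragment is a sketch of that approach, not the only way to load a coprocessor:

```xml
<!-- hbase-site.xml on the region servers; loads the endpoint
     cluster-wide (a region server restart is required). -->
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
</property>
```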
> >>>
> >>> On Fri, Sep 20, 2013 at 6:26 PM, James Birchfield <
> >>> [EMAIL PROTECTED]> wrote:
> >>>
> >>>> Thanks Ted.
> >>>>
> >>>> That was the direction I have been working towards as I am learning
> >>>> today.  Much appreciation for all the replies to this thread.
> >>>>
> >>>> Whether I keep the MapReduce job or utilize the Aggregation
> >>>> coprocessor (which is turning out to be possible for me here), I
> >>>> need to make sure I am running the client in an efficient manner.
> >>>> Lars may have hit upon the core problem.  I am not running the
> >>>> MapReduce job on the cluster, but rather from a standalone remote
> >>>> Java client executing the job in process.  This may very well turn
> >>>> out to be the number one issue.  I would love it if this turns out
> >>>> to be true.  It would make this a great learning lesson for me as a
> >>>> relative newcomer to working with HBase, and potentially allow me to
> >>>> finish this initial task much quicker than I was thinking.
> >>>>
> >>>> So assuming the MapReduce jobs need to be run on the cluster instead
> >>>> of locally, does a coprocessor endpoint client need to be run the
> >>>> same way, or is it safe to run it on a remote machine, since the
> >>>> work gets distributed
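
The workaround Birch describes earlier in the thread, taking the first column family returned by the admin client when none is known up front, could be sketched like this against the 0.94-era API (the table name is a placeholder and a reachable cluster is assumed):

```java
// Sketch only: 0.94-era client APIs; "my_table" is a placeholder and a
// reachable cluster with AggregateImplementation loaded is assumed.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter;
import org.apache.hadoop.hbase.util.Bytes;

public class GenericRowCount {
    public static void main(String[] args) throws Throwable {
        Configuration conf = HBaseConfiguration.create();
        byte[] tableName = Bytes.toBytes("my_table");
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            // Pick the first column family from the table descriptor, since
            // AggregationClient.rowCount needs one even for a whole-table
            // count (the requirement HBASE-9605 proposes to relax).
            HTableDescriptor desc = admin.getTableDescriptor(tableName);
            HColumnDescriptor[] families = desc.getColumnFamilies();
            Scan scan = new Scan();
            scan.addFamily(families[0].getName());
            AggregationClient aggregationClient = new AggregationClient(conf);
            long count = aggregationClient.rowCount(
                    tableName, new LongColumnInterpreter(), scan);
            System.out.println("row count: " + count);
        } finally {
            admin.close();
        }
    }
}
```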