HBase, mail # user - How to quickly count the rows that meet several conditions using hbase coprocessor


Re: Re: How to quickly count the rows that meet several conditions using hbase coprocessor
lars hofhansl 2014-01-20, 07:41
The real fix is in the parent (HBASE-9428), though.

-- Lars

________________________________
 From: Ted Yu <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Sunday, January 19, 2014 9:22 PM
Subject: Re: Re: How to quickly count the rows that meet several conditions using hbase coprocessor
 

bq. The HBase version 0.94.6-cdh4.3.1

That explains it :-)

See HBASE-9711

Please upgrade your HBase.

Cheers
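
For reference, the three conditions in the scan quoted below can be modelled in plain Java without HBase. This is only a sketch of the intended semantics, not the HBase API: the class MultiConditionCountSketch, its helpers (row, presentAndNot, matches, count) and the sample rows are all hypothetical. It assumes that NOT_EQUAL against a fake value with setFilterIfMissing(true) amounts to "the column exists and is not the fake value", and that the regex condition is a search within the column value (Pattern find).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class MultiConditionCountSketch {

    // Hypothetical stand-in for one row: qualifier -> value pairs.
    static Map<String, String> row(String... kv) {
        Map<String, String> r = new HashMap<String, String>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            r.put(kv[i], kv[i + 1]);
        }
        return r;
    }

    // Models NOT_EQUAL against a fake value with filterIfMissing=true:
    // passes only when the column exists and differs from the fake value.
    static boolean presentAndNot(Map<String, String> r, String col, String fake) {
        String v = r.get(col);
        return v != null && !v.equals(fake);
    }

    // Models EQUAL against a regex with filterIfMissing=true:
    // passes only when the column exists and the pattern is found in it.
    static boolean matches(Map<String, String> r, String col, Pattern p) {
        String v = r.get(col);
        return v != null && p.matcher(v).find();
    }

    // Models MUST_PASS_ALL: a row is counted only if every condition holds.
    static long count(List<Map<String, String>> rows) {
        Pattern day = Pattern.compile("^2014-01-15");
        long n = 0;
        for (Map<String, String> r : rows) {
            if (presentAndNot(r, "tags", "DOESNOTEXIST")
                    && presentAndNot(r, "googleid", "DOESNOTEXIST")
                    && matches(r, "createtime", day)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = new ArrayList<Map<String, String>>();
        // Satisfies all three conditions.
        rows.add(row("tags", "a", "googleid", "g1", "createtime", "2014-01-15 10:00"));
        // Missing googleid, so it is filtered out.
        rows.add(row("tags", "a", "createtime", "2014-01-15 11:00"));
        // createtime is on a different day, so it is filtered out.
        rows.add(row("tags", "a", "googleid", "g2", "createtime", "2014-01-16 09:00"));
        System.out.println("rowCount: " + count(rows));
    }
}
```

Run as written, the sketch prints "rowCount: 1", since only the first sample row satisfies all three conditions. It says nothing about performance, which is what the JIRA issues above address.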

On Sun, Jan 19, 2014 at 9:08 PM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

>
> It makes no difference even if I change it to the single character "A".
>
> Thanks,
> Lei
>
>
>
>
> [EMAIL PROTECTED]
>
> From: Ted Yu
> Date: 2014-01-18 14:28
> To: [EMAIL PROTECTED]
> CC: user; lars hofhansl
> Subject: Re: How to quickly count the rows that meet several conditions
> using hbase coprocessor
> Can you use another string for the fake value?
> DOESNOTEXIST is a bit long. It shouldn't be difficult to come up with a
> single-character string that doesn't appear in the first two columns.
>
> Cheers
>
> On Jan 17, 2014, at 8:34 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
>
> > Hi Lars,
> >
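> > import java.util.ArrayList;
> > import java.util.List;
> >
> > import org.apache.hadoop.conf.Configuration;
> > import org.apache.hadoop.hbase.HBaseConfiguration;
> > import org.apache.hadoop.hbase.client.Scan;
> > import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
> > import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
> > import org.apache.hadoop.hbase.filter.Filter;
> > import org.apache.hadoop.hbase.filter.FilterList;
> > import org.apache.hadoop.hbase.filter.RegexStringComparator;
> > import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
> > import org.apache.hadoop.hbase.util.Bytes;
> >
> > public class AggregationCountForMultiFilter {
> >
> >     private static final byte[] TABLE_NAME = Bytes.toBytes("userdigest");
> >     private static final byte[] CF = Bytes.toBytes("cf");
> >     // Placeholder value assumed never to occur in the data.
> >     private static final byte[] FAKE_VALUE = Bytes.toBytes("DOESNOTEXIST");
> >
> >     public static void main(String[] args) {
> >
> >         Configuration conf = new Configuration();
> >         Configuration configuration = HBaseConfiguration.create(conf);
> >         AggregationClient aggregationClient = new AggregationClient(configuration);
> >
> >         byte[] colA = Bytes.toBytes("tags");
> >         byte[] colB = Bytes.toBytes("googleid");
> >         byte[] colC = Bytes.toBytes("createtime");
> >
> >         List<Filter> filters = new ArrayList<Filter>();
> >
> >         // Keep rows where "tags" exists (its value differs from the fake value).
> >         SingleColumnValueFilter filter1 = new SingleColumnValueFilter(CF, colA,
> >                 CompareOp.NOT_EQUAL, FAKE_VALUE);
> >         filter1.setFilterIfMissing(true);
> >         filters.add(filter1);
> >
> >         // Keep rows where "googleid" exists.
> >         SingleColumnValueFilter filter2 = new SingleColumnValueFilter(CF, colB,
> >                 CompareOp.NOT_EQUAL, FAKE_VALUE);
> >         filter2.setFilterIfMissing(true);
> >         filters.add(filter2);
> >
> >         // Keep rows whose "createtime" starts with 2014-01-15.
> >         SingleColumnValueFilter filter3 = new SingleColumnValueFilter(CF, colC,
> >                 CompareOp.EQUAL, new RegexStringComparator("^2014-01-15"));
> >         filter3.setFilterIfMissing(true);
> >         filters.add(filter3);
> >
> >         // A row is counted only if it passes all three filters.
> >         FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL, filters);
> >
> >         Scan scan = new Scan();
> >         scan.addFamily(CF);
> >         scan.setFilter(filterList);
> >
> >         long rowCount = 0;
> >         try {
> >             rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);
> >         } catch (Throwable e) {
> >             e.printStackTrace();
> >         }
> >         System.out.println("rowCount: " + rowCount);
> >     }
> > }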
> >
> > The HBase version 0.94.6-cdh4.3.1
> >
> > Thanks,
> > Lei
> >
> >
> >
> > [EMAIL PROTECTED]
> >
> > From: lars hofhansl
> > Date: 2014-01-18 11:18
> > To: [EMAIL PROTECTED]
> > Subject: Re: Re: How to quickly count the rows that meet several conditions using hbase coprocessor
> > Offhand there is no reason for that.
> > If you send some sample code that can seed the data and then run the filter that shows the problem, I'll offer to do some profiling.
> >
> > Which version of HBase are you using?
> >
> > -- Lars
> >
> >
> > ________________________________
> > From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> > To: user <[EMAIL PROTECTED]>
> > Cc: user <[EMAIL PROTECTED]>
> > Sent: Friday, January 17, 2014 5:24 PM
> > Subject: Re: Re: How to quickly count the rows that meet several
> conditions using hbase coprocessor
> >
> > Hi,
> >
> > I have tried.
> > For a table with about 600 million row keys, passing just a single QualifierFilter returns the result quickly.
> > But when I add SingleColumnValueFilters in a FilterList, it becomes far too slow to be usable.
> >
> > I think I can write my own custom aggregation client. Is there any