Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
HBase >> mail # user >> How to query by rowKey-infix


Copy link to this message
-
Re: How to query by rowKey-infix
Hi Christian,

I had the similar requirements as yours. So, till now i have used
timestamps for filtering the data and I would say the performance is
satisfactory. Here are the results of timestamp based filtering:
The table has 34 million records(average row size is 1.21 KB), in 136
seconds i get the entire result of query which had 225 rows.
I am running a HBase 0.92, 8 node cluster on Vmware Hypervisor. Each node
had 3.2 GB of memory, and 500 GB HDFS space. Each Hard Drive in my set-up
is hosting 2 Slaves Instance(2 VM's running Datanode,
NodeManager,RegionServer). I have only allocated 1200MB for RS's. I haven't
done any modification in the block size of HDFS or HBase. Considering the
below-par hardware configuration of cluster i feel the performance is OK
and IMO it'll be better than substring comparator of column values since in
substring comparator filter you are essentially doing a FULL TABLE scan.
Whereas, in timerange based scan you can *Skip Store Files*.

On a side note, Alex created a JIRA for enhancing the current
FuzzyRowFilter to do range based filtering also. Here is the link:
https://issues.apache.org/jira/browse/HBASE-6618 . You are more than
welcome if you would like to chime in.

HTH,
Anil Gupta
On Thu, Aug 9, 2012 at 1:55 PM, Christian Schäfer <[EMAIL PROTECTED]>wrote:

> Nice. Thanks Alex for sharing your experiences with that custom filter
> implementation.
>
>
> Currently I'm still using key filter with substring comparator.
> As soon as I got a good amount of test data I will measure performance of
> that naiive substring filter in comparison to your fuzzy row filter.
>
> regards,
> Christian
>
>
>
> ________________________________
> Von: Alex Baranau <[EMAIL PROTECTED]>
> An: [EMAIL PROTECTED]; Christian Schäfer <[EMAIL PROTECTED]>
> Gesendet: 22:18 Donnerstag, 9.August 2012
> Betreff: Re: How to query by rowKey-infix
>
>
> jfyi: documented FuzzyRowFilter usage here: http://bit.ly/OXVdbg. Will
> add documentation to HBase book very soon [1]
>
> Alex Baranau
> ------
> Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr
>
> [1] https://issues.apache.org/jira/browse/HBASE-6526
>
> On Fri, Aug 3, 2012 at 6:14 PM, Alex Baranau <[EMAIL PROTECTED]>
> wrote:
>
> Good!
> >
> >
> >Submitted initial patch of fuzzy row key filter at
> https://issues.apache.org/jira/browse/HBASE-6509. You can just copy the
> filter class and include it in your code and use it in your setup as any
> other custom filter (no need to patch HBase).
> >
> >
> >Please let me know if you try it out (or post your comments at
> HBASE-6509).
> >
> >
> >Alex Baranau
> >------
> >Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr
> >
> >
> >On Fri, Aug 3, 2012 at 5:23 AM, Christian Schäfer <[EMAIL PROTECTED]>
> wrote:
> >
> >Hi Alex,
> >>
> >>thanks a lot for the hint about setting the timestamp of the put.
> >>I didn't know that this would be possible but that's solving the problem
> (first test was successful).
> >>So I'm really glad that I don't need to apply a filter to extract the
> time and so on for every row.
> >>
> >>Nevertheless I would like to see your custom filter implementation.
> >>Would be nice if you could provide it helping me to get a bit into it.
> >>
> >>And yes that helped :)
> >>
> >>regards
> >>Chris
> >>
> >>
> >>
> >>________________________________
> >>Von: Alex Baranau <[EMAIL PROTECTED]>
> >>An: [EMAIL PROTECTED]; Christian Schäfer <[EMAIL PROTECTED]>
> >>Gesendet: 0:57 Freitag, 3.August 2012
> >>
> >>Betreff: Re: How to query by rowKey-infix
> >>
> >>
> >>Hi Christian!
> >>If to put off secondary indexes and assume you are going with "heavy
> scans", you can try two following things to make it much faster. If this is
> appropriate to your situation, of course.
> >>
> >>1.
> >>
> >>> Is there a more elegant way to collect rows within time range X?
> >>> (Unfortunately, the date attribute is not equal to the timestamp that
> is stored by hbase automatically.)

Thanks & Regards,
Anil Gupta
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB