HBase, mail # user - How to query by rowKey-infix


Re: How to query by rowKey-infix
anil gupta 2012-08-22, 18:42
Hi Christian,

I had requirements similar to yours. So far I have used timestamps for
filtering the data, and I would say the performance is satisfactory. Here
are the results of timestamp-based filtering: the table has 34 million
records (average row size is 1.21 KB), and a query that returns 225 rows
delivers its entire result in 136 seconds.
I am running HBase 0.92 on an 8-node cluster on a VMware hypervisor. Each
node has 3.2 GB of memory and 500 GB of HDFS space. Each hard drive in my
setup hosts 2 slave instances (2 VMs running DataNode, NodeManager, and
RegionServer). I have allocated only 1200 MB for the RegionServers. I have
not modified the block size of HDFS or HBase. Considering the below-par
hardware configuration of the cluster, I feel the performance is OK, and
IMO it will be better than a substring comparator on column values, since
with a substring comparator filter you are essentially doing a FULL TABLE
scan, whereas with a time-range-based scan you can *skip store files*.
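
To make the store-file skipping concrete, here is a minimal sketch in plain
Java. It is a toy model, not HBase code: the class TimeRangeSkipSketch, the
nested StoreFile class, and the method filesToRead are hypothetical names
made up for illustration. It assumes that each store file records the
smallest and largest cell timestamp it contains, and that the cell
timestamps were set to the record's date at write time, as discussed
earlier in the thread. In the HBase client the range itself would be set on
the scan, for example with Scan.setTimeRange(minStamp, maxStamp).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not HBase code) of why a time-range scan can skip store files. */
class TimeRangeSkipSketch {

    /** Hypothetical stand-in for a store file and its min/max cell timestamps. */
    static final class StoreFile {
        final String name;
        final long minTs; // smallest cell timestamp in the file
        final long maxTs; // largest cell timestamp in the file

        StoreFile(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }

        /** True if the file may hold cells in the half-open range [scanMin, scanMax). */
        boolean overlaps(long scanMin, long scanMax) {
            return maxTs >= scanMin && minTs < scanMax;
        }
    }

    /** Returns only the files that a scan over [scanMin, scanMax) has to open. */
    static List<StoreFile> filesToRead(List<StoreFile> files, long scanMin, long scanMax) {
        List<StoreFile> selected = new ArrayList<>();
        for (StoreFile f : files) {
            if (f.overlaps(scanMin, scanMax)) {
                selected.add(f);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<StoreFile> files = new ArrayList<>();
        files.add(new StoreFile("sf-1", 1_000L, 1_999L));
        files.add(new StoreFile("sf-2", 2_000L, 2_999L));
        files.add(new StoreFile("sf-3", 3_000L, 3_999L));

        // Only sf-2 and sf-3 overlap the requested range; sf-1 is never opened.
        for (StoreFile f : filesToRead(files, 2_500L, 3_200L)) {
            System.out.println("must read " + f.name);
        }
    }
}
```

A substring comparator has no such per-file metadata to consult, so every
row key in every file has to be examined.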

On a side note, Alex created a JIRA issue for enhancing the current
FuzzyRowFilter to also do range-based filtering. Here is the link:
https://issues.apache.org/jira/browse/HBASE-6618 . You are more than
welcome to chime in.
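
For readers new to the idea, here is a minimal sketch in plain Java of the
per-row check a fuzzy row key filter performs. It is a toy model, not the
HBase class: FuzzyRowMatchSketch, matches, and ascii are hypothetical names,
and the row key layout (4-byte user id, underscore, 4-digit year) is an
assumption made up for the example. The mask convention used here (0 means
the byte is fixed, 1 means any byte is accepted) follows my understanding of
the filter's documentation. As far as I know, the real filter also gives the
scanner a hint about where to seek next, which this sketch does not model.

```java
import java.nio.charset.StandardCharsets;

/** Toy model (not the HBase class) of the per-row check of a fuzzy row key filter. */
class FuzzyRowMatchSketch {

    /**
     * mask[i] == 0: byte i of the row key must equal pattern[i] (fixed position).
     * mask[i] == 1: byte i of the row key may be anything (wildcard position).
     * Row key bytes beyond the pattern length are not checked.
     */
    static boolean matches(byte[] rowKey, byte[] pattern, byte[] mask) {
        if (pattern.length != mask.length) {
            throw new IllegalArgumentException("pattern and mask must have equal length");
        }
        if (rowKey.length < pattern.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && rowKey[i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    /** Helper: encodes an ASCII string as bytes. */
    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Hypothetical key layout: 4-byte user id + "_" + 4-digit year.
        // The '?' bytes are placeholders; the mask marks them as wildcards.
        byte[] pattern = ascii("????_2012");
        byte[] mask = {1, 1, 1, 1, 0, 0, 0, 0, 0};

        System.out.println(matches(ascii("u001_2012"), pattern, mask)); // true
        System.out.println(matches(ascii("u001_2011"), pattern, mask)); // false
    }
}
```

A range-based extension along the lines of HBASE-6618 would presumably relax
the equality test on fixed positions to a lower/upper bound test, but that is
a guess on my side, not something taken from the issue.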

HTH,
Anil Gupta
On Thu, Aug 9, 2012 at 1:55 PM, Christian Schäfer <[EMAIL PROTECTED]> wrote:

> Nice. Thanks Alex for sharing your experiences with that custom filter
> implementation.
>
>
> Currently I'm still using a key filter with a substring comparator.
> As soon as I have a good amount of test data, I will measure the performance
> of that naive substring filter in comparison to your fuzzy row filter.
>
> regards,
> Christian
>
>
>
> ________________________________
> From: Alex Baranau <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; Christian Schäfer <[EMAIL PROTECTED]>
> Sent: 22:18 Thursday, 9 August 2012
> Subject: Re: How to query by rowKey-infix
>
>
> JFYI: I documented FuzzyRowFilter usage here: http://bit.ly/OXVdbg. I will
> add documentation to the HBase book very soon [1]
>
> Alex Baranau
> ------
> Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr
>
> [1] https://issues.apache.org/jira/browse/HBASE-6526
>
> On Fri, Aug 3, 2012 at 6:14 PM, Alex Baranau <[EMAIL PROTECTED]>
> wrote:
>
> >Good!
> >
> >
> >Submitted an initial patch of the fuzzy row key filter at
> >https://issues.apache.org/jira/browse/HBASE-6509. You can just copy the
> >filter class, include it in your code, and use it in your setup like any
> >other custom filter (no need to patch HBase).
> >
> >
> >Please let me know if you try it out (or post your comments at
> >HBASE-6509).
> >
> >
> >Alex Baranau
> >------
> >Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr
> >
> >
> >On Fri, Aug 3, 2012 at 5:23 AM, Christian Schäfer <[EMAIL PROTECTED]>
> wrote:
> >
> >>Hi Alex,
> >>
> >>thanks a lot for the hint about setting the timestamp of the put.
> >>I didn't know that this was possible, but it solves the problem
> >>(first test was successful).
> >>So I'm really glad that I don't need to apply a filter to extract the
> >>time and so on for every row.
> >>
> >>Nevertheless, I would like to see your custom filter implementation.
> >>It would be nice if you could provide it, to help me get into it a bit.
> >>
> >>And yes that helped :)
> >>
> >>regards
> >>Chris
> >>
> >>
> >>
> >>________________________________
> >>From: Alex Baranau <[EMAIL PROTECTED]>
> >>To: [EMAIL PROTECTED]; Christian Schäfer <[EMAIL PROTECTED]>
> >>Sent: 0:57 Friday, 3 August 2012
> >>
> >>Subject: Re: How to query by rowKey-infix
> >>
> >>
> >>Hi Christian!
> >>Setting secondary indexes aside and assuming you are going with "heavy
> >>scans", you can try the following two things to make it much faster, if
> >>this is appropriate to your situation, of course.
> >>
> >>1.
> >>
> >>> Is there a more elegant way to collect rows within time range X?
> >>> (Unfortunately, the date attribute is not equal to the timestamp that
> >>> is stored by HBase automatically.)

Thanks & Regards,
Anil Gupta