HBase, mail # user - question about merge-join (or AND operator between columns)


Re: question about merge-join (or AND operator between columns)
Andrey Stepachev 2011-01-08, 21:57
Hm. But what is the problem with using Long.MAX_VALUE - dayNum instead of dayNum?
In that case all data is sorted in reverse order, and you get the last
entries first in the scan results?
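
As a minimal sketch of the reversed-key idea: the class name, the `<reversed day>:<uid>` key layout, and the 19-digit zero padding are assumptions for illustration and are not taken from Jack's schema. It uses plain Java collections rather than the HBase client, relying only on the fact that zero-padded ASCII digits sort the same way as the numbers they encode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class ReversedDayKey {
    // Hypothetical key layout: 19-digit zero-padded (Long.MAX_VALUE - dayNum), then ":" and the uid.
    // Zero padding keeps lexicographic order identical to numeric order.
    static String rowKey(long dayNum, String uid) {
        if (dayNum < 0) {
            throw new IllegalArgumentException("dayNum must be non-negative");
        }
        return String.format("%019d:%s", Long.MAX_VALUE - dayNum, uid);
    }

    // Builds keys for dayNum 0..maxDay and returns them in sorted order,
    // which is the order a forward scan over such keys would visit them.
    static List<String> sortedKeys(long maxDay, String uid) {
        TreeSet<String> keys = new TreeSet<>();
        for (long d = 0; d <= maxDay; d++) {
            keys.add(rowKey(d, uid));
        }
        return new ArrayList<>(keys);
    }

    public static void main(String[] args) {
        // The key for dayNum = 10 is printed first, the key for dayNum = 0 last.
        for (String key : sortedKeys(10, "uid")) {
            System.out.println(key);
        }
    }
}
```

With keys built this way, the row with the largest dayNum sorts first, so a scan reaches it without walking over the rows with smaller dayNum.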

2011/1/8 Jack Levin <[EMAIL PROTECTED]>

> Basic problem described:
>
> A user uploads 1 image and creates some text 10 days ago, then creates 1000
> text messages between 9 days ago and today:
>
> row key     | fm:type --> value
> 00days:uid  | type:text --> text_id
> .
> .
> 09days:uid  | type:text --> text_id
> 10days:uid  | type:photo --> URL
>             | type:text --> text_id
>
> We want to skip all the way to the 10days:uid row without reading the
> 00days:uid through 09days:uid rows. Ideally we do not want to read all 1000
> entries that have _only_ text. We want to get to the last entry in the most
> efficient way possible.
>
>
> -Jack
>
>
>
>
> On Sat, Jan 8, 2011 at 11:43 AM, Stack <[EMAIL PROTECTED]> wrote:
> > Strike that.  This is a Scan, so you can't do blooms + filter.  Sorry.
> > Sounds like a coprocessor then.  You'd have your query 'lean' on the
> > column that you know has fewer items, and then per item you'd do
> > a get inside the coprocessor against the column with many entries.  The
> > get would go via blooms.
> >
> > St.Ack
> >
> >
> > On Sat, Jan 8, 2011 at 11:39 AM, Stack <[EMAIL PROTECTED]> wrote:
> >> On Sat, Jan 8, 2011 at 11:35 AM, Jack Levin <[EMAIL PROTECTED]> wrote:
> >>> Yes, we thought about using filters. The issue is, if one family
> >>> column has 1 million values and the second family column has 10 values
> >>> at the bottom, we would end up scanning and filtering 999,990 records
> >>> and throwing them away, which seems inefficient.
> >>
> >> Blooms+filters?
> >> St.Ack
> >>
> >
>
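
The 'lean on the smaller column' idea Stack describes in the quoted thread can be illustrated without HBase at all. This is a minimal sketch, assuming in-memory collections stand in for the two columns: the names `LeanJoin` and `rowsWithBoth` are hypothetical, and the `Set.contains` call only stands in for the per-item get that would go via blooms. It is not coprocessor code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LeanJoin {
    // Returns the row keys that appear on both sides.
    // Only the small side is iterated; the large side is probed once per item,
    // so the work is proportional to the size of the small side.
    static List<String> rowsWithBoth(List<String> smallSide, Set<String> largeSide) {
        List<String> matches = new ArrayList<>();
        for (String rowKey : smallSide) {
            if (largeSide.contains(rowKey)) { // stand-in for a point get
                matches.add(rowKey);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Rows carrying type:photo (few) and rows carrying type:text (many),
        // following the example table in the quoted message.
        List<String> photoRows = List.of("10days:uid");
        Set<String> textRows = new HashSet<>();
        for (int d = 0; d <= 10; d++) {
            textRows.add(String.format("%02ddays:uid", d));
        }
        System.out.println(rowsWithBoth(photoRows, textRows));
    }
}
```

The design point is that the loop never touches the 00days through 09days rows: it visits one photo row and performs one lookup for it.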