Accumulo >> mail # user >> Trouble with IntersectingIterator

Re: Trouble with IntersectingIterator
Heath,

In your case, the question that you are effectively asking is "within each
partition, which documents' index entries include all of the given terms?"
Since your partitions are aligned by field, and each document has only a
single index entry per field, you will not get any matches for queries with
more than one term. The IntersectingIterator can't answer a question that
correlates index entries across a partition boundary. For example, document
"m1" has its index entry for "habelson" in the "sender" partition, but its
index entry for "mgiordano" is in the "receiver" partition.
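To make the failure concrete, here is a small Python model (not Accumulo code) of the matching rule described above: a document is a hit only if a single partition contains an index entry for every query term. Assumptions: Heath's MailIndex is taken to use row = field name, column family = value, column qualifier = document id, which is inferred from the reply because his MailIndex scan is cut off in the thread; only sender and receiver are modelled; the helper names are hypothetical.

```python
from collections import defaultdict

# Sender/receiver values copied from the "Mail" table scan in this thread.
MAIL = {
    "m1": {"sender": "habelson", "receiver": "mgiordano"},
    "m2": {"sender": "habelson", "receiver": "jmarcolla"},
    "m3": {"sender": "mgiordano", "receiver": "habelson"},
    "m4": {"sender": "habelson", "receiver": "mcross"},
    "m5": {"sender": "mcross", "receiver": "habelson"},
}

def field_partitioned_index(mail):
    """ASSUMED layout: (partition=field name, term=value, doc_id) entries."""
    return [(field, value, doc) for doc, fields in mail.items()
            for field, value in fields.items()]

def intersect(entries, terms):
    """Model of the intersection rule: a document matches only if, within
    one partition, it has an index entry for every query term."""
    docs = defaultdict(set)
    partitions = set()
    for partition, term, doc in entries:
        docs[(partition, term)].add(doc)
        partitions.add(partition)
    hits = set()
    for partition in partitions:
        # Intersect the per-term document sets inside this single partition.
        per_term = [docs.get((partition, t), set()) for t in terms]
        hits |= set.intersection(*per_term)
    return sorted(hits)

index = field_partitioned_index(MAIL)
print(intersect(index, ["habelson"]))               # ['m1', 'm2', 'm3', 'm4', 'm5']
print(intersect(index, ["habelson", "mgiordano"]))  # []
```

A single-term query works, but m1's two entries live in different rows ("sender" and "receiver"), so the two-term query finds nothing even though m1 has both values.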

Another thing you might try is to partition by field within the document
partitions. You can hack this together by building something like the
following, with p1 = {m1,m2,m3} and p2 = {m4,m5}:

p1 receiver_habelson:m3 []    habelson
p1 receiver_jmarcolla:m2 []    jmarcolla
p1 receiver_mgiordano:m1 []    mgiordano
p1 sender_habelson:m1 []    habelson
p1 sender_habelson:m2 []    habelson
p1 sender_mgiordano:m3 []    mgiordano
p1 sentTime_1380571500:m1 []    1380571500
p1 sentTime_1380571502:m2 []    1380571502
p1 sentTime_1380571504:m3 []    1380571504
p1 subject_Lunch:m1 []    Lunch
p1 subject_Lunch:m2 []    Lunch
p1 subject_Lunch:m3 []    Lunch
p2 receiver_habelson:m5 []    habelson
p2 receiver_mcross:m4 []    mcross
p2 sender_habelson:m4 []    habelson
p2 sender_mcross:m5 []    mcross
p2 sentTime_1380571506:m4 []    1380571506
p2 sentTime_1380571508:m5 []    1380571508
p2 subject_Lunch:m4 []    Lunch
p2 subject_Lunch:m5 []    Lunch

Here terms are prefixed by field_, and you can do queries for things like
{"sender_habelson", "receiver_mgiordano"}.
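The layout above can be modelled in a few lines of Python. This is a simplified sketch, not Accumulo code: keys are (row, column family, column qualifier) tuples with the value and visibility dropped, and `intersect` imitates the seek-and-advance ("leapfrog") style of intersection over sorted keys within each row. The helper names (`build_index`, `seek`, `intersect`) and the hard-coded p1/p2 assignment are illustrative assumptions.

```python
import bisect

# Indexed fields copied from the "Mail" table scan (body is not indexed,
# matching the table above). Partitioning follows p1 = {m1,m2,m3}, p2 = {m4,m5}.
MAIL = {
    "m1": {"receiver": "mgiordano", "sender": "habelson", "sentTime": "1380571500", "subject": "Lunch"},
    "m2": {"receiver": "jmarcolla", "sender": "habelson", "sentTime": "1380571502", "subject": "Lunch"},
    "m3": {"receiver": "habelson", "sender": "mgiordano", "sentTime": "1380571504", "subject": "Lunch"},
    "m4": {"receiver": "mcross", "sender": "habelson", "sentTime": "1380571506", "subject": "Lunch"},
    "m5": {"receiver": "habelson", "sender": "mcross", "sentTime": "1380571508", "subject": "Lunch"},
}
PARTITION = {"m1": "p1", "m2": "p1", "m3": "p1", "m4": "p2", "m5": "p2"}

def build_index(mail, partition_of):
    """Sorted (row, field_value term, doc_id) keys, e.g. ("p1", "sender_habelson", "m1")."""
    return sorted((partition_of[doc], f"{field}_{value}", doc)
                  for doc, fields in mail.items() for field, value in fields.items())

def seek(keys, row, term, doc):
    """Doc id of the first key >= (row, term, doc) in the same row and term, else None."""
    i = bisect.bisect_left(keys, (row, term, doc))
    if i < len(keys) and keys[i][0] == row and keys[i][1] == term:
        return keys[i][2]
    return None

def intersect(keys, terms):
    """Per row, seek every term to the largest doc id seen so far until all
    terms agree (a hit) or one term runs out of documents in that row."""
    hits = []
    for row in sorted({k[0] for k in keys}):
        target = ""
        while True:
            docs = [seek(keys, row, t, target) for t in terms]
            if None in docs:
                break  # some term has no more documents in this row
            if len(set(docs)) == 1:
                hits.append((row, docs[0]))
                target = docs[0] + "\0"  # smallest string just past this doc id
            else:
                target = max(docs)  # every term must reach at least this doc id
    return hits

index = build_index(MAIL, PARTITION)
print(intersect(index, ["sender_habelson", "receiver_mgiordano"]))  # [('p1', 'm1')]
print(intersect(index, ["sender_habelson", "subject_Lunch"]))  # [('p1', 'm1'), ('p1', 'm2'), ('p2', 'm4')]
```

Because each row now holds all of a document's field-prefixed terms, m1's sender and receiver entries sit in the same partition (p1), which is what lets the two-term query match.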

Adam
On Tue, Oct 1, 2013 at 4:13 PM, Heath Abelson <[EMAIL PROTECTED]> wrote:

> Looking at this example, the index and record do not occur in the same
> row. This seems to be more related to the IndexedDocIterator.
>
> If we take my “mail” object as my document, and think of it as being
> partitioned by field name rather than by some hash, it seems to me that
> this iterator could still apply.
>
> *From:* William Slacum [mailto:[EMAIL PROTECTED]]
> *Sent:* Tuesday, October 01, 2013 3:48 PM
> *To:* [EMAIL PROTECTED]
> *Subject:* Re: Trouble with IntersectingIterator
>
> That iterator is designed to be used with a sharded table format, wherein
> the index and record each occur within the same row. See the Accumulo
> examples page http://accumulo.apache.org/1.4/examples/shard.html
>
> On Tue, Oct 1, 2013 at 3:35 PM, Heath Abelson <[EMAIL PROTECTED]>
> wrote:
>
> I am attempting to get a very simple example working with the
> IntersectingIterator. I made up some dummy objects to work with:
>
> A scan on the “Mail” table looks like this:
>
> m1 mail:body [U&(USA)]    WTF?
> m1 mail:receiver [U&(USA)]    mgiordano
> m1 mail:sender [U&(USA)]    habelson
> m1 mail:sentTime [U&(USA)]    1380571500
> m1 mail:subject [U&(USA)]    Lunch
> m2 mail:body [U&(USA)]    I know right?
> m2 mail:receiver [U&(USA)]    jmarcolla
> m2 mail:sender [U&(USA)]    habelson
> m2 mail:sentTime [U&(USA)]    1380571502
> m2 mail:subject [U&(USA)]    Lunch
> m3 mail:body [U&(USA)]    exactly!
> m3 mail:receiver [U&(USA)]    habelson
> m3 mail:sender [U&(USA)]    mgiordano
> m3 mail:sentTime [U&(USA)]    1380571504
> m3 mail:subject [U&(USA)]    Lunch
> m4 mail:body [U&(USA)]    Dude!
> m4 mail:receiver [U&(USA)]    mcross
> m4 mail:sender [U&(USA)]    habelson
> m4 mail:sentTime [U&(USA)]    1380571506
> m4 mail:subject [U&(USA)]    Lunch
> m5 mail:body [U&(USA)]    Yeah
> m5 mail:receiver [U&(USA)]    habelson
> m5 mail:sender [U&(USA)]    mcross
> m5 mail:sentTime [U&(USA)]    1380571508
> m5 mail:subject [U&(USA)]    Lunch
>
> A scan on the “MailIndex” table looks like this: