

RE: Manually build tuple from three group relations
william.dowling@... 2011-07-07, 13:41
You could use two rounds of the outer join/filter-by-null idiom. For example, after the first round you would get allTermsMinusNonNumbers like this:
grunt> sh cat allTerms
aa
bb
cc
11
22
33
grunt> sh cat nonNumbers
cc
grunt> allTerms = load 'allTerms' as (term:chararray);
grunt> nonNumbers = load 'nonNumbers' as (term:chararray);
grunt> j1 = join allTerms by term left outer, nonNumbers by term;
grunt> allTermsMinusNonNumbers = filter j1 by nonNumbers::term is null;
grunt>
grunt> dump allTermsMinusNonNumbers
(11,)
(22,)
(33,)
(aa,)

William F Dowling
Sr Technical Specialist, Software Engineering
Thomson Reuters

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of John Conwell
Sent: Wednesday, July 06, 2011 6:28 PM
To: [EMAIL PROTECTED]
Subject: Manually build tuple from three group relations

I have a dataset where each tuple is a term. I then do two filter operations: one to find all terms that have numbers, and one to find all terms that don't have numbers. Oddly, there are some terms that don't fit into either group (not really sure how). So at this point I have 3 bags: all terms, terms with numbers, and terms without numbers.

What I'm trying to find out is which terms are in the list of all terms but are not in either of the two filtered bags. I thought I'd use the DIFF function, but it only operates on different tuples in the same bag. So somehow I think I need to create a new relation that has three tuples at the same level (row?). Then I could use the DIFF function.

Any ideas? The script I have so far is shown below...

terms = LOAD 'terms' AS (term:chararray, min:float, max:float, count:int);
terms = FOREACH terms GENERATE term;

-- bag of all terms
allTerms = GROUP terms ALL;

-- bag of terms without numbers
nonNumbers = FILTER terms BY NOT (term MATCHES '^.*[0-9].*$');
nonNumbers = GROUP nonNumbers ALL;

-- bag of terms with numbers
withNumbers = FILTER terms BY (term MATCHES '^.*[0-9].*$');
withNumbers = GROUP withNumbers ALL;
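
The reply above shows only the first round of the outer join/filter by null idiom. A minimal sketch of the second round might look like the following. It continues from the reply's allTermsMinusNonNumbers relation and assumes a third input file named 'withNumbers' with one term per line. That file, and the aliases remaining, j2, neither and leftover, are hypothetical names chosen for illustration. Note that the idiom joins plain relations with one term per row, not the bags produced by GROUP ... ALL in the original script.

```pig
-- Assumption: 'withNumbers' is a hypothetical file, one term per line,
-- mirroring the 'allTerms' and 'nonNumbers' files used in the reply.
withNumbers = load 'withNumbers' as (term:chararray);

-- Keep only the left-hand column from round one and give it a simple name,
-- so the second join does not need doubly qualified field names.
remaining = foreach allTermsMinusNonNumbers generate allTerms::term as term;

-- Round two: left outer join against withNumbers, then keep only the rows
-- that found no match on the right-hand side.
j2 = join remaining by term left outer, withNumbers by term;
neither = filter j2 by withNumbers::term is null;

-- Terms present in allTerms but in neither nonNumbers nor withNumbers.
leftover = foreach neither generate remaining::term as term;
dump leftover;
```

Inspecting leftover may also help explain why some terms fall into neither group. One possibility, not confirmed in this thread, is null terms: in Pig a FILTER condition that evaluates to null drops the row, so a null term would fail both the MATCHES filter and its NOT counterpart.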