Pig >> mail # user >> Another schema mixup, was: Re: group schema getting wrong fields?


Another schema mixup, was: Re: group schema getting wrong fields?
This problem went away in 0.10, but has reappeared in a slightly different
context in the current trunk.
In this script, I have something like:
split a into b, c;
d = join b, x;
pd = project d;
e = union pd, c;
split e into f, g;
h = project f;
(The last statement is where the incorrect field name gets used, causing an
error.)
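
For anyone trying to reproduce this, the outline above could be fleshed out roughly as follows. This is only a sketch: the relation names come from the outline, but the load paths, field names, types, and split conditions are all made up for illustration, and "project" is assumed to mean a FOREACH ... GENERATE. It is not confirmed to trigger the error on any particular Pig version.

```pig
-- Hypothetical inputs: paths, field names, and types are assumptions.
a = LOAD 'a.csv' USING PigStorage(',') AS (id:chararray, astart:int, aend:int);
x = LOAD 'x.csv' USING PigStorage(',') AS (id:chararray, label:chararray);

-- split a into b, c (the conditions are placeholders)
SPLIT a INTO b IF astart < 10, c IF astart >= 10;

-- d = join b, x
d = JOIN b BY id, x BY id;

-- pd = project d: keep only the fields from b so pd and c share a schema
pd = FOREACH d GENERATE b::id AS id, b::astart AS astart, b::aend AS aend;

-- e = union pd, c
e = UNION pd, c;

-- split e into f, g (the conditions are placeholders)
SPLIT e INTO f IF aend < 100, g IF aend >= 100;

-- h = project f: the step where the wrong field name was reported
h = FOREACH f GENERATE id, astart;

-- Compare the schema Pig reports with the fields actually requested.
DESCRIBE h;
```

Running DESCRIBE on each intermediate relation (pd, e, f) as well should show at which step a field name first diverges from what was requested.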
On Mon, Aug 27, 2012 at 5:14 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:

> Yeah, I think this is a known issue with filters and relations. Use the
> fix, but I think trunk has the fix.
>
> Thanks
>
> 2012/8/24 Lauren Blau <[EMAIL PROTECTED]>
>
> > Actually, if I replace the filters that create the original 2 relations
> > with a split, the problem goes away. (I just saw split used in another
> > message and realized I could use it.)
> >
> > On Fri, Aug 24, 2012 at 4:11 PM, Lauren Blau <
> > [EMAIL PROTECTED]> wrote:
> >
> > > fcels and fnot are both filtered from the same original relation.
> > >
> > >
> > > On Fri, Aug 24, 2012 at 4:11 PM, Lauren Blau <
> > > [EMAIL PROTECTED]> wrote:
> > >
> > > >> How much more? Here's the cxels:
> > >>
> > >> bigcross = join fcels by (chararray)messageId, fnot by (chararray)
> > >> messageId;
> > >> filt1 = filter bigcross by (int)fcels::astart <= (int)fnot::astart;
> > >> filt2 = filter filt1 by (int)fcels::aend >= (int)fnot::aend;
> > >>
> > >> cxels = foreach filt2 generate fcels::messageId as
> > >> messageId:chararray,fcels::astart as celstart:int,fcels::aend as
> > >> celend:int,fnot::alabel as notcellabel:chararray,fnot::astart as
> > >> notcelstart:int, fnot::aend as notcelend:int;
> > >>
> > >>
> > > >> On Fri, Aug 24, 2012 at 3:07 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
> > >>
> > >>> Can you post more of your script?
> > >>>
> > >>> 2012/8/24 Lauren Blau <[EMAIL PROTECTED]>
> > >>>
> > >>> > I'm running pig 0.9.2 and seeing this:
> > >>> >
> > >>> > grunt> describe cxels;
> > >>> > cxels: {messageId: chararray,celstart: int,celend: int,notcellabel:
> > >>> > chararray,notcelstart: int,notcelend: int}
> > >>> > grunt> gcxels = group cxels by (messageId,celstart,celend);
> > >>> > grunt> describe gcxels;
> > > >>> > gcxels: {group: (messageId: chararray,notcelstart: int,notcelend:
> > > >>> > int),cxels: {(messageId: chararray,celstart: int,celend: int,notcellabel:
> > > >>> > chararray,notcelstart: int,notcelend: int)}}
> > >>> >
> > >>> >
> > > >>> > Why does the schema for gcxels::group show notcelstart and notcelend
> > > >>> > when I gave it celstart,celend as the grouping fields?
> > > >>> > Is the field name not being matched correctly?
> > >>> >
> > >>> > Thanks,
> > >>> > lauren
> > >>> >
> > >>>
> > >>
> > >>
> > >
> >
>
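
The workaround mentioned earlier in the thread (creating fcels and fnot with a split instead of two filters over the same relation) might look roughly like this. The source relation name `annots`, its load path and schema, and the conditions on alabel are hypothetical, since the thread never shows how fcels and fnot were originally filtered.

```pig
-- Hypothetical source relation: name, path, and schema are assumptions.
annots = LOAD 'annotations' AS (messageId:chararray, alabel:chararray,
                                astart:int, aend:int);

-- Before (assumed shape): two separate filters over the same relation.
-- fcels = FILTER annots BY alabel == 'cel';
-- fnot  = FILTER annots BY alabel != 'cel';

-- After: a single SPLIT producing both relations.
SPLIT annots INTO fcels IF alabel == 'cel', fnot IF alabel != 'cel';

-- The rest of the script (join, filters, foreach, group) stays as posted.
```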