Pig, mail # user - Passing a single Bag to an Eval function


Re: Passing a single Bag to an Eval function
James Newhaven 2012-05-28, 22:16
Thanks Jonathan. Great explanation.

On Mon, May 28, 2012 at 6:53 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:

> Howdy James. It's important to remember that relations and bags are not the
> same (though they feel pretty similar). An EvalFunc can never be run directly
> on a relation, only on a bag field within a relation. In this case, you need
> to group first.
>
> -- F: {id: chararray,countd: long}
>
> G = group F all;
>
> H = FOREACH G GENERATE BagSplit(F);
>
> "group all" is what lets you effectively turn a relation into a bag.
> Run DESCRIBE G and you'll see that you now have a relation of one row,
> where that row holds the catch-all key "all" and a bag named "F" that
> contains the relation you wanted to run BagSplit on.
>
> 2012/5/28 James Newhaven <[EMAIL PROTECTED]>
>
> > I am trying to use an EVAL pig function (it's called BagSplit from datafu)
> > which accepts a Bag as a parameter.
> >
> > The problem I have is that my current relation is a single Bag, so I'm not
> > sure how to pass this relation to the Eval function.
> >
> > My pig script looks like this:
> >
> > F = FOREACH E GENERATE $0,$1;
> >
> > DESCRIBE F;
> >
> > F: {id: chararray,countd: long}
> >
> > G = FOREACH F GENERATE BagSplit(??);
> >
> > Not sure what I can put in ??
> >
> > Thanks,
> > James
> >
>
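
For reference, the approach described in the quoted reply can be sketched end to end as below. This is a minimal sketch under stated assumptions, not a tested script: the jar name `datafu.jar`, the input path `input.tsv`, and the batch size of 1000 are all placeholders. It also assumes that DataFu's BagSplit takes the maximum batch size as its first argument and the bag as its second, which is how I recall the DataFu documentation describing it; check this against the version you have installed.

```pig
-- Assumption: jar name and location are placeholders for your DataFu install.
REGISTER datafu.jar;
DEFINE BagSplit datafu.pig.bags.BagSplit();

-- Hypothetical input standing in for the relation F from the thread.
F = LOAD 'input.tsv' AS (id:chararray, countd:long);

-- GROUP ... ALL collapses the whole relation into a single row:
-- one field holding the key 'all', and one bag named F holding every tuple.
G = GROUP F ALL;
DESCRIBE G;

-- The bag field F (not the relation F) is what gets passed to the UDF.
-- Assumption: 1000 is an arbitrary maximum size for each sub-bag.
H = FOREACH G GENERATE BagSplit(1000, F);
DESCRIBE H;
```

Note that GROUP ... ALL sends every tuple to a single reducer, so this pattern suits relations that are small enough to be handled in one place.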