Re: How to filter by pig datatype?
I'm stupid, I didn't know about Akela.
Thanks for the info!
On Thu, Nov 22, 2012 at 5:54 PM, Lex H <[EMAIL PROTECTED]> wrote:

> Cheers Pablo.
>
> I was wondering if there was something like this that already existed in
> the built-ins, but apparently not.
>
> Mozilla's Akela project seems to have a bunch of useful UDFs, including one
> like this, so I might have a look to see if that suits our purpose.
>
> https://github.com/mozilla-metrics/akela
>
>
> https://github.com/mozilla-metrics/akela/blob/master/src/main/java/com/mozilla/pig/filter/map/IsMap.java
>
> Thanks again,
>
> Lexual.
>
>
> On Fri, Nov 23, 2012 at 4:48 AM, pablomar <[EMAIL PROTECTED]> wrote:
>
> > Did you try with a filter function?
> > Something like:
> >
> > import java.io.IOException;
> > import org.apache.pig.FilterFunc;
> > import org.apache.pig.data.Tuple;
> >
> > // Filter UDF: true when the first argument is a Pig map.
> > public class IsMap extends FilterFunc
> > {
> >   @Override
> >   public Boolean exec(Tuple input) throws IOException
> >   {
> >     // Nothing to inspect.
> >     if (input == null || input.size() == 0)
> >       return null;
> >
> >     try
> >     {
> >       // Pig hands map fields to UDFs as java.util.Map instances.
> >       return (input.get(0) instanceof java.util.Map);
> >     }
> >     catch (Exception e)
> >     {
> >       // IOException accepts a cause directly; WrappedIOException is deprecated.
> >       throw new IOException("ouch!", e);
> >     }
> >   }
> > }
> >
> >
> > and then:
> >
> > filtered = FILTER some_data BY IsMap(some_variable);
> >
> > PS: I didn't try it with your data
> >
> >
> >
> > On Wed, Nov 21, 2012 at 8:54 PM, Lex H <[EMAIL PROTECTED]> wrote:
> >
> > > Attached is a tiny testcase illustrating my problem.
> > >
> > > What I would like to know is how to filter by Pig datatype.
> > > e.g. something like:
> > > filtered = FILTER some_data BY some_variable IS_MAP_TYPE;
> > >
> > > Can anyone advise if this can be accomplished with Pig?
> > >
> > > We have a field that is sometimes a 'map' and sometimes a chararray.
> > >
> > > Doing something like the following statement fails, presumably because
> > > it's trying to do a key-value lookup on something that's not a 'map'.
> > >
> > > -- json#'data' is sometimes a map, sometimes not.
> > > trivias = FOREACH data GENERATE json#'data'#'trivia' AS trivia:chararray;
> > >
> > > This has come about from us working with JSON data with Pig via Elephant
> > > Bird's JsonLoader.
> > >
> > > Thanks,
> > >
> > > Lex.
> > >
> >
>
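
For reference, the check the thread settles on can be tried outside Pig. The following is only a sketch in plain Java with no Pig classes: the class name MapFieldCheck and the helper lookupOrNull are hypothetical, and ordinary java.util.Map values stand in for the loaded JSON. It shows the same instanceof test as the IsMap filter, plus a guarded lookup that returns null instead of failing when 'data' turns out to be a string.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch (no Pig classes) of the type check discussed in the thread.
// MapFieldCheck and lookupOrNull are hypothetical names used for illustration.
final class MapFieldCheck {

    // Same test the IsMap filter applies to its first argument.
    static boolean isMap(Object field) {
        return field instanceof Map;
    }

    // Looks up key only when field is a map; any other type (or null) yields null.
    static Object lookupOrNull(Object field, String key) {
        if (!(field instanceof Map)) {
            return null;
        }
        return ((Map<?, ?>) field).get(key);
    }

    public static void main(String[] args) {
        // Record where 'data' is a map containing 'trivia'.
        Map<String, Object> data = new HashMap<>();
        data.put("trivia", "some trivia");
        Map<String, Object> jsonWithMap = new HashMap<>();
        jsonWithMap.put("data", data);

        // Record where 'data' is a plain string instead of a map.
        Map<String, Object> jsonWithString = new HashMap<>();
        jsonWithString.put("data", "just a string");

        // prints "some trivia"
        System.out.println(lookupOrNull(lookupOrNull(jsonWithMap, "data"), "trivia"));
        // prints "null"
        System.out.println(lookupOrNull(lookupOrNull(jsonWithString, "data"), "trivia"));
    }
}
```

Inside Pig the equivalent guard would be a FILTER on the type check before the nested json#'data'#'trivia' lookup, so that the lookup only ever runs on records where 'data' is a map.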