Pig, mail # user - How to filter by pig datatype?


Lex H 2012-11-22, 01:54
Ruslan Al-Fakikh 2012-11-22, 12:11
pablomar 2012-11-22, 17:48
Re: How to filter by pig datatype?
Lex H 2012-11-22, 22:54
Cheers Pablo.

I was wondering if there was something like this that already existed in
the built-ins, but apparently not.

Mozilla's Akela project seems to have a bunch of useful UDFs, including one
like this, so I might have a look to see if that suits our purpose.

https://github.com/mozilla-metrics/akela

https://github.com/mozilla-metrics/akela/blob/master/src/main/java/com/mozilla/pig/filter/map/IsMap.java

Thanks again,

Lexual.
On Fri, Nov 23, 2012 at 4:48 AM, pablomar <[EMAIL PROTECTED]> wrote:

> Did you try a filter function?
> Something like:
>
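> import java.io.IOException;
> import java.util.Map;
> import org.apache.pig.FilterFunc;
> import org.apache.pig.data.Tuple;
>
> // Filter UDF: true when the first field of the input tuple is a Pig map.
> public class IsMap extends FilterFunc
> {
>   @Override
>   public Boolean exec(Tuple input) throws IOException
>   {
>     // Treat missing input as "not a map" so the row is filtered out.
>     if (input == null || input.size() == 0)
>       return false;
>
>     // Pig maps are java.util.Map instances; a null field is not one.
>     // Tuple.get() already throws an IOException subclass on failure,
>     // so no wrapping with the deprecated WrappedIOException is needed.
>     return input.get(0) instanceof Map;
>   }
> }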
>
>
> and then:
>
> filtered = FILTER some_data BY IsMap(some_variable);
>
> PS: I didn't try it with your data
>
>
>
> On Wed, Nov 21, 2012 at 8:54 PM, Lex H <[EMAIL PROTECTED]> wrote:
>
> > Attached is a tiny testcase illustrating my problem.
> >
> > What I would like to know is how to filter by Pig datatype.
> > e.g. something like:
> > filtered = FILTER some_data BY some_variable IS_MAP_TYPE;
> >
> > Can anyone advise if this can be accomplished with Pig?
> >
> > We have a field that is sometimes a 'map' and sometimes a chararray.
> >
> > A statement like the following fails, presumably because
> > it's trying to do a key-value lookup on something that's not a 'map'.
> >
> > -- json#'data' is sometimes a map, sometimes not.
> > trivias = FOREACH data GENERATE json#'data'#'trivia' AS trivia:chararray;
> >
> > This has come about from us working with JSON data with Pig via Elephant
> > Bird's JsonLoader.
> >
> > Thanks,
> >
> > Lex.
> >
>
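For anyone who wants to see the check in isolation, here is a minimal plain-Java sketch of the same idea, with no Pig dependency: test whether the field is a map before attempting the nested lookup. The class and method names (TypeGuardSketch, isMap, trivia) are made up for illustration, and a java.util.Map stands in for the loaded JSON record; this is not what Elephant Bird or Akela actually do internally.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone illustration; not part of Pig, Akela, or Elephant Bird.
class TypeGuardSketch {

    // Same test the IsMap filter performs, minus the Pig Tuple wrapper.
    static boolean isMap(Object field) {
        return field instanceof Map;
    }

    // Rough equivalent of json#'data'#'trivia': returns the value as a String
    // when 'data' is a map, and null when it is anything else (e.g. a string).
    static String trivia(Map<String, Object> json) {
        Object data = json.get("data");
        if (!isMap(data)) {
            return null;
        }
        Object value = ((Map<?, ?>) data).get("trivia");
        return value == null ? null : value.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> inner = new HashMap<>();
        inner.put("trivia", "pigs can swim");

        Map<String, Object> mapRow = new HashMap<>();
        mapRow.put("data", inner);

        Map<String, Object> stringRow = new HashMap<>();
        stringRow.put("data", "just a string");

        System.out.println(trivia(mapRow));
        System.out.println(trivia(stringRow));
    }
}
```

In a Pig script the same guard would be the FILTER ... BY IsMap(...) step, applied before the FOREACH that does the '#' lookups.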
pablomar 2012-11-22, 23:19