Pig >> mail # user >> How to filter by pig datatype?


Lex H 2012-11-22, 01:54
Ruslan Al-Fakikh 2012-11-22, 12:11
pablomar 2012-11-22, 17:48
Re: How to filter by pig datatype?
Cheers Pablo.

I was wondering if there was something like this that already existed in
the built-ins, but apparently not.

Mozilla's Akela project seems to have a bunch of useful UDFs, including one
like this, so I might have a look to see if that suits our purpose.

https://github.com/mozilla-metrics/akela

https://github.com/mozilla-metrics/akela/blob/master/src/main/java/com/mozilla/pig/filter/map/IsMap.java

Thanks again,

Lexual.
On Fri, Nov 23, 2012 at 4:48 AM, pablomar <[EMAIL PROTECTED]> wrote:

> Did you try a filter function?
> Something like:
>
> import java.io.IOException;
> import java.util.Map;
> import org.apache.pig.FilterFunc;
> import org.apache.pig.data.Tuple;
>
> // Filter UDF: true when the first field of the input tuple is a map.
> public class IsMap extends FilterFunc
> {
>   @Override
>   public Boolean exec(Tuple input) throws IOException
>   {
>     // Nothing to inspect: return null rather than guessing.
>     if (input == null || input.size() == 0)
>       return null;
>
>     try
>     {
>       return input.get(0) instanceof Map;
>     }
>     catch (Exception e)
>     {
>       // Wrap any failure, keeping the original cause.
>       throw new IOException("IsMap: could not read the first field", e);
>     }
>   }
> }
>
>
> and then:
>
> filtered = FILTER some_data BY IsMap(some_variable);
>
> PS: I didn't try it with your data
>
>
>
> On Wed, Nov 21, 2012 at 8:54 PM, Lex H <[EMAIL PROTECTED]> wrote:
>
> > Attached is a tiny testcase illustrating my problem.
> >
> > What I would like to know is how to filter by Pig datatype.
> > e.g. something like:
> > filtered = FILTER some_data BY some_variable IS_MAP_TYPE;
> >
> > Can anyone advise if this can be accomplished with Pig?
> >
> > We have a field that is sometimes a 'map', sometimes a chararray.
> >
> > Doing something like the following statement fails, presumably because
> > it's trying to do a key-value lookup on something that's not a 'map'.
> >
> > -- json#'data' is sometimes a map, sometimes not.
> > trivias = FOREACH data GENERATE json#'data'#'trivia' AS trivia:chararray;
> >
> > This has come about from us working with JSON data with Pig via Elephant
> > Bird's JsonLoader.
> >
> > Thanks,
> >
> > Lex.
> >
>
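
The approach discussed in this thread (check that a field is a map before doing a key lookup on it) can be sketched in plain Java without the Pig API. This is only an illustration under assumptions: the class name `MapFieldGuard` and its methods are hypothetical, and records are modelled as `java.util.Map` objects, which may not match how a given JSON loader represents them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Pig or Akela.
public class MapFieldGuard {

    /** True only when the value is a map; null and other types give false. */
    public static boolean isMap(Object value) {
        return value instanceof Map;
    }

    /** Looks up key when value is a map; returns null otherwise instead of failing. */
    public static Object lookupIfMap(Object value, String key) {
        if (!(value instanceof Map)) {
            return null;
        }
        return ((Map<?, ?>) value).get(key);
    }

    /** Keeps records whose "data" field is a map, then extracts "trivia" from it. */
    public static List<Object> extractTrivia(List<Map<String, Object>> records) {
        List<Object> out = new ArrayList<>();
        for (Map<String, Object> json : records) {
            Object data = json.get("data");
            if (isMap(data)) {            // the "filter by type" step
                out.add(lookupIfMap(data, "trivia"));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> inner = new HashMap<>();
        inner.put("trivia", "pigs can swim");

        Map<String, Object> mapRecord = new HashMap<>();
        mapRecord.put("data", inner);            // "data" is a map here

        Map<String, Object> stringRecord = new HashMap<>();
        stringRecord.put("data", "just a string"); // "data" is a plain string here

        // Only the first record survives the type check.
        System.out.println(extractTrivia(Arrays.asList(mapRecord, stringRecord)));
    }
}
```

In Pig terms, `isMap` plays the role of the filter UDF and the guarded lookup corresponds to applying `#'trivia'` only to the rows that pass the filter.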
pablomar 2012-11-22, 23:19