Pig, mail # user - How to filter by pig datatype?


Lex H 2012-11-22, 01:54
Ruslan Al-Fakikh 2012-11-22, 12:11
Re: How to filter by pig datatype?
pablomar 2012-11-22, 17:48
Did you try a filter function?
Something like:

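import java.io.IOException;
import java.util.Map;
import org.apache.pig.FilterFunc;
import org.apache.pig.data.Tuple;

// Filter UDF: passes a record only when its first argument is a Pig map.
public class IsMap extends FilterFunc
{
  @Override
  public Boolean exec(Tuple input) throws IOException
  {
    if (input == null || input.size() == 0)
      return null;

    try
    {
      // Pig hands map values to a UDF as java.util.Map instances.
      return input.get(0) instanceof Map;
    }
    catch (Exception e)
    {
      // IOException has accepted a cause since Java 6, so the deprecated
      // WrappedIOException helper is not needed.
      throw new IOException("IsMap: could not read the first argument", e);
    }
  }
}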
and then, once the jar containing IsMap is registered:

filtered = FILTER some_data BY IsMap(some_variable);

PS: I didn't try it with your data
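
For anyone who wants to check the idea without a Pig installation, here is a minimal stdlib-only sketch of the same instanceof test, plus a guarded lookup that returns null instead of failing when the field is not a map. The class name MapFieldSketch and the helpers isMap and lookup are hypothetical names made up for this illustration; they are not part of Pig or Elephant Bird, and the sample values are invented.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; not part of Pig or Elephant Bird.
public class MapFieldSketch {
    // Same test the IsMap filter applies to its first argument.
    public static boolean isMap(Object field) {
        return field instanceof Map;
    }

    // Guarded lookup: the value stored under key when field is a map,
    // otherwise null (so a chararray field no longer breaks the lookup).
    public static Object lookup(Object field, String key) {
        if (!(field instanceof Map)) {
            return null;
        }
        return ((Map<?, ?>) field).get(key);
    }

    public static void main(String[] args) {
        Map<String, Object> data = new HashMap<>();
        data.put("trivia", "some text");

        // One field value that is a map and one that is a plain string.
        List<Object> fields = Arrays.<Object>asList(data, "not a map");
        for (Object field : fields) {
            System.out.println(isMap(field) + " " + lookup(field, "trivia"));
        }
    }
}
```

The guarded lookup is the same check moved next to the dereference. Inside Pig that logic would have to live in an eval UDF rather than a filter, which is an assumption about how you would want to structure it, not something tested here.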

On Wed, Nov 21, 2012 at 8:54 PM, Lex H <[EMAIL PROTECTED]> wrote:

> Attached is a tiny testcase illustrating my problem.
>
> What I would like to know is how to filter by Pig datatype.
> e.g. something like:
> filtered = FILTER some_data BY some_variable IS_MAP_TYPE;
>
> Can anyone advise if this can be accomplished with Pig?
>
> We have a field that is sometimes a 'map', sometimes a chararray.
>
> Doing something like the following statement fails, presumably because
> it's trying to do a key-value lookup on something that's not a 'map'.
>
> -- json#'data' is sometimes a map, sometimes not.
> trivias = FOREACH data GENERATE json#'data'#'trivia' AS trivia:chararray;
>
> This has come about from us working with JSON data with Pig via Elephant
> Bird's JsonLoader.
>
> Thanks,
>
> Lex.
>
Lex H 2012-11-22, 22:54
pablomar 2012-11-22, 23:19