Re: How to filter by pig datatype?
This should be a builtin: String TypeOf(Object) {  }

B = filter A by TypeOf(foo.bar) == 'chararray';

Could the EvalFunc check the schema and try to cast the value?
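
A minimal sketch of the type-name logic such a TypeOf could use, written as plain Java so it runs without Pig on the classpath. The class name TypeOfSketch and the helper filterByType are hypothetical, and the mapping only covers the scalar and map types Pig stores as standard Java objects (tuple, bag and bytearray use Pig's own classes and are left out here). A real builtin would presumably wrap this in an EvalFunc<String>.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper: maps a Java runtime value to a Pig type name. */
final class TypeOfSketch {

    private TypeOfSketch() {}

    /** Returns the Pig type name for a value, or "unknown" if it is not covered. */
    static String typeOf(Object value) {
        if (value == null) return "null";
        if (value instanceof Map) return "map";          // Pig maps are java.util.Map
        if (value instanceof String) return "chararray"; // Pig chararray is java.lang.String
        if (value instanceof Integer) return "int";
        if (value instanceof Long) return "long";
        if (value instanceof Float) return "float";
        if (value instanceof Double) return "double";
        if (value instanceof Boolean) return "boolean";
        return "unknown"; // tuple, bag, bytearray etc. are not handled in this sketch
    }

    /** Keeps only values with the given type name, like FILTER A BY TypeOf(x) == 'name'. */
    static List<Object> filterByType(List<Object> values, String typeName) {
        return values.stream()
                .filter(v -> typeOf(v).equals(typeName))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Object> data = new HashMap<>();
        data.put("trivia", "some fact");

        // A field that is sometimes a map and sometimes a chararray.
        List<Object> mixed = Arrays.asList(data, "just a string", 42);

        System.out.println(filterByType(mixed, "map"));
        System.out.println(filterByType(mixed, "chararray"));
    }
}
```

Returning a type name rather than a boolean means one function covers every type, instead of needing a separate IsMap, IsChararray and so on.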

Russell Jurney twitter.com/rjurney
On Nov 22, 2012, at 6:20 PM, pablomar <[EMAIL PROTECTED]> wrote:

> I'm stupid, I didn't know about akela
> thanks for the info !
>
>
> On Thu, Nov 22, 2012 at 5:54 PM, Lex H <[EMAIL PROTECTED]> wrote:
>
>> Cheers Pablo.
>>
>> I was wondering if there was something like this that already existed in
>> the built-ins, but apparently not.
>>
>> Mozilla's Akela project seems to have a bunch of useful UDFs, including one
>> like this, so I might have a look to see if that suits our purpose.
>>
>> https://github.com/mozilla-metrics/akela
>>
>>
>> https://github.com/mozilla-metrics/akela/blob/master/src/main/java/com/mozilla/pig/filter/map/IsMap.java
>>
>> Thanks again,
>>
>> Lexual.
>>
>>
>> On Fri, Nov 23, 2012 at 4:48 AM, pablomar <[EMAIL PROTECTED]> wrote:
>>
>>> did you try with a filter function ?
>>> something like:
>>>
>>> import java.io.IOException;
>>> import java.util.Map;
>>> import org.apache.pig.FilterFunc;
>>> import org.apache.pig.data.Tuple;
>>>
>>> // Filter UDF: true when the first argument is a Pig map.
>>> public class IsMap extends FilterFunc
>>> {
>>>  @Override
>>>  public Boolean exec(Tuple input) throws IOException
>>>  {
>>>    // No argument supplied: nothing to test.
>>>    if (input == null || input.size() == 0)
>>>      return null;
>>>
>>>    try
>>>    {
>>>      // Pig represents the map type as java.util.Map.
>>>      return input.get(0) instanceof Map;
>>>    }
>>>    catch (Exception e)
>>>    {
>>>      // Plain IOException with a cause replaces the deprecated WrappedIOException.
>>>      throw new IOException("IsMap: could not read the first field", e);
>>>    }
>>>  }
>>> }
>>>
>>>
>>> and then:
>>>
>>> filtered = FILTER some_data BY IsMap(some_variable);
>>>
>>> PS: I didn't try it with your data
>>>
>>>
>>>
>>> On Wed, Nov 21, 2012 at 8:54 PM, Lex H <[EMAIL PROTECTED]> wrote:
>>>
>>>> Attached is a tiny testcase illustrating my problem.
>>>>
>>>> What I would like to know is how to filter by Pig datatype.
>>>> e.g. something like:
>>>> filtered = FILTER some_data BY some_variable IS_MAP_TYPE;
>>>>
>>>> Can anyone advise if this can be accomplished with Pig?
>>>>
>>>> We have a field that is sometimes a 'map', sometimes a chararray.
>>>>
>>>> Doing something like the following statement fails, presumably because
>>>> it's trying to do a key-value lookup on something that's not a 'map'.
>>>>
>>>> -- json#'data' is sometimes a map, sometimes not.
>>>> trivias = FOREACH data GENERATE json#'data'#'trivia' AS trivia:chararray;
>>>>
>>>> This has come about from us working with JSON data with Pig via Elephant
>>>> Bird's JsonLoader.
>>>>
>>>> Thanks,
>>>>
>>>> Lex.
>>>>
>>>
>>