Re: How to filter by pig datatype?
Russell Jurney 2012-11-23, 02:14
This should be a builtin: String TypeOf(Object) { }
B = filter A by TypeOf(foo.bar) == 'chararray';

The evalfunc can check the schema and try to cast the value?

Russell Jurney twitter.com/rjurney

On Nov 22, 2012, at 6:20 PM, pablomar <[EMAIL PROTECTED]> wrote:

> I'm stupid, I didn't know about akela
> thanks for the info !
>
> On Thu, Nov 22, 2012 at 5:54 PM, Lex H <[EMAIL PROTECTED]> wrote:
>
>> Cheers Pablo.
>>
>> I was wondering if there was something like this that already existed in
>> the built-ins, but apparently not.
>>
>> Mozilla's Akela project seems to have a bunch of useful UDFs, including one
>> like this, so I might have a look to see if that suits our purpose.
>>
>> https://github.com/mozilla-metrics/akela
>>
>> https://github.com/mozilla-metrics/akela/blob/master/src/main/java/com/mozilla/pig/filter/map/IsMap.java
>>
>> Thanks again,
>>
>> Lexual.
>>
>> On Fri, Nov 23, 2012 at 4:48 AM, pablomar <[EMAIL PROTECTED]> wrote:
>>
>>> did you try with a filter function ?
>>> something like:
>>>
>>> import java.io.IOException;
>>> import org.apache.pig.FilterFunc;
>>> import org.apache.pig.data.Tuple;
>>> import org.apache.pig.impl.util.WrappedIOException;
>>>
>>> public class IsMap extends FilterFunc
>>> {
>>>     public Boolean exec(Tuple input) throws IOException
>>>     {
>>>         if (input == null || input.size() == 0)
>>>             return null;
>>>
>>>         try
>>>         {
>>>             return (input.get(0) instanceof java.util.Map);
>>>         }
>>>         catch (Exception e)
>>>         {
>>>             throw WrappedIOException.wrap("ouch!", e);
>>>         }
>>>     }
>>> }
>>>
>>> and then:
>>>
>>> filtered = FILTER some_data BY IsMap(some_variable);
>>>
>>> PS: I didn't try it with your data
>>>
>>> On Wed, Nov 21, 2012 at 8:54 PM, Lex H <[EMAIL PROTECTED]> wrote:
>>>
>>>> Attached is a tiny testcase illustrating my problem.
>>>>
>>>> What I would like to know is how to filter by Pig datatype.
>>>> e.g. something like:
>>>> filtered = FILTER some_data BY some_variable IS_MAP_TYPE;
>>>>
>>>> Can anyone advise if this can be accomplished with Pig?
>>>>
>>>> We have a field that is sometimes a 'map', sometimes a chararray.
>>>>
>>>> Doing something like the following statement fails, presumably because
>>>> it's trying to do a key-value lookup on something that's not a 'map'.
>>>>
>>>> -- json#'data' is sometimes a map, sometimes not.
>>>> trivias = FOREACH data GENERATE json#'data'#'trivia' AS trivia:chararray;
>>>>
>>>> This has come about from us working with JSON data with Pig via
>>>> Elephant Bird's JsonLoader.
>>>>
>>>> Thanks,
>>>>
>>>> Lex.
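
For reference, the TypeOf(Object) idea proposed at the top of this thread could be sketched roughly as below. This is only an illustration, not an existing Pig builtin: the class name TypeOfSketch and the method typeOf are made up for the example, and it uses plain Java only so that it runs without Pig on the classpath. It assumes the usual correspondence between Pig types and Java classes (chararray as String, map as java.util.Map, and so on), and it leaves out tuple, bag and bytearray because those need Pig's own classes.

```java
import java.util.HashMap;
import java.util.Map;

public class TypeOfSketch {

    // Hypothetical helper: returns the Pig type name that a Java value
    // would normally correspond to. Not part of Pig itself.
    static String typeOf(Object value) {
        if (value == null) return "null";
        if (value instanceof Map) return "map";
        if (value instanceof String) return "chararray";
        if (value instanceof Integer) return "int";
        if (value instanceof Long) return "long";
        if (value instanceof Float) return "float";
        if (value instanceof Double) return "double";
        // tuple, bag and bytearray are omitted: they need Pig's classes.
        return "unknown";
    }

    public static void main(String[] args) {
        // A field that is sometimes a map, sometimes a chararray,
        // as in the original question.
        Map<String, Object> asMap = new HashMap<>();
        asMap.put("trivia", "some text");

        System.out.println(typeOf(asMap));
        System.out.println(typeOf("plain string"));
    }
}
```

In an actual UDF, the same check would presumably sit inside the exec(Tuple input) method of a class extending EvalFunc<String>, applied to input.get(0) in the same way the IsMap filter above does, so that a script could compare the result against 'map' or 'chararray' in a FILTER.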