Re: Dividing a bag into smaller bags
James Newhaven 2012-04-11, 16:21
Hi Dan,

Thanks for the recommendation. I did manage to use BagSplit, but does anyone know the best way of accessing the result returned by BagSplit?

BagSplit returns a bag of bags. What is the best Pig Latin to access a bag inside another bag?

When I do a describe on what is returned by BagSplit, I get:

{datafu.pig.bags.bagsplit_J_1979: {(data: {(A::group:chararray,A::tagcount: long)})}}

Thanks,
James

On Wed, Apr 11, 2012 at 5:11 PM, Dan Feldman <[EMAIL PROTECTED]> wrote:
> Hey James,
>
> Have you looked at LinkedIn's collection of UDFs, datafu (
> http://engineering.linkedin.com/open-source/introducing-datafu-open-source-collection-useful-apache-pig-udfs
> )?
>
> In particular, they have a UDF called BagSplit (
> https://github.com/linkedin/datafu/blob/master/src/java/datafu/pig/bags/BagSplit.java
> ).
> It might not do exactly what you want since it splits a bag into bags of
> size n, not into 10 equal-sized bags, but it shouldn't be too hard to write
> your own UDF using BagSplit.java as a reference.
>
> Dan F.
>
> On Wed, Apr 11, 2012 at 8:53 AM, James Newhaven <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I need to divide a large bag into 10 smaller bags of equal size. Does
> > anyone know of a function that can do this easily? I've had a look at the
> > standard functions and the PiggyBank and can't find anything appropriate.
> >
> > Thanks,
> > James
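
The distinction Dan draws, bags of size n versus a fixed number of equal-sized bags, can be illustrated with a minimal sketch. It uses plain Python lists as stand-ins for Pig bags; the function names `split_by_size` and `split_into_k` are hypothetical and are not part of datafu or Pig. It is only meant to show why choosing n = ceil(count / 10) does not always yield exactly 10 bags.

```python
import math


def split_by_size(items, n):
    """Chunks of at most n items each (the behaviour Dan describes for BagSplit)."""
    return [items[i:i + n] for i in range(0, len(items), n)]


def split_into_k(items, k):
    """Exactly k chunks whose sizes differ by at most one item."""
    base, extra = divmod(len(items), k)
    chunks, start = [], 0
    for i in range(k):
        # The first `extra` chunks each take one additional item.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


rows = list(range(25))  # stand-in for a bag of 25 tuples

# Fixed chunk size derived from the desired bag count: n = ceil(25 / 10) = 3.
by_size = split_by_size(rows, math.ceil(len(rows) / 10))

# Fixed bag count: always 10 chunks.
by_count = split_into_k(rows, 10)

print(len(by_size), [len(c) for c in by_size])
print(len(by_count), [len(c) for c in by_count])
```

With 25 items, the size-based split produces fewer than 10 chunks, while the count-based split always produces 10. A custom UDF modelled on BagSplit.java would need the second kind of logic.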