Re: Dividing a bag into smaller bags
Hi Dan,

Thanks for the recommendation. I managed to use BagSplit, but does
anyone know the best way of accessing the result it returns?

BagSplit returns a bag of bags. What is the best Pig Latin for accessing a
bag nested inside another bag?

When I run DESCRIBE on what is returned by BagSplit, I get:

{datafu.pig.bags.bagsplit_J_1979: {(data: {(A::group:chararray,A::tagcount: long)})}}

Thanks,
James

On Wed, Apr 11, 2012 at 5:11 PM, Dan Feldman <[EMAIL PROTECTED]> wrote:

> Hey James,
>
> Have you looked at LinkedIn's collection of UDFs, DataFu (
>
> http://engineering.linkedin.com/open-source/introducing-datafu-open-source-collection-useful-apache-pig-udfs
> )?
>
> In particular, they have a UDF called BagSplit (
>
> https://github.com/linkedin/datafu/blob/master/src/java/datafu/pig/bags/BagSplit.java
> ).
> It might not do exactly what you want since it splits a bag into bags of
> size n, not into 10 equal-sized bags, but it shouldn't be too hard to write
> your own UDF using BagSplit.java as a reference.
>
> Dan F.
>
>
>
> On Wed, Apr 11, 2012 at 8:53 AM, James Newhaven <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I need to divide a large bag into 10 smaller bags of equal size. Does
> > anyone know of a function that can do this easily? I've had a look at the
> > standard functions and the PiggyBank and can't find anything appropriate.
> >
> > Thanks,
> > James
> >
>
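
The difference Dan points out (bags of a fixed size n versus a fixed number
of equal-sized bags) can be sketched outside Pig. The Python below is only an
illustration of the arithmetic, not DataFu's implementation or a Pig UDF; the
function names split_by_size and split_into_equal_bags are hypothetical, and a
plain list stands in for a bag.

```python
def split_by_size(items, size):
    """Split items into consecutive chunks of at most `size` elements.

    This mirrors the behaviour described in the thread for BagSplit
    (bags of size n); it is an assumption-based stand-in, not DataFu code.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def split_into_equal_bags(items, n_bags=10):
    """Split items into exactly `n_bags` chunks whose sizes differ by at most 1."""
    if n_bags <= 0:
        raise ValueError("n_bags must be positive")
    # Every bag gets `base` elements; the first `extra` bags get one more.
    base, extra = divmod(len(items), n_bags)
    bags = []
    start = 0
    for i in range(n_bags):
        size = base + (1 if i < extra else 0)
        bags.append(items[start:start + size])
        start += size
    return bags


if __name__ == "__main__":
    data = list(range(25))
    print([len(b) for b in split_by_size(data, 3)])
    print([len(b) for b in split_into_equal_bags(data, 10)])
```

When the bag holds fewer elements than n_bags, the trailing bags in this
sketch come back empty; a UDF written along these lines would have to decide
whether to emit or drop them.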