Pig >> mail # user >> Dividing a bag into smaller bags


Re: Dividing a bag into smaller bags
Hi Dan,

Thanks for the recommendation. I did manage to use BagSplit, but does
anyone know the best way of accessing the result returned by BagSplit?

BagSplit returns a bag of bags. What is the best Pig Latin to access a bag
inside another bag?

When I run DESCRIBE on what is returned by BagSplit, I get:

{datafu.pig.bags.bagsplit_J_1979: {(data: {(A::group:chararray,A::tagcount:long)})}}

Thanks,
James

On Wed, Apr 11, 2012 at 5:11 PM, Dan Feldman <[EMAIL PROTECTED]> wrote:

> Hey James,
>
> Have you looked at LinkedIn's collection of UDFs, datafu (
>
> http://engineering.linkedin.com/open-source/introducing-datafu-open-source-collection-useful-apache-pig-udfs
> )?
>
> In particular, they have a UDF called BagSplit (
>
> https://github.com/linkedin/datafu/blob/master/src/java/datafu/pig/bags/BagSplit.java
> ).
> It might not do exactly what you want since it splits a bag into bags of
> size n, not into 10 equal-sized bags, but it shouldn't be too hard to write
> your own UDF using BagSplit.java as a reference.
>
> Dan F.
>
>
>
> On Wed, Apr 11, 2012 at 8:53 AM, James Newhaven <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I need to divide a large bag into 10 smaller bags of equal size. Does
> > anyone know of a function that can do this easily? I've had a look at the
> > standard functions and the PiggyBank and can't find anything appropriate.
> >
> > Thanks,
> > James
> >
>
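
The difference Dan points out (bags of size n versus a fixed number of equal-sized bags) can be sketched in plain Python, with lists standing in for Pig bags. This is only an illustration of the partitioning logic under that assumption, not DataFu's implementation and not a ready-made Pig UDF; the function name `split_into_n` is hypothetical.

```python
def split_into_n(items, n):
    """Split items into n contiguous chunks whose sizes differ by at most one.

    Plain lists stand in for Pig bags here (an assumption for illustration).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    items = list(items)
    # Every chunk gets `base` items; the first `extra` chunks get one more.
    base, extra = divmod(len(items), n)
    chunks = []
    start = 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    # 25 items into 10 chunks: five chunks of 3 followed by five chunks of 2.
    for chunk in split_into_n(range(25), 10):
        print(chunk)
```

Unlike a fixed-size split, this always returns exactly n chunks, so some may be empty when the input has fewer than n items.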