Pig >> mail # user >> Splitting by unique values in a relation


Re: Splitting by unique values in a relation
A correction to my earlier comment. The following statement that I wrote was
wrong:
'Won't SPLIT always give you 2 relations?'

SPLIT is not limited to two relations. The actual constraint is the one
Praveenesh himself mentioned, i.e. the number of relations/splits has to be
pre-defined/known when the script is written.

Regards,
Shahab
On Sun, Sep 15, 2013 at 7:41 PM, praveenesh kumar <[EMAIL PROTECTED]> wrote:

> I can use SPLIT only when I know in advance the values I need to split
> by. Here the customer_ids are unknown to me, and I don't know how many of
> them exist in my data. Hence SPLIT is not the answer to my problem.
>
> Anyway, I have found piggybank's MultiStorage to be much closer to what I
> am looking for. I was just wondering whether there is a better or different
> way to do the same.
>
> Regards
> Praveenesh
>
>
> On Mon, Sep 16, 2013 at 12:36 AM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:
>
> > Hi!
> >
> > Have you tried the SPLIT operator?
> > http://pig.apache.org/docs/r0.11.1/basic.html#SPLIT
> > After splitting the relation into two separate relations you can STORE
> > them into different locations.
> >
> > Best Regards,
> > Ruslan Al-Fakikh
> > https://www.odesk.com/users/~015b7b5f617eb89923
> >
> >
> > On Sun, Sep 15, 2013 at 11:03 PM, praveenesh kumar <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > >
> > > I have a relation A with (customer_id, data).
> > > I want to get the unique customer_ids, and split them into new
> > > files/relations. What is the most efficient way to do that?
> > >
> > > I can get the distinct customer_ids in a relation, but I am not able to
> > > understand how I can use it to split the data by customer_id.
> > >
> > > Regards
> > > Praveenesh
> > >
> >
>
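
For reference, the two approaches discussed in this thread might look roughly like the following in Pig Latin. This is only a minimal sketch and is not taken from the original posts: the input and output paths, the field types, the location of piggybank.jar, and the customer_id values 'c1' and 'c2' are all assumptions made up for illustration.

```pig
-- Assumed location of the piggybank jar; adjust for your installation.
REGISTER piggybank.jar;

-- Hypothetical input: tab-separated (customer_id, data) records.
A = LOAD 'input/customers' USING PigStorage('\t')
    AS (customer_id:chararray, data:chararray);

-- Approach 1: SPLIT. Every output relation must be named in the script,
-- so this only works when the customer_ids are known in advance.
-- 'c1' and 'c2' are made-up values.
SPLIT A INTO cust1 IF customer_id == 'c1',
             cust2 IF customer_id == 'c2',
             others OTHERWISE;
STORE cust1  INTO 'output/split/c1';
STORE cust2  INTO 'output/split/c2';
STORE others INTO 'output/split/others';

-- Approach 2: piggybank's MultiStorage. The first argument is the parent
-- output path and the second is the index of the field to split on
-- ('0' is customer_id here). The key values are taken from the data,
-- so they do not need to be known when the script is written.
STORE A INTO 'output/by_customer'
    USING org.apache.pig.piggybank.storage.MultiStorage('output/by_customer', '0');
```

With MultiStorage the output should end up grouped under the parent path by the distinct values of the chosen field. It also accepts optional compression and field-delimiter arguments, which are left at their defaults in this sketch.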