Re: Splitting by unique values in a relation
Shahab Yunus 2013-09-15, 23:44
Correction to my earlier comment, in which I wrote:
'Won't SPLIT always give you 2 relations?'
It is basically what Praveenesh himself mentioned, i.e. SPLIT gives a
pre-defined/known number of relations/splits.
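
For anyone following along, here is a rough sketch of the two approaches
discussed in this thread. It is not tested against your data: the jar
location, the input/output paths, the tab delimiter and the customer_id
values 'c1' and 'c2' are all assumptions made up for illustration.

```pig
-- Hypothetical jar location; point this at your own piggybank build.
REGISTER /path/to/piggybank.jar;

-- Assumed input: tab-delimited (customer_id, data) records.
A = LOAD 'input/customers' USING PigStorage('\t')
    AS (customer_id:chararray, data:chararray);

-- SPLIT: every target relation and its condition must be written out
-- in advance, so the number of outputs is fixed when the script is written.
SPLIT A INTO c1 IF customer_id == 'c1',
             c2 IF customer_id == 'c2';
STORE c1 INTO 'output/split/c1';
STORE c2 INTO 'output/split/c2';

-- MultiStorage: the second argument is the index of the field to split on
-- (0 = customer_id). One subdirectory is written under the parent path per
-- distinct value found in the data, so the values need not be known upfront.
STORE A INTO 'output/by_customer'
    USING org.apache.pig.piggybank.storage.MultiStorage('output/by_customer', '0');
```

With MultiStorage the number of output directories follows the data,
which is why it fits the case where the customer_ids are unknown.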
On Sun, Sep 15, 2013 at 7:41 PM, praveenesh kumar <[EMAIL PROTECTED]> wrote:
> I can use SPLIT only when I know the values I need to split by. Here the
> customer_ids are unknown to me. I don't know how many of them exist in my
> data. Hence SPLIT is not the answer to my problem.
> Anyway, I have found piggybank's MultiStorage method to be much closer to
> what I am looking for. I was just wondering whether there is a better or
> different way to do the same.
> On Mon, Sep 16, 2013 at 12:36 AM, Ruslan Al-Fakikh <[EMAIL PROTECTED]
> > Hi!
> > Have you tried the SPLIT operator?
> > http://pig.apache.org/docs/r0.11.1/basic.html#SPLIT
> > After splitting the relation into two separate relations you can STORE
> > into different locations.
> > Best Regards,
> > Ruslan Al-Fakikh
> > https://www.odesk.com/users/~015b7b5f617eb89923
> > On Sun, Sep 15, 2013 at 11:03 PM, praveenesh kumar <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > I have a relation A with (customer_id, data).
> > > I want to get the unique customer_ids, and split them into new
> > > files/relations. What is the most efficient way to do that?
> > >
> > > I can get the distinct customer_ids in a relation, but I am not able to
> > > understand how I can use it to split the data by customer_id.
> > >
> > > Regards
> > > Praveenesh
> > >