Stan Rosenberg
20120126, 03:11
Prashant Kommireddi
20120126, 03:19
Stan Rosenberg
20120126, 03:26
Stan Rosenberg
20120126, 03:31
Prashant Kommireddi
20120126, 03:46
Jonathan Coveney
20120126, 20:04
Stan Rosenberg
20120130, 01:46
Aniket Mokashi
20120130, 07:25
Stan Rosenberg
20120130, 16:05


explode operation
Hi Guys,
I came across a use case that seems to require an 'explode' operation, which to my knowledge is not currently available. That is, given a tuple (x,y,z), 'explode' would generate the tuples (x), (y), (z).

E.g., consider a relation that contains an arbitrary number of different identifier columns, say, social security id, student id, etc. We want to compute the set of all distinct identifiers. Assume that the number of identifier columns is large and that they are intermingled with other columns that should be projected out; this rules out a solution using, e.g., 'SPLIT'.

To be concrete, if X = {(..., 2, 4, ..., 3), (..., 2,,...,5)} is such a relation, then the answer we want is Y={2,3,4,5}.

Any suggestions?

Thanks,
stan
Stan Rosenberg 20120126, 03:11

Re: explode operation
Hi Stan,
Would using FLATTEN and then DISTINCT work?

Thanks,
Prashant
Prashant Kommireddi 20120126, 03:19

Re: explode operation
I don't see how flatten would help in this case.
Stan Rosenberg 20120126, 03:26

Re: explode operation
To clarify, here is our input:
X = LOAD 'input.txt' AS (id1:chararray, id2:chararray, id3:chararray, id4:chararray, id5:chararray);

We want to compute Y, a single-column relation containing the set of all (non-null) ids from X.

stan
Stan Rosenberg 20120126, 03:31

Re: explode operation
Sorry I misunderstood your initial question. You would have to write a custom UDF to do this.

Thanks,
Prashant
Prashant Kommireddi 20120126, 03:46

Re: explode operation
I think this might give you what you want
X = LOAD 'input.txt' using PigStorage(',') AS (id1:chararray, id2:chararray, id3:chararray, id4:chararray, id5:chararray);
Y_0 = foreach X generate FLATTEN(TOBAG(*));
Y = filter Y_0 by $0 is not null;
Jonathan Coveney 20120126, 20:04
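
A minimal sketch of how the FLATTEN(TOBAG(...)) suggestion above might be adapted to the original use case, where the id columns are mixed in with columns that should be projected out and only distinct ids are wanted. The column names (name, city) and the column order are hypothetical, chosen only for illustration, and the sketch assumes that empty fields in the input are loaded as nulls.

```pig
-- Hypothetical layout: three id columns intermingled with two non-id columns.
X = LOAD 'input.txt' USING PigStorage(',')
    AS (name:chararray, id1:chararray, city:chararray, id2:chararray, id3:chararray);

-- TOBAG wraps each listed field in its own tuple and collects the tuples in a bag;
-- FLATTEN then emits one output record per tuple in that bag.
-- Listing the id columns explicitly projects out name and city.
ids = FOREACH X GENERATE FLATTEN(TOBAG(id1, id2, id3)) AS (id:chararray);

-- Drop the records produced by null id fields.
nonnull_ids = FILTER ids BY id IS NOT NULL;

-- Collapse duplicates to get the set of ids.
Y = DISTINCT nonnull_ids;
DUMP Y;
```

Naming the id columns in TOBAG, rather than passing *, is what keeps the non-id columns out of the result; the trade-off is that the list has to be maintained by hand when the number of id columns is large.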

Re: explode operation
Hi Jonathan,
What you recommended is not quite right. The right solution would need to do something similar to 'explode'.

Thanks,
stan
Stan Rosenberg 20120130, 01:46

Re: explode operation
Isn't FLATTEN similar to explode?
"...:::Aniket:::... Quetzalco@tl"
Aniket Mokashi 20120130, 07:25

Re: explode operation
On Mon, Jan 30, 2012 at 2:25 AM, Aniket Mokashi <[EMAIL PROTECTED]> wrote:
> Isn't FLATTEN similar to explode?

Not quite. EXPLODE would take a record with n fields and generate n records.
Stan Rosenberg 20120130, 16:05
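
For comparison, a hedged sketch of the "n fields in, n records out" behaviour described above, built from FLATTEN combined with TOBAG rather than a dedicated EXPLODE operator (which the thread does not show to exist). The schema is hypothetical: one key column (name) followed by three id columns. The relation names exploded and Z are also made up for this example.

```pig
-- Hypothetical input: one key column followed by three id columns.
X = LOAD 'input.txt' USING PigStorage(',')
    AS (name:chararray, id1:chararray, id2:chararray, id3:chararray);

-- TOBAG turns the three id fields into a bag of three single-field tuples.
-- FLATTEN of that bag is crossed with the other generated fields, so each
-- input record yields three output records, each carrying the same name.
exploded = FOREACH X GENERATE name, FLATTEN(TOBAG(id1, id2, id3)) AS (id:chararray);

-- Remove the records that came from null id fields.
Z = FILTER exploded BY id IS NOT NULL;
DUMP Z;
```

FLATTEN applied directly to a tuple only promotes its fields into the enclosing record; it is the intermediate bag built by TOBAG that produces the one-record-per-field behaviour.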

