Pig >> mail # user >> explode operation


Stan Rosenberg 2012-01-26, 03:11
Prashant Kommireddi 2012-01-26, 03:19
Stan Rosenberg 2012-01-26, 03:26
Stan Rosenberg 2012-01-26, 03:31
Prashant Kommireddi 2012-01-26, 03:46
Jonathan Coveney 2012-01-26, 20:04
Stan Rosenberg 2012-01-30, 01:46
Re: explode operation
Isn't FLATTEN similar to explode?
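
A minimal sketch of that idea, building on Jonathan's script quoted below. The relation and field names and 'input.txt' come from the thread. Listing the id columns explicitly instead of using *, and the final DISTINCT, are assumptions meant to cover the "project out other columns" and "set of all distinct identifiers" parts of the question. I have not run this against the real data.

```pig
-- Load the comma-separated input with the schema given in the thread.
X = LOAD 'input.txt' USING PigStorage(',')
    AS (id1:chararray, id2:chararray, id3:chararray,
        id4:chararray, id5:chararray);

-- TOBAG wraps each listed field in its own one-field tuple and puts
-- those tuples in a bag; FLATTEN then emits one row per tuple, which
-- behaves like 'explode' on the chosen columns.
Y_0 = FOREACH X GENERATE FLATTEN(TOBAG(id1, id2, id3, id4, id5)) AS id;

-- Null fields are carried into the bag, so drop them afterwards.
Y_1 = FILTER Y_0 BY id IS NOT NULL;

-- Collapse repeated identifiers into a set.
Y = DISTINCT Y_1;

DUMP Y;
```

If the relation has non-identifier columns, they are simply left out of the TOBAG argument list, so no SPLIT should be needed.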

On Sun, Jan 29, 2012 at 5:46 PM, Stan Rosenberg <
[EMAIL PROTECTED]> wrote:

> Hi Jonathan,
>
> What you recommended below is not quite right.  The right solution
> would need to do something similar to 'explode'.
>
> Thanks,
>
> stan
>
> On Thu, Jan 26, 2012 at 3:04 PM, Jonathan Coveney <[EMAIL PROTECTED]>
> wrote:
> > I think this might give you what you want
> >
> > X = LOAD 'input.txt' using PigStorage(',') AS (id1:chararray,
> > id2:chararray, id3:chararray, id4:chararray, id5:chararray);
> > Y_0 = foreach X generate FLATTEN(TOBAG(*));
> > Y = filter Y_0 by $0 is not null;
> >
> > 2012/1/25 Prashant Kommireddi <[EMAIL PROTECTED]>
> >
> >> Sorry I misunderstood your initial question. You would have to write a
> >> custom UDF to do this.
> >>
> >> Thanks,
> >> Prashant
> >>
> >> On Jan 25, 2012, at 7:32 PM, Stan Rosenberg
> >> <[EMAIL PROTECTED]> wrote:
> >>
> >> > To clarify, here is our input:
> >> >
> >> > X = LOAD 'input.txt' AS (id1:chararray, id2:chararray,
> >> > id3:chararray, id4:chararray, id5:chararray);
> >> >
> >> > We want to compute Y that consists of a single column denoting the set
> >> > of all (non-null) ids coming from X.
> >> >
> >> > stan
> >> >
> >> >
> >> > On Wed, Jan 25, 2012 at 10:26 PM, Stan Rosenberg
> >> > <[EMAIL PROTECTED]> wrote:
> >> >> I don't see how flatten would help in this case.
> >> >>
> >> >> On Wed, Jan 25, 2012 at 10:19 PM, Prashant Kommireddi
> >> >> <[EMAIL PROTECTED]> wrote:
> >> >>> Hi Stan,
> >> >>>
> >> >>> Would using FLATTEN and then DISTINCT work?
> >> >>>
> >> >>> Thanks,
> >> >>> Prashant
> >> >>>
> >> >>> On Wed, Jan 25, 2012 at 7:11 PM, Stan Rosenberg <
> >> >>> [EMAIL PROTECTED]> wrote:
> >> >>>
> >> >>>> Hi Guys,
> >> >>>>
> >> >>>> I came across a use case that seems to require an 'explode' operation
> >> >>>> which to my knowledge is not currently available.
> >> >>>> That is, given a tuple (x,y,z), 'explode' would generate the tuples
> >> >>>> (x), (y), (z).
> >> >>>>
> >> >>>> E.g., consider a relation that contains an arbitrary number of
> >> >>>> different identifier columns, say,
> >> >>>> social security id, student id, etc.  We want to compute the set of
> >> >>>> all distinct identifiers.  Assume that the number of identifier
> >> >>>> columns is large and intermingled with other
> >> >>>> columns that should be projected out; this is to avoid a solution
> >> >>>> using 'SPLIT', e.g.
> >> >>>>
> >> >>>> To be concrete, if X = {(..., 2, 4, ..., 3), (..., 2,,...,5)} is such
> >> >>>> a relation, then the answer we want is
> >> >>>> Y={2,3,4,5}.
> >> >>>>
> >> >>>> Any suggestions?
> >> >>>>
> >> >>>> Thanks,
> >> >>>>
> >> >>>> stan
> >> >>>>
> >>
>

--
"...:::Aniket:::... Quetzalco@tl"
Stan Rosenberg 2012-01-30, 16:05