Pig, mail # user - explode operation


Stan Rosenberg 2012-01-26, 03:11
Prashant Kommireddi 2012-01-26, 03:19
Stan Rosenberg 2012-01-26, 03:26
Re: explode operation
Stan Rosenberg 2012-01-26, 03:31
To clarify, here is our input:

X = LOAD 'input.txt' AS (id1:chararray, id2:chararray,
id3:chararray, id4:chararray, id5:chararray);

We want to compute Y that consists of a single column denoting the set
of all (non-null) ids coming from X.
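
For illustration, one way such an 'explode' might be expressed is sketched below. It assumes the built-in TOBAG function is available in the Pig version in use; the aliases ids and non_null and the field name id are hypothetical names chosen for this sketch, and nothing here is confirmed as the answer given in this thread.

```pig
-- Load the five identifier columns.
X = LOAD 'input.txt' AS (id1:chararray, id2:chararray,
                         id3:chararray, id4:chararray, id5:chararray);

-- TOBAG wraps each column value in its own single-field tuple and collects
-- them in a bag; FLATTEN then emits one output row per tuple in that bag.
ids = FOREACH X GENERATE FLATTEN(TOBAG(id1, id2, id3, id4, id5)) AS id;

-- Null columns are expected to come through as null rows, so drop them.
non_null = FILTER ids BY id IS NOT NULL;

-- Keep each identifier once.
Y = DISTINCT non_null;

DUMP Y;
```

If the identifier columns are intermingled with other columns, only the identifier fields would need to be listed inside TOBAG; the remaining columns are projected out by the FOREACH.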

stan
On Wed, Jan 25, 2012 at 10:26 PM, Stan Rosenberg
<[EMAIL PROTECTED]> wrote:
> I don't see how flatten would help in this case.
>
> On Wed, Jan 25, 2012 at 10:19 PM, Prashant Kommireddi
> <[EMAIL PROTECTED]> wrote:
>> Hi Stan,
>>
>> Would using FLATTEN and then DISTINCT work?
>>
>> Thanks,
>> Prashant
>>
>> On Wed, Jan 25, 2012 at 7:11 PM, Stan Rosenberg <
>> [EMAIL PROTECTED]> wrote:
>>
>>> Hi Guys,
>>>
>>> I came across a use case that seems to require an 'explode' operation
>>> which to my knowledge is not currently available.
>>> That is, given a tuple (x,y,z), 'explode' would generate the tuples
>>> (x), (y), (z).
>>>
>>> E.g., consider a relation that contains an arbitrary number of
>>> different identifier columns, say,
>>> social security id, student id, etc.  We want to compute the set of
>>> all distinct identifiers.  Assume that the number of identifier
>>> columns is large and intermingled with other
>>> columns that should be projected out; this is to avoid a solution
>>> using 'SPLIT', e.g.
>>>
>>> To be concrete, if X = {(..., 2, 4, ..., 3), (..., 2,,...,5)} is such
>>> a relation, then the answer we want is
>>> Y={2,3,4,5}.
>>>
>>> Any suggestions?
>>>
>>> Thanks,
>>>
>>> stan
>>>
Prashant Kommireddi 2012-01-26, 03:46
Jonathan Coveney 2012-01-26, 20:04
Stan Rosenberg 2012-01-30, 01:46
Aniket Mokashi 2012-01-30, 07:25
Stan Rosenberg 2012-01-30, 16:05