Pig >> mail # user >> explode operation


Re: explode operation
To clarify, here is our input:

X = LOAD 'input.txt' AS (id1:chararray, id2:chararray,
id3:chararray, id4:chararray, id5:chararray);

We want to compute a relation Y with a single column that holds the set
of all distinct (non-null) ids appearing in any column of X.

stan
On Wed, Jan 25, 2012 at 10:26 PM, Stan Rosenberg
<[EMAIL PROTECTED]> wrote:
> I don't see how flatten would help in this case.
>
> On Wed, Jan 25, 2012 at 10:19 PM, Prashant Kommireddi
> <[EMAIL PROTECTED]> wrote:
>> Hi Stan,
>>
>> Would using FLATTEN and then DISTINCT work?
>>
>> Thanks,
>> Prashant
>>
>> On Wed, Jan 25, 2012 at 7:11 PM, Stan Rosenberg <
>> [EMAIL PROTECTED]> wrote:
>>
>>> Hi Guys,
>>>
>>> I came across a use case that seems to require an 'explode' operation
>>> which to my knowledge is not currently available.
>>> That is, given a tuple (x,y,z), 'explode' would generate the tuples
>>> (x), (y), (z).
>>>
>>> E.g., consider a relation that contains an arbitrary number of
>>> different identifier columns, say,
>>> social security id, student id, etc.  We want to compute the set of
>>> all distinct identifiers.  Assume that the number of identifier
>>> columns is large and intermingled with other
>>> columns that should be projected out; this assumption is meant to
>>> rule out solutions based on, e.g., 'SPLIT'.
>>>
>>> To be concrete, if X = {(..., 2, 4, ..., 3), (..., 2,,...,5)} is such
>>> a relation, then the answer we want is
>>> Y={2,3,4,5}.
>>>
>>> Any suggestions?
>>>
>>> Thanks,
>>>
>>> stan
>>>
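
For reference, the FLATTEN-then-DISTINCT idea suggested in the thread could be sketched roughly as follows. This is only a sketch, not a confirmed answer from the list: it assumes the five-column schema given above, a Pig version that ships the built-in TOBAG function (0.8 or later), and that missing ids are loaded as nulls. The alias names `ids` and `non_null` are made up for illustration.

```pig
-- Sketch only: assumes Pig 0.8+ (built-in TOBAG) and the schema from this thread.
X = LOAD 'input.txt' AS (id1:chararray, id2:chararray,
                         id3:chararray, id4:chararray, id5:chararray);

-- Pack the id columns of each row into a bag, then FLATTEN that bag
-- so that every id lands in its own single-column row ("explode").
ids = FOREACH X GENERATE FLATTEN(TOBAG(id1, id2, id3, id4, id5)) AS id;

-- Drop null ids, then collapse duplicates to obtain the set of ids.
non_null = FILTER ids BY id IS NOT NULL;
Y = DISTINCT non_null;

DUMP Y;
```

If the id columns are mixed with columns that should be projected out, only the id columns need to be listed inside TOBAG. With many id columns that list still has to be written by hand, or generated.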