
Pig, mail # user - concatenating tuples into one tuple?


RE: concatenating tuples into one tuple?
Steve Bernstein 2013-04-30, 19:11
[FORMATTING correction, apologies]

Here's one sloppy solution:

rmf temp;

STORE a INTO 'temp';

--load the bag as a chararray and morph it to my will

new = LOAD 'temp' USING PigStorage() AS (
id: chararray,
bitmap: chararray
);

-- remove all the braces, parentheses and spaces, then string-split on the commas into a tuple

i = FOREACH new GENERATE
id,
STRSPLIT( REPLACE(bitmap,'[\\{\\(\\)\\} ]',''),
',', 99999) AS bitmap
;
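To make the string munging concrete, here is a rough Python equivalent of what the REPLACE/STRSPLIT step does to one line stored by PigStorage (tab-separated, bag printed as {(1),(0),...}). This is plain Python, not Pig, and the function name parse_bitmap is hypothetical. Note that the pieces come out as strings, just as STRSPLIT yields chararray fields, so they would still need casting if ints are wanted.

```python
import re


def parse_bitmap(field):
    """Rough equivalent of STRSPLIT(REPLACE(bitmap, '[\\{\\(\\)\\} ]', ''), ',', 99999)."""
    # Drop braces, parentheses and spaces, as the Pig REPLACE regex does.
    cleaned = re.sub(r"[{}() ]", "", field)
    # Split on commas; like STRSPLIT's output, every piece is still a string.
    return tuple(cleaned.split(","))


# One line as PigStorage would write it: id <TAB> bag
line = "9\t{(1),(0),(0),(0),(0)}"
id_, bitmap = line.split("\t")
print(id_, parse_bitmap(bitmap))  # prints: 9 ('1', '0', '0', '0', '0')
```

The round trip through text is what makes the approach "sloppy": it depends on how PigStorage happens to render bags and tuples, and it loses the int type along the way.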

So this works, but it's actually supposed to be part of a macro. Macros are new for us and I haven't tried this in one yet, but the docs say grunt shell commands can't be executed inside a macro, so we wouldn't be able to run "rmf temp".

It still seems like I'm missing something about how to dereference the elements to get what I want directly.
Steve
-----Original Message-----

I have a post-grouping relation:

a: {id: chararray, bitmap: {(value_binary: int)}}

where the value_binary tuples are single-element tuples that have been sorted; their order is important. All the "bitmap" bags are guaranteed to contain the same number of single-element tuples, but that number is arbitrary. That is, I can't know in advance how many tuples each "bitmap" will hold, but I can rely on every bitmap having the same count. An example instance with 5 tuples:

9    {(1),(0),(0),(0),(0)}

Would need to become:

9   {(1,0,0,0,0)}

...concatenating those tuples into one tuple, preserving the order, again without having advance knowledge of how many tuples will be in "bitmap".  I can't figure out how to do it.
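Stated as plain data manipulation, the requested operation is just "take the single field out of each 1-tuple, in bag order, and pack them into one tuple", which works for any bag length. The sketch below shows that logic in Python, modeling a bag as a list of tuples; the name bag_to_tuple is hypothetical and this is not a Pig built-in. My understanding (an assumption worth checking against the Pig UDF docs) is that Python/Jython UDFs receive bags as lists of tuples, so logic like this could be adapted into such a UDF.

```python
def bag_to_tuple(bag):
    """Concatenate a bag of single-element tuples into one tuple, keeping order."""
    # Unpacking (field,) takes the only field of each 1-tuple and raises
    # ValueError if any tuple unexpectedly has more than one field.
    return tuple(field for (field,) in bag)


# The example row: id 9 with a 5-element bitmap bag.
id_, bitmap = ("9", [(1,), (0,), (0,), (0,), (0,)])
# Wrap the result in a list to mirror the desired bag {(1,0,0,0,0)}.
print(id_, [bag_to_tuple(bitmap)])  # prints: 9 [(1, 0, 0, 0, 0)]
```

Because the length is taken from the bag itself, nothing depends on knowing the tuple count in advance, and the values keep their int type.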

Thanks in advance for any suggestions...
Steve