Pig >> mail # user >> concatenating tuples into one tuple?


RE: concatenating tuples into one tuple?
[FORMATTING correction, apologies]

Here's one sloppy solution:

rmf temp;

STORE a INTO 'temp';

--load the bag as a chararray and morph it to my will

new = LOAD 'temp' USING PigStorage() AS (
id: chararray,
bitmap: chararray
);

-- remove all the {()} characters and spaces, then string-split into a tuple on the commas

i = FOREACH new GENERATE
id,
STRSPLIT( REPLACE(bitmap,'[\\{\\(\\)\\} ]',''),
',', 99999) AS bitmap
;
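
To make the string handling in that last statement easier to follow, here is a minimal sketch in plain Python (not Pig) of what the REPLACE and STRSPLIT calls do to one serialized bag. It assumes the bag text looks like the {(1),(0),(0),(0),(0)} example from the original question; the function name bag_text_to_tuple is hypothetical and only for illustration.

```python
import re

def bag_text_to_tuple(bag_text):
    """Mimic REPLACE(...) followed by STRSPLIT(..., ',') on a serialized bag."""
    # Same character class as the REPLACE call: drop braces, parentheses and spaces.
    stripped = re.sub(r'[\{\(\)\} ]', '', bag_text)
    # Split on commas; as with STRSPLIT, every field is a string, not an int.
    return tuple(stripped.split(','))

print(bag_text_to_tuple('{(1),(0),(0),(0),(0)}'))
# prints ('1', '0', '0', '0', '0')
```

Note that the fields come back as strings, so anything downstream that needs ints would have to cast them.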

So this works, but it's actually supposed to be part of a macro. Macros are new for us and I haven't tried this yet, but the docs say we can't execute grunt shell commands in a macro, so we wouldn't be able to run "rmf temp;".

Still seems like I'm missing something on how to dereference the elements to get what I want directly.
Steve
-----Original Message-----

I have a post-grouping relation:

a: { id: chararray, bitmap: { (value_binary: int) } },

where the value_binary tuples are single-element tuples that have been sorted; the order of the single-element tuples is important.  All the "bitmap" bags are guaranteed to have the same number of single-element tuples, but that number is arbitrary.  That is, I can't know in advance how many tuples there will be in "bitmap", but I can depend on each bitmap having the same number of tuples.  An example of an instance with 5 tuples:

9    {(1),(0),(0),(0),(0)}

Would need to become:

9   {(1,0,0,0,0)}

...concatenating those tuples into one tuple, preserving the order, again without having advance knowledge of how many tuples will be in "bitmap".  I can't figure out how to do it.
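
For reference, the transformation being asked for can be sketched in plain Python as below. This only pins down the intended input and output; it is not a Pig solution. The function name concat_single_element_tuples is hypothetical, and the bag is modeled as an ordered list of one-element tuples, which is an assumption about how the sorted bag would be handed over.

```python
def concat_single_element_tuples(bag):
    """Collapse an ordered bag of one-element tuples into a single tuple."""
    for t in bag:
        # Guard the stated precondition: every tuple holds exactly one value.
        if len(t) != 1:
            raise ValueError("expected single-element tuples, got %r" % (t,))
    # Keep the incoming order; no advance knowledge of the bag size is needed.
    return tuple(t[0] for t in bag)

row = (9, [(1,), (0,), (0,), (0,), (0,)])
print((row[0], concat_single_element_tuples(row[1])))
# prints (9, (1, 0, 0, 0, 0))
```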

Thanks in advance for any suggestions...
Steve