Pig >> mail # user >> Accessing nested bags & tuples

Benjamin Juhn 2012-07-08, 17:57
Alex Rovner 2012-07-09, 15:14
Re: Accessing nested bags & tuples
Alex: your solution won't work because the field 'type' isn't accessible
within tags_bag.

Ben: I think you're saying you want to concat the 'tags' value in the
*first* tags_tuple in tags_bag?
Just to clarify, bags are unordered, so asking for the 'first' element in a
bag doesn't make sense unless you do a LIMIT immediately after an ORDER BY,
or use the TOP function. Either way, you need to order your bag first to
impose the idea of 'first element' onto it.
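
For the TOP route, here is a minimal sketch. It assumes type and tags_bag
are top-level fields of records, and that the tuples in tags_bag can be
compared on column 0. Note that TOP keeps the highest-ranking tuples, so it
picks the opposite end from an ascending ORDER BY followed by LIMIT 1. The
names top_proc and tag_top are made up for illustration.

```pig
-- Sketch only: assumes type and tags_bag are top-level fields of records.
-- TOP(n, column, bag) returns a bag of the n tuples ranking highest on that column.
top_proc = foreach records generate
    FLATTEN(TOP(1, 0, tags_bag)) as tag_top, type, tags_bag;
```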

You could do something like this:

proc = foreach records {
    ordered_bag = ORDER tags_bag BY tags_tuple;
    first_tuple_bag = LIMIT ordered_bag 1;
    generate FLATTEN(first_tuple_bag) as tag_first, type, tags_bag;
}
out = foreach proc generate CONCAT(tag_first.$0, type) as tag_first_type,
    type, tags_bag;
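
One caveat: the schema Ben posted nests type and tags_bag inside a tuple
named meta, while the snippet above refers to them as top-level fields. If
the schema really is as posted, a projection along these lines might be
needed first, with its output used in place of records. This is only a
sketch, and the alias flat is made up for illustration.

```pig
-- Hypothetical alias 'flat': pull type and tags_bag out of the meta tuple
-- so that later statements can refer to them directly.
flat = foreach records generate meta.type as type, meta.tags_bag as tags_bag;
```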

This will produce an extra field containing the CONCAT-ed value of the
first tags_bag tuple and type, but the bag tags_bag itself remains
unaltered. If you want to actually modify tags_bag so that its 'first' (by
whatever ordering) tuple contains the CONCAT-ed value, that would be
trickier. I don't know why you'd want such a thing, but a UDF would
probably be the quickest way to accomplish it. If Pig supported a nested
UNION, I'd have a way of doing this in 'native' Pig, but as it doesn't, I
can't think of a non-UDF approach straight away.

On 8 July 2012 23:27, Benjamin Juhn <[EMAIL PROTECTED]> wrote:

> Hi There,
> I'm trying to concat the first tag string with type string for all
> records.  Could someone advise on syntax?
> records: {meta:(type: chararray, tags_bag: {t: (tags_tuple: (tags:
> chararray))})}
> Thanks,
> Ben