Pig >> mail # user >> how to operate a map type


Re: how to operate a map type
Hi,

My Pig code looks like this:
register myudf.jar
a = load 'testurls' as (info:chararray);
b = foreach a generate info,com.company.pig.GetInfoScore($0) as m;
dump b;

The output is like this:
(65RFPRO800863GPT,[108#0.2])
(6JL6U6EA00863J0J,[352#0.5,25#0.15,108#0.07,26#0.06,4#0.16])
(6B7FF3E300052E97,[25#0.28,405#0.05,4#0.05])
(5498267_31,[108#0.05,25#0.19,12#0.19])

I want to group by the map key and count the info values for each key, like
the output below (the matching info values are shown in the comments):
108  3   /* 65RFPRO800863GPT  6JL6U6EA00863J0J  5498267_31 */
352  1   /* 6JL6U6EA00863J0J */
25   3   /* 6JL6U6EA00863J0J  6B7FF3E300052E97  5498267_31 */
26   1   /* 6JL6U6EA00863J0J */
4    2   /* 6JL6U6EA00863J0J  6B7FF3E300052E97 */
405  1   /* 6B7FF3E300052E97 */
12   1   /* 5498267_31 */

I think I have to split each map into one row per entry, like this:
(65RFPRO800863GPT, 108, 0.2)
(6JL6U6EA00863J0J, 352, 0.5)
(6JL6U6EA00863J0J, 25, 0.15)
(6JL6U6EA00863J0J, 108, 0.07)
(6JL6U6EA00863J0J, 26, 0.06)
(6JL6U6EA00863J0J, 4, 0.16)
(6B7FF3E300052E97, 25, 0.28)
(6B7FF3E300052E97, 405, 0.05)
(6B7FF3E300052E97, 4, 0.05)
(5498267_31, 108, 0.05)
(5498267_31, 25, 0.19)
(5498267_31, 12, 0.19)

Then it would be easy to group and count.
Is that right?
I don't know how to split the map into rows as shown above.
Any help would be appreciated.

Thanks.
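
The split-then-group idea above can be sketched in plain Java, the language Pig UDFs are usually written in. This is only an illustration of the logic, not a Pig UDF: the class MapFlattenSketch and its methods flatten and countByKey are hypothetical names, and the sample data is the dump output from this message. In an actual script, one option (an assumption, not something confirmed in this thread) would be a UDF that returns a bag of (key, value) tuples, combined with Pig's FLATTEN operator and then GROUP and COUNT.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Pig: shows the flatten-then-count logic only.
class MapFlattenSketch {

    // One flattened row: (info, key, score).
    static final class Row {
        final String info;
        final String key;
        final double score;

        Row(String info, String key, double score) {
            this.info = info;
            this.key = key;
            this.score = score;
        }
    }

    // Turn one (info, map) record into one row per map entry.
    static List<Row> flatten(String info, Map<String, Double> scores) {
        List<Row> rows = new ArrayList<>();
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            rows.add(new Row(info, e.getKey(), e.getValue()));
        }
        return rows;
    }

    // Group the rows by map key and count the rows in each group.
    static Map<String, Integer> countByKey(List<Row> rows) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Row r : rows) {
            counts.merge(r.key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The four records from the dump output in this message.
        List<Row> rows = new ArrayList<>();
        rows.addAll(flatten("65RFPRO800863GPT", Map.of("108", 0.2)));
        rows.addAll(flatten("6JL6U6EA00863J0J",
                Map.of("352", 0.5, "25", 0.15, "108", 0.07, "26", 0.06, "4", 0.16)));
        rows.addAll(flatten("6B7FF3E300052E97", Map.of("25", 0.28, "405", 0.05, "4", 0.05)));
        rows.addAll(flatten("5498267_31", Map.of("108", 0.05, "25", 0.19, "12", 0.19)));

        // Print one "key count" line per map key.
        countByKey(rows).forEach((key, n) -> System.out.println(key + "\t" + n));
    }
}
```

Each map entry becomes its own row, so counting rows per key is the same as counting how many info values contain that key. For the four sample records, key 108 is counted 3 times and key 4 is counted 2 times, as in the desired output.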

2011/5/25 Alan Gates <[EMAIL PROTECTED]>

> Can't you mimic dynamic key support with static keys by making your map
> have two static keys 'key' and 'value'?
>
> Alan.
>
>
> On May 24, 2011, at 3:05 AM, Jameson Li wrote:
>
>> OK, I understand: the answer is just to write UDFs.
>> I will write the UDFs, then.
>> I still think there should be grammar support for map operations with both
>> static keys and dynamic keys.
>>
>> Thanks.
>>
>> 2011/5/24 Daniel Dai <[EMAIL PROTECTED]>
>>
>>> GetKey(m) already returns the key, so you can filter on the key. For the
>>> value, you may need to put the logic into a UDF.
>>>
>>> Grammar support for maps is based on static keys, e.g. m#'key1'. Your use
>>> case mostly deals with dynamic keys, which you currently have to handle
>>> yourself.
>>>
>>> Daniel
>>>
>>> -----Original Message----- From: Jameson Li
>>> Sent: Monday, May 23, 2011 7:07 PM
>>> To: Daniel Dai
>>> Cc: [EMAIL PROTECTED]
>>> Subject: Re: how to operate a map type
>>>
>>>
>>> And how do I filter on a map key or a map value? Is that also only possible with a UDF?
>>>
>>> b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1) as m;
>>> c = filter b by m.key == 'aaa' or m.value> 0.2;
>>>
>>> How could I write this code?
>>> Is there any other way that does not require writing a UDF?
>>>
>>> Also, if a map type can only be manipulated through UDFs, why are there no
>>> built-in functions for the map type?
>>>
>>> Thanks.
>>>
>>> 2011/5/24 Daniel Dai <[EMAIL PROTECTED]>
>>>
>>>> I cannot think of a way without writing a UDF. You can write two UDFs:
>>>>
>>>> * GetKey: takes a map and returns the key of the map
>>>> * GetValues: takes a bag of maps and returns a bag of the map values
>>>>
>>>> The script is like:
>>>> b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1) as m;
>>>> c = foreach b generate GetKey(m) as key, m;
>>>> d = group c by key;
>>>> e = foreach d generate group, SUM(GetValues(c.m));
>>>>
>>>>
>>>> Daniel
>>>>
>>>>
>>>> On 05/23/2011 07:06 AM, Jameson Li wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I have the following Pig code:
>>>>>
>>>>> register /home/uu/project/lib/pigudfs.jar
>>>>> ruls = load 'testurl' as (url:chararray);
>>>>>
>>>>> b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1);
>>>>>
>>>>> When I dump b, it returns:
>>>>> ([4#0.1677963])
>>>>> ([193#0.16985779,81#0.10994483])
>>>>> ([418#0.14138427,9#0.1107544,282#0.18699136])
>>>>>
>>>>> I just want to group by the map key and sum the map values, something like:
>>>>> c = group b by $0#key;
>>>>> d = foreach c generate group,SUM(b.$0#value);
>>>>>
>>>>> How could I write the code?
>>>>>
>>>>> Thanks,
>>>>> Jameson Li.
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>
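
The suggestion in this thread to mimic dynamic keys with the two static keys 'key' and 'value' can be sketched in plain Java, together with the group-and-sum step from the original question. This is a rough illustration under stated assumptions, not a Pig UDF and not the code of any poster: the class StaticKeySketch and its methods toKeyValueMaps and sumByKey are hypothetical names, and the sample maps are the dump output quoted above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Pig: plain-Java sketch of the 'key'/'value' idea.
class StaticKeySketch {

    // Re-express a dynamic-key map as a list of small maps that all share
    // the same two static keys, "key" and "value".
    static List<Map<String, Object>> toKeyValueMaps(Map<String, Double> dynamic) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map.Entry<String, Double> e : dynamic.entrySet()) {
            Map<String, Object> m = new HashMap<>();
            m.put("key", e.getKey());
            m.put("value", e.getValue());
            out.add(m);
        }
        return out;
    }

    // Group by the static "key" field and sum the static "value" field.
    static Map<String, Double> sumByKey(List<Map<String, Object>> entries) {
        Map<String, Double> sums = new TreeMap<>();
        for (Map<String, Object> m : entries) {
            sums.merge((String) m.get("key"), (Double) m.get("value"), Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        // The three maps from the dump output in the original question.
        List<Map<String, Object>> all = new ArrayList<>();
        all.addAll(toKeyValueMaps(Map.of("4", 0.1677963)));
        all.addAll(toKeyValueMaps(Map.of("193", 0.16985779, "81", 0.10994483)));
        all.addAll(toKeyValueMaps(
                Map.of("418", 0.14138427, "9", 0.1107544, "282", 0.18699136)));

        // Print the summed value for each original map key.
        System.out.println(sumByKey(all));
    }
}
```

Once every entry has the same two field names, lookups such as m#'key' and m#'value' use only static keys, which is the form of map access the thread says Pig's grammar supports.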