Pig >> mail # user >> how to operate a map type


Re: how to operate a map type
Hi,

My Pig code looks like this:
register myudf.jar
a = load 'testurls' as (info:chararray);
b = foreach a generate info,com.company.pig.GetInfoScore($0) as m;
dump b;

The output looks like this:
(65RFPRO800863GPT,[108#0.2])
(6JL6U6EA00863J0J,[352#0.5,25#0.15,108#0.07,26#0.06,4#0.16])
(6B7FF3E300052E97,[25#0.28,405#0.05,4#0.05])
(5498267_31,[108#0.05,25#0.19,12#0.19])

I want to group by the map key and count the infos for each key, giving output like this:
108   3   /* 65RFPRO800863GPT 6JL6U6EA00863J0J 5498267_31 */
352   1   /* 6JL6U6EA00863J0J */
25    3   /* 6JL6U6EA00863J0J 6B7FF3E300052E97 5498267_31 */
26    1   /* 6JL6U6EA00863J0J */
4     2   /* 6JL6U6EA00863J0J 6B7FF3E300052E97 */
405   1   /* 6B7FF3E300052E97 */
12    1   /* 5498267_31 */

I think I have to split each map into multiple rows, like this:
(65RFPRO800863GPT, 108, 0.2)
(6JL6U6EA00863J0J, 352, 0.5)
(6JL6U6EA00863J0J, 25, 0.15)
(6JL6U6EA00863J0J, 108, 0.07)
(6JL6U6EA00863J0J, 26, 0.06)
(6JL6U6EA00863J0J, 4, 0.16)
(6B7FF3E300052E97, 25, 0.28)
(6B7FF3E300052E97, 405, 0.05)
(6B7FF3E300052E97, 4, 0.05)
(5498267_31, 108, 0.05)
(5498267_31, 25, 0.19)
(5498267_31, 12, 0.19)

Then it would be easy to group and count.
Am I right?
I have no idea how to split a map into multiple rows like the ones above.
Please help.
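Here is a minimal sketch of the split-then-count idea. It is written in plain Java rather than as a Pig UDF so that it runs on its own. It assumes each input row is an info string paired with a `java.util.Map` of key to score, like the dump above. The class and method names are hypothetical. Inside Pig, the same per-entry loop would typically sit in a UDF that returns a bag of (key, value) tuples. `FLATTEN` turns that bag into one row per entry, and `GROUP` and `COUNT` then produce the per-key counts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model (not a Pig UDF) of: split each map into rows, then group and count.
final class MapExplodeCount {

    // One exploded row: (info, key, value).
    static final class Row {
        final String info;
        final String key;
        final double value;

        Row(String info, String key, double value) {
            this.info = info;
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return "(" + info + ", " + key + ", " + value + ")";
        }
    }

    // Sample input copied from the dump of relation b above: info -> (key -> score).
    static Map<String, Map<String, Double>> sampleInput() {
        Map<String, Map<String, Double>> in = new LinkedHashMap<>();
        in.put("65RFPRO800863GPT", Map.of("108", 0.2));
        in.put("6JL6U6EA00863J0J", Map.of("352", 0.5, "25", 0.15, "108", 0.07, "26", 0.06, "4", 0.16));
        in.put("6B7FF3E300052E97", Map.of("25", 0.28, "405", 0.05, "4", 0.05));
        in.put("5498267_31", Map.of("108", 0.05, "25", 0.19, "12", 0.19));
        return in;
    }

    // Emit one Row per map entry, similar to FLATTEN over a bag of (key, value) tuples.
    static List<Row> explode(Map<String, Map<String, Double>> input) {
        List<Row> rows = new ArrayList<>();
        for (Map.Entry<String, Map<String, Double>> rec : input.entrySet()) {
            for (Map.Entry<String, Double> e : rec.getValue().entrySet()) {
                rows.add(new Row(rec.getKey(), e.getKey(), e.getValue()));
            }
        }
        return rows;
    }

    // Group rows by map key and count them (a map holds each key at most once,
    // so this equals the number of infos carrying that key).
    static Map<String, Integer> countByKey(List<Row> rows) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Row r : rows) {
            counts.merge(r.key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Row> rows = explode(sampleInput());
        rows.forEach(System.out::println);
        System.out.println(countByKey(rows)); // {108=3, 12=1, 25=3, 26=1, 352=1, 4=2, 405=1}
    }
}
```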

Thanks.

2011/5/25 Alan Gates <[EMAIL PROTECTED]>

> Can't you mimic dynamic key support with static keys by making your map
> have two static keys 'key' and 'value'?
>
> Alan.
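Here is Alan's suggestion sketched in plain Java. It assumes the UDF currently builds a `java.util.Map<String, Double>` with dynamic keys. Each entry becomes its own small map with the two fixed keys `key` and `value`. The names are hypothetical. If a UDF returned such maps inside a bag, each one could then be read with static lookups such as `m#'key'` and `m#'value'`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class StaticKeyPairs {

    // Split a map with dynamic keys into one map per entry, each having
    // exactly two static keys: "key" and "value".
    static List<Map<String, Object>> toKeyValueMaps(Map<String, Double> dynamic) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map.Entry<String, Double> e : dynamic.entrySet()) {
            Map<String, Object> pair = new LinkedHashMap<>();
            pair.put("key", e.getKey());
            pair.put("value", e.getValue());
            out.add(pair);
        }
        return out;
    }

    public static void main(String[] args) {
        // Sorted copy of one map from the thread, so the printed order is deterministic.
        Map<String, Double> m = new TreeMap<>(Map.of("193", 0.16985779, "81", 0.10994483));
        for (Map<String, Object> pair : toKeyValueMaps(m)) {
            // Every element is now read through the same two fixed keys.
            System.out.println(pair.get("key") + " -> " + pair.get("value"));
        }
    }
}
```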
>
>
> On May 24, 2011, at 3:05 AM, Jameson Li wrote:
>
>> OK, OK. I know: I just have to write UDFs.
>> I'll write the UDFs and see how it goes.
>> But I still think there should be grammar support for map operations on
>> both static keys and dynamic keys.
>>
>> Thanks.
>>
>> 2011/5/24 Daniel Dai <[EMAIL PROTECTED]>
>>
>>> GetKey(m) already gets the key, so you can filter on the key. For the
>>> value, you may need to put the logic into a UDF.
>>>
>>> Grammar support for maps is based on static keys, e.g. m#'key1'. Your use
>>> case mostly deals with dynamic keys, which you currently have to handle
>>> yourself.
>>>
>>> Daniel
>>>
>>> -----Original Message----- From: Jameson Li
>>> Sent: Monday, May 23, 2011 7:07 PM
>>> To: Daniel Dai
>>> Cc: [EMAIL PROTECTED]
>>> Subject: Re: how to operate a map type
>>>
>>>
>>> And how do I filter on a map key or a map value? Is a UDF the only way here too?
>>>
>>> b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1) as m;
>>> c = filter b by m.key == 'aaa' or m.value> 0.2;
>>>
>>> How could I write this?
>>> Is there any way to do it without writing a UDF?
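Once the map is split into one (key, value) entry per row, the intended filter `m.key == 'aaa' or m.value > 0.2` becomes an ordinary predicate on two scalar fields. Below is a plain-Java sketch of that predicate. The key `aaa` and the threshold 0.2 come from the question. The class name, method name and sample data are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MapEntryFilter {

    // Keep only entries whose key equals wantedKey or whose value exceeds minValue,
    // mirroring the intended `m.key == 'aaa' or m.value > 0.2`.
    static Map<String, Double> filterEntries(Map<String, Double> m, String wantedKey, double minValue) {
        Map<String, Double> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : m.entrySet()) {
            if (e.getKey().equals(wantedKey) || e.getValue() > minValue) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Double> m = new LinkedHashMap<>();
        m.put("aaa", 0.05); // kept: key matches
        m.put("352", 0.5);  // kept: value > 0.2
        m.put("26", 0.06);  // dropped
        System.out.println(filterEntries(m, "aaa", 0.2)); // {aaa=0.05, 352=0.5}
    }
}
```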
>>>
>>> And I have a question: if only a UDF can operate on a map type, why are
>>> there no official built-in functions for the map type?
>>>
>>> Thanks.
>>>
>>> 2011/5/24 Daniel Dai <[EMAIL PROTECTED]>
>>>
>>>> I cannot think of a way without writing a UDF. You can write two UDFs:
>>>>
>>>> * GetKey: takes a map, outputs the key of the map
>>>> * GetValues: takes a bag of maps, outputs a bag of the map values
>>>>
>>>> The script would look like this:
>>>> b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1) as m;
>>>> c = foreach b generate GetKey(m) as key, m;
>>>> d = group c by key;
>>>> -- iterate over the grouped relation d, which has the fields group and c
>>>> e = foreach d generate group, SUM(GetValues(c.m));
>>>>
>>>>
>>>> Daniel
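Daniel's `GetKey`/`GetValues` plan assumes each map holds exactly one key. When a map can hold several keys, as in the dumps in this thread, one way to do the group-and-sum is to add each entry's value under its own key. This is a plain-Java sketch with hypothetical names. The sample maps are the `GetInfoScore` output from the top of the thread.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SumByMapKey {

    // Add up the values of every map entry, grouped by map key.
    static Map<String, Double> sumByKey(List<Map<String, Double>> maps) {
        Map<String, Double> sums = new TreeMap<>();
        for (Map<String, Double> m : maps) {
            for (Map.Entry<String, Double> e : m.entrySet()) {
                sums.merge(e.getKey(), e.getValue(), Double::sum);
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Map<String, Double>> maps = List.of(
            Map.of("108", 0.2),
            Map.of("352", 0.5, "25", 0.15, "108", 0.07, "26", 0.06, "4", 0.16),
            Map.of("25", 0.28, "405", 0.05, "4", 0.05),
            Map.of("108", 0.05, "25", 0.19, "12", 0.19));
        // Key 108 sums to about 0.32 (0.2 + 0.07 + 0.05); key 25 to about 0.62.
        System.out.println(sumByKey(maps));
    }
}
```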
>>>>
>>>>
>>>> On 05/23/2011 07:06 AM, Jameson Li wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I have the following Pig code:
>>>>>
>>>>> register /home/uu/project/lib/pigudfs.jar
>>>>> ruls = load 'testurl' as (url:chararray);
>>>>>
>>>>> b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1);
>>>>>
>>>>> When I dump b, it returns:
>>>>> ([4#0.1677963])
>>>>> ([193#0.16985779,81#0.10994483])
>>>>> ([418#0.14138427,9#0.1107544,282#0.18699136])
>>>>>
>>>>> I just want to group by the map key and sum the map values, something like:
>>>>> c = group b by $0#key;
>>>>> d = foreach c generate group,SUM(b.$0#value);
>>>>>
>>>>> How could I write the code?
>>>>>
>>>>> Thanks,
>>>>> Jameson Li.
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>