Pig, mail # user - how to operate a map type


Jameson Li 2011-05-23, 14:06
Daniel Dai 2011-05-23, 18:55
Jameson Li 2011-05-24, 02:07
Daniel Dai 2011-05-24, 06:19
Jameson Li 2011-05-24, 10:05
Alan Gates 2011-05-24, 18:04
Jameson Li 2011-06-02, 13:28
Xiaomeng Wan 2011-06-02, 15:55

Re: how to operate a map type
Thejas M Nair 2011-06-02, 18:58
Another alternative is to write a UDF that returns all keys in a map as a bag.
I think this would be a useful addition to piggybank. It would also be useful to have getEntries(Map) and getValues(Map) UDFs in piggybank.
If you choose this option and are in a position to contribute the UDF code, please do so.
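
A minimal sketch of what such UDFs would compute, written with plain java.util collections instead of Pig's UDF API so that it runs on its own. The class name MapUdfSketch, the helpers parse and sampleRows, and the method countRowsPerKey are all hypothetical and exist only for illustration; a real UDF would extend Pig's EvalFunc and return a bag of tuples rather than a List.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapUdfSketch {
    /** What a "keys of a map as a bag" UDF would compute: one element per key. */
    public static List<String> getKeys(Map<String, Double> m) {
        return new ArrayList<>(m.keySet());
    }

    /** What getEntries(Map) would compute: one (key, value) pair per map entry. */
    public static List<Map.Entry<String, Double>> getEntries(Map<String, Double> m) {
        return new ArrayList<>(m.entrySet());
    }

    /** Flatten the keys of every row, then count rows per key (the group + count step). */
    public static Map<String, Integer> countRowsPerKey(Map<String, Map<String, Double>> rows) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, Double> m : rows.values()) {
            for (String key : getKeys(m)) {
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Illustration-only helper: builds a map from text shaped like "[108#0.2,25#0.15]". */
    public static Map<String, Double> parse(String dumped) {
        Map<String, Double> m = new LinkedHashMap<>();
        String body = dumped.substring(1, dumped.length() - 1); // drop '[' and ']'
        for (String entry : body.split(",")) {
            String[] kv = entry.split("#");
            m.put(kv[0], Double.parseDouble(kv[1]));
        }
        return m;
    }

    /** The four rows quoted in the original question, keyed by the info field. */
    public static Map<String, Map<String, Double>> sampleRows() {
        Map<String, Map<String, Double>> rows = new LinkedHashMap<>();
        rows.put("65RFPRO800863GPT", parse("[108#0.2]"));
        rows.put("6JL6U6EA00863J0J", parse("[352#0.5,25#0.15,108#0.07,26#0.06,4#0.16]"));
        rows.put("6B7FF3E300052E97", parse("[25#0.28,405#0.05,4#0.05]"));
        rows.put("5498267_31", parse("[108#0.05,25#0.19,12#0.19]"));
        return rows;
    }

    public static void main(String[] args) {
        // Print one "key count" line per distinct map key.
        for (Map.Entry<String, Integer> e : countRowsPerKey(sampleRows()).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Wrapped as a real Pig UDF that returns a bag of (key, value) tuples, the result could be flattened in the script and then grouped by key and counted, which is the step countRowsPerKey imitates here.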

Thanks,
Thejas

On 6/2/11 8:55 AM, "Xiaomeng Wan" <[EMAIL PROTECTED]> wrote:

Can't your UDF return a bag of tuples with two fields (i.e. key and
value), and then flatten it?

Shawn

On Thu, Jun 2, 2011 at 7:28 AM, Jameson Li <[EMAIL PROTECTED]> wrote:
> Hi,
>
> my pig code is like this:
> register myudf.jar
> a = load 'testurls' as (info:chararray);
> b = foreach a generate info,com.company.pig.GetInfoScore($0) as m;
> dump b;
>
> The output is like this:
> (65RFPRO800863GPT,[108#0.2])
> (6JL6U6EA00863J0J,[352#0.5,25#0.15,108#0.07,26#0.06,4#0.16])
> (6B7FF3E300052E97,[25#0.28,405#0.05,4#0.05])
> (5498267_31,[108#0.05,25#0.19,12#0.19])
>
> I want to group by the map key and count the info values, to get output
> like this:
> 108   3       /*65RFPRO800863GPT   6JL6U6EA00863J0J   5498267_31*/
> 352   1       /*6JL6U6EA00863J0J*/
> 25    3       /*6JL6U6EA00863J0J   6B7FF3E300052E97   5498267_31*/
> 26    1       /*6JL6U6EA00863J0J*/
> 4     2       /*6JL6U6EA00863J0J   6B7FF3E300052E97*/
> 405   1       /*6B7FF3E300052E97*/
> 12    1       /*5498267_31*/
>
> I think I have to split the map into many rows, like this:
> (65RFPRO800863GPT, 108, 0.2)
> (6JL6U6EA00863J0J, 352, 0.5)
> (6JL6U6EA00863J0J, 25, 0.15)
> (6JL6U6EA00863J0J, 108, 0.07)
> (6JL6U6EA00863J0J, 26, 0.06)
> (6JL6U6EA00863J0J, 4, 0.16)
> (6B7FF3E300052E97, 25, 0.28)
> (6B7FF3E300052E97, 405, 0.05)
> (6B7FF3E300052E97, 4, 0.05)
> (5498267_31, 108, 0.05)
> (5498267_31, 25, 0.19)
> (5498267_31, 12, 0.19)
>
> And then it is easy to group and count.
> Am I right?
> I have no idea how to split the map into many rows as shown above.
> Help.
>
> Thanks.
>
> 2011/5/25 Alan Gates <[EMAIL PROTECTED]>
>
>> Can't you mimic dynamic key support with static keys by making your map
>> have two static keys 'key' and 'value'?
>>
>> Alan.
>>
>>
>> On May 24, 2011, at 3:05 AM, Jameson Li wrote:
>>
>>> OK, OK. I understand: just write UDFs.
>>> I will have to write UDFs and see.
>>> I still think there should be grammar support for map operations with both
>>> static and dynamic keys.
>>>
>>> Thanks.
>>>
>>> 2011/5/24 Daniel Dai <[EMAIL PROTECTED]>
>>>
>>>> GetKey(m) already gets the key, so you can filter on the key. For the value,
>>>> you may need to put the logic into a UDF.
>>>>
>>>> Grammar support for maps is based on static keys, e.g. m#'key1'. Your use
>>>> case mostly deals with dynamic keys, which you currently have to handle
>>>> yourself.
>>>>
>>>> Daniel
>>>>
>>>> -----Original Message----- From: Jameson Li
>>>> Sent: Monday, May 23, 2011 7:07 PM
>>>> To: Daniel Dai
>>>> Cc: [EMAIL PROTECTED]
>>>> Subject: Re: how to operate a map type
>>>>
>>>>
>>>> And how do I filter on a map key or a map value? Is that also only possible with a UDF?
>>>>
>>>> b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1) as m;
>>>> c = filter b by m.key == 'aaa' or m.value> 0.2;
>>>>
>>>> How could I write this code?
>>>> Is there any way to do it without writing a UDF?
>>>>
>>>> Also, if a map type can only be operated on through UDFs, why are there no
>>>> official functions for the map type?
>>>>
>>>> Thanks.
>>>>
>>>> 2011/5/24 Daniel Dai <[EMAIL PROTECTED]>
>>>>
>>>>> I cannot think of a way without writing a UDF. You can write two UDFs:
>>>>>
>>>>> * GetKey: takes a map as input and outputs the key of the map
>>>>> * GetValues: takes a bag of maps as input and outputs a bag of the map values
>>>>>
>>>>> The script would look like this:
>>>>> b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1) as m;
>>>>> c = foreach b generate GetKey(m) as key, m;
>>>>> d = group c by key;
>>>>> e = foreach d generate group, SUM(GetValues(c.m));
>>>>>
>>>>>
>>>>> Daniel
>>>>>
>>>>>
>>>>> On 05/23/2011 07:06 AM, Jameson Li wrote: