Pig >> mail # user >> how to operate a map type


Jameson Li 2011-05-23, 14:06
Daniel Dai 2011-05-23, 18:55
Jameson Li 2011-05-24, 02:07
Daniel Dai 2011-05-24, 06:19
Jameson Li 2011-05-24, 10:05
Alan Gates 2011-05-24, 18:04
Jameson Li 2011-06-02, 13:28
Xiaomeng Wan 2011-06-02, 15:55

Re: how to operate a map type
Another alternative is to write a UDF that returns all the keys in a map as a bag.
I think this would be a useful addition to piggybank. It would also be useful to have getEntries(Map) and getValues(Map) UDFs in piggybank.
If you choose this option and are in a position to contribute the UDF code, please do so.

Thanks,
Thejas

On 6/2/11 8:55 AM, "Xiaomeng Wan" <[EMAIL PROTECTED]> wrote:

Can't your UDF return a bag of tuples with two fields (i.e. key and
value), which you then flatten?

Shawn
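
The approach suggested above (a bag of (key, value) tuples, then flatten, group, and count) can be sketched in plain Java. This is only a simulation of the data flow on the sample data from the question, not Pig UDF code: the class MapFlattenSketch, the Row type, and the helpers parse, flatten, countByKey, and sampleRows are hypothetical names introduced for illustration, and no Pig classes are used.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of "UDF returns a bag of (key, value) tuples,
// then FLATTEN, GROUP, COUNT". All names here are hypothetical.
class MapFlattenSketch {

    // One flattened row: (info, key, value).
    static final class Row {
        final String info;
        final String key;
        final double value;

        Row(String info, String key, double value) {
            this.info = info;
            this.key = key;
            this.value = value;
        }
    }

    // Builds a map from text such as "25#0.28,405#0.05" (assumed test input format).
    static Map<String, Double> parse(String text) {
        Map<String, Double> m = new LinkedHashMap<>();
        for (String entry : text.split(",")) {
            String[] parts = entry.split("#");
            m.put(parts[0], Double.parseDouble(parts[1]));
        }
        return m;
    }

    // Emits one row per map entry, which is what flattening the bag would give.
    static List<Row> flatten(String info, Map<String, Double> m) {
        List<Row> rows = new ArrayList<>();
        for (Map.Entry<String, Double> e : m.entrySet()) {
            rows.add(new Row(info, e.getKey(), e.getValue()));
        }
        return rows;
    }

    // Groups rows by map key and counts the rows in each group.
    static Map<String, Integer> countByKey(List<Row> rows) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Row r : rows) {
            counts.merge(r.key, 1, Integer::sum);
        }
        return counts;
    }

    // The four sample records from the question, flattened into rows.
    static List<Row> sampleRows() {
        List<Row> rows = new ArrayList<>();
        rows.addAll(flatten("65RFPRO800863GPT", parse("108#0.2")));
        rows.addAll(flatten("6JL6U6EA00863J0J",
                parse("352#0.5,25#0.15,108#0.07,26#0.06,4#0.16")));
        rows.addAll(flatten("6B7FF3E300052E97", parse("25#0.28,405#0.05,4#0.05")));
        rows.addAll(flatten("5498267_31", parse("108#0.05,25#0.19,12#0.19")));
        return rows;
    }

    public static void main(String[] args) {
        // prints {108=3, 352=1, 25=3, 26=1, 4=2, 405=1, 12=1}
        System.out.println(countByKey(sampleRows()));
    }
}
```

In Pig itself, the flatten method would correspond to applying FLATTEN to the bag returned by the UDF, and countByKey to a GROUP on the key followed by COUNT.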

On Thu, Jun 2, 2011 at 7:28 AM, Jameson Li <[EMAIL PROTECTED]> wrote:
> Hi,
>
> My Pig code looks like this:
> register myudf.jar;
> a = load 'testurls' as (info:chararray);
> b = foreach a generate info,com.company.pig.GetInfoScore($0) as m;
> dump b;
>
> The output is like this:
> (65RFPRO800863GPT,[108#0.2])
> (6JL6U6EA00863J0J,[352#0.5,25#0.15,108#0.07,26#0.06,4#0.16])
> (6B7FF3E300052E97,[25#0.28,405#0.05,4#0.05])
> (5498267_31,[108#0.05,25#0.19,12#0.19])
>
> I want to group by the map key and count the info values, as in the
> output below (the matching info values are shown in comments):
> 108  3    /* 65RFPRO800863GPT  6JL6U6EA00863J0J  5498267_31 */
> 352  1    /* 6JL6U6EA00863J0J */
> 25   3    /* 6JL6U6EA00863J0J  6B7FF3E300052E97  5498267_31 */
> 26   1    /* 6JL6U6EA00863J0J */
> 4    2    /* 6JL6U6EA00863J0J  6B7FF3E300052E97 */
> 405  1    /* 6B7FF3E300052E97 */
> 12   1    /* 5498267_31 */
>
> I think I have to split the map into many rows, like this:
> (65RFPRO800863GPT, 108, 0.2)
> (6JL6U6EA00863J0J, 352, 0.5)
> (6JL6U6EA00863J0J, 25, 0.15)
> (6JL6U6EA00863J0J, 108, 0.07)
> (6JL6U6EA00863J0J, 26, 0.06)
> (6JL6U6EA00863J0J, 4, 0.16)
> (6B7FF3E300052E97, 25, 0.28)
> (6B7FF3E300052E97, 405, 0.05)
> (6B7FF3E300052E97, 4, 0.05)
> (5498267_31, 108, 0.05)
> (5498267_31, 25, 0.19)
> (5498267_31, 12, 0.19)
>
> After that, grouping and counting would be easy.
> Am I right?
> I have no idea how to split the map into rows as shown above.
> Help.
>
> Thanks.
>
> 2011/5/25 Alan Gates <[EMAIL PROTECTED]>
>
>> Can't you mimic dynamic key support with static keys by making your map
>> have two static keys 'key' and 'value'?
>>
>> Alan.
>>
>>
>> On May 24, 2011, at 3:05 AM, Jameson Li wrote:
>>
>>> OK, I understand: the answer is to write UDFs.
>>> I will have to write UDFs, and we will see.
>>> I still think there should be grammar support for map operations with both
>>> static and dynamic keys.
>>>
>>> Thanks.
>>>
>>> 2011/5/24 Daniel Dai <[EMAIL PROTECTED]>
>>>
>>>> GetKey(m) already gets the key, so you can filter on the key. For the value,
>>>> you may need to put the logic into a UDF.
>>>>
>>>> Grammar support for maps is based on static keys, e.g. m#'key1'. Your use
>>>> case mostly deals with dynamic keys, which you currently have to handle
>>>> yourself.
>>>>
>>>> Daniel
>>>>
>>>> -----Original Message----- From: Jameson Li
>>>> Sent: Monday, May 23, 2011 7:07 PM
>>>> To: Daniel Dai
>>>> Cc: [EMAIL PROTECTED]
>>>> Subject: Re: how to operate a map type
>>>>
>>>>
>>>> And how do I filter on a map key or a map value? Is that also possible only with a UDF?
>>>>
>>>> b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1) as m;
>>>> c = filter b by m.key == 'aaa' or m.value > 0.2;
>>>>
>>>> How should I write this code?
>>>> Is there any way to do it without writing a UDF?
>>>>
>>>> Also, if a map type can only be operated on through UDFs, why are there no
>>>> built-in functions for the map type?
>>>>
>>>> Thanks.
>>>>
>>>> 2011/5/24 Daniel Dai <[EMAIL PROTECTED]>
>>>>
>>>>> I cannot think of a way without writing a UDF. You can write two UDFs:
>>>>>
>>>>> * GetKey: takes a map as input and returns the key of the map
>>>>> * GetValues: takes a bag of maps as input and returns a bag of map values
>>>>>
>>>>> The script is like:
>>>>> b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1) as m;
>>>>> c = foreach b generate GetKey(m) as key, m;
>>>>> d = group c by key;
>>>>> e = foreach d generate group, SUM(GetValues(c.m));
>>>>>
>>>>>
>>>>> Daniel
>>>>>
>>>>>
>>>>> On 05/23/2011 07:06 AM, Jameson Li wrote:
