Pig, mail # user - how can I store multiple result once a time?


Re: how can I store multiple result once a time?
Haitao Yao 2012-03-02, 03:47

Yeah, union can do this.

But my real goal is to reduce the number of MapReduce jobs.

Although I union the 2 result sets into one, it still submits 2 MapReduce jobs and reads the data twice. Here's my script:
register '/home/hadoop/pig/matrix-pig.jar';
RawData = load '/data/' using PigStorage(',') as (gid:long, payload:bytearray, ts:long, type:int);
RawData = filter RawData by type == 1000 and ts >= 20120302090000L and ts <= 20120302100000L;
FormattedData = foreach RawData {
    payload = he.HEDataConverter(payload);
    generate gid, ts, type, payload#'_event_id' as p__event_id, payload#'object' as p_object;
}
FilteredData = filter FormattedData by (int) p__event_id == 217;
ResultSet  = group FilteredData by p_object;
Result = foreach ResultSet{
    Value = FilteredData.gid;
    Value = distinct Value;
    generate '217', CONCAT(CONCAT('object', ':'), group), he.HECOUNT(Value);
}
FormattedData = foreach RawData {
    payload = he.HEDataConverter(payload);
    generate gid, ts, type, payload#'_event_id' as p__event_id, payload#'result' as p_result;
}
FilteredData = filter FormattedData by (int) p__event_id == 217;
ResultSet  = group FilteredData by p_result;

Result1 = foreach ResultSet{
    Value = FilteredData.gid;
    Value = distinct Value;
    generate '217', CONCAT(CONCAT('result', ':'), group), he.HECOUNT(Value);
}
A = union Result, Result1;
store A into '/tmp/result' using PigStorage(',');
How can I do this work with 1 MapReduce job? I do not want to read the data twice; it will put a heavy load on HDFS.
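
One possible restructuring, offered only as a sketch: project both map fields in a single pass, turn each record into one labeled row per field, and group once, so the script contains a single GROUP. It assumes the he.* UDFs behave as in the script above, reuses the '/tmp/result' path from the first message, and relies on the built-in TOBAG; the aliases Events, Formatted, Matched, Labeled and Grouped are made up for illustration. Whether Pig actually plans this as one job should be checked with EXPLAIN.

```pig
register '/home/hadoop/pig/matrix-pig.jar';
RawData = load '/data/' using PigStorage(',') as (gid:long, payload:bytearray, ts:long, type:int);
Events = filter RawData by type == 1000 and ts >= 20120302090000L and ts <= 20120302100000L;
-- Convert the payload once and project both fields in the same pass.
Formatted = foreach Events {
    converted = he.HEDataConverter(payload);
    generate gid, converted#'_event_id' as p__event_id, converted#'object' as p_object, converted#'result' as p_result;
}
Matched = filter Formatted by (int) p__event_id == 217;
-- Emit one (gid, label) row per field, so a single GROUP covers both breakdowns.
Labeled = foreach Matched generate gid, flatten(TOBAG(CONCAT('object:', (chararray) p_object), CONCAT('result:', (chararray) p_result))) as label;
Grouped = group Labeled by label;
Result = foreach Grouped {
    Value = Labeled.gid;
    Value = distinct Value;
    generate '217', group, he.HECOUNT(Value);
}
store Result into '/tmp/result' using PigStorage(',');
```

Rows whose object or result field is missing produce a null label here, so a filter on label is not null before the GROUP may be needed.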

thanks!

Name: 姚海涛 (Haitao Yao)
Email: [EMAIL PROTECTED]
Sina Weibo: @haitao_yao

On 2012-03-02, at 11:07 AM, Prashant Kommireddi wrote:
> Can you merge Result1 and Result2 using "UNION" before STORE?
> http://pig.apache.org/docs/r0.9.1/basic.html#union
>
> 2012/3/1 Haitao Yao <[EMAIL PROTECTED]>
>
>> Hi , all
>>       How can I store multiple result using one store function?
>>       for example: store Result1, Result 2 into '/tmp/result' using
>> PigStorage(',');
>>
>>       the default store function does not accept multiple parameter as
>> input .
>>
>>       thanks
>>
>>
>>
>>
>> Name: 姚海涛 (Haitao Yao)
>> Email: [EMAIL PROTECTED]
>> Sina Weibo: @haitao_yao
>>
>>