Pig >> mail # user >> how can I store multiple result once a time?


Re: how can I store multiple result once a time?

Yeah, union can do this.

But my real goal is to reduce the number of MapReduce jobs.

Although I union the 2 result sets into one, Pig still submits 2 MapReduce jobs and reads the data twice. Here's my script:
register '/home/hadoop/pig/matrix-pig.jar';
RawData = load '/data/' using PigStorage(',') as (gid:long, payload:bytearray, ts:long, type:int);
RawData = filter RawData by type == 1000 and ts >= 20120302090000L and ts <= 20120302100000L;
FormattedData = foreach RawData {
    payload = he.HEDataConverter(payload);
    generate gid, ts, type, payload#'_event_id' as p__event_id, payload#'object' as p_object;
}
FilteredData = filter FormattedData by (int) p__event_id == 217;
ResultSet  = group FilteredData by p_object;
Result = foreach ResultSet{
    Value = FilteredData.gid;
    Value = distinct Value;
    generate '217', CONCAT(CONCAT('object', ':'), group), he.HECOUNT(Value);
}
FormattedData = foreach RawData {
    payload = he.HEDataConverter(payload);
    generate gid, ts, type, payload#'_event_id' as p__event_id, payload#'result' as p_result;
}
FilteredData = filter FormattedData by (int) p__event_id == 217;
ResultSet  = group FilteredData by p_result;

Result1 = foreach ResultSet{
    Value = FilteredData.gid;
    Value = distinct Value;
    generate '217', CONCAT(CONCAT('result', ':'), group), he.HECOUNT(Value);
}
A = union Result, Result1;
store A into '/tmp/result' using PigStorage(',');
How can I do this work with 1 MapReduce job? I do not want to read the data twice; it would put a heavy load on HDFS.

Thanks!

Name: 姚海涛 (Haitao Yao)
Email: [EMAIL PROTECTED]
Sina Weibo: @haitao_yao

On 2012-3-2, at 11:07 AM, Prashant Kommireddi wrote:
> Can you merge Result1 and Result2 using "UNION" before STORE?
> http://pig.apache.org/docs/r0.9.1/basic.html#union
>
> 2012/3/1 Haitao Yao <[EMAIL PROTECTED]>
>
>> Hi , all
>>       How can I store multiple result using one store function?
>>       for example: store Result1, Result 2 into '/tmp/result' using
>> PigStorage(',');
>>
>>       the default store function does not accept multiple parameter as
>> input .
>>
>>       thanks
>>
>>
>>
>>
>> Name: 姚海涛 (Haitao Yao)
>> Email: [EMAIL PROTECTED]
>> Sina Weibo: @haitao_yao
>>
>>
