Re: how can I store multiple result once a time?
Haitao Yao 2012-03-02, 03:47
Yeah, UNION can do this. But my real goal is to reduce the number of MapReduce jobs. Even though I union the 2 result sets into one, Pig still submits 2 MapReduce jobs and reads the data twice.

Here's my script:

register '/home/hadoop/pig/matrix-pig.jar';

RawData = load '/data/' using PigStorage(',') as (gid:long, payload:bytearray, ts:long, type:int);
RawData = filter RawData by type == 1000 and ts >= 20120302090000L and ts <= 20120302100000L;

FormattedData = foreach RawData {
    payload = he.HEDataConverter(payload);
    generate gid, ts, type, payload#'_event_id' as p__event_id, payload#'object' as p_object;
}
FilteredData = filter FormattedData by (int) p__event_id == 217;
ResultSet = group FilteredData by p_object;
Result = foreach ResultSet {
    Value = FilteredData.gid;
    Value = distinct Value;
    generate '217', CONCAT(CONCAT('object', ':'), group), he.HECOUNT(Value);
}

FormattedData = foreach RawData {
    payload = he.HEDataConverter(payload);
    generate gid, ts, type, payload#'_event_id' as p__event_id, payload#'result' as p_result;
}
FilteredData = filter FormattedData by (int) p__event_id == 217;
ResultSet = group FilteredData by p_result;
Result1 = foreach ResultSet {
    Value = FilteredData.gid;
    Value = distinct Value;
    generate '217', CONCAT(CONCAT('result', ':'), group), he.HECOUNT(Value);
}

A = union Result, Result1;
store A;

How can I do this work with 1 MapReduce job? I do not want to read the data twice, because that puts a heavy load on HDFS.

Thanks!

Name: 姚海涛 (Haitao Yao)
Email: [EMAIL PROTECTED]
Sina Weibo: @haitao_yao

On 2012-3-2, at 11:07 AM, Prashant Kommireddi wrote:

> Can you merge Result1 and Result2 using "UNION" before STORE?
> http://pig.apache.org/docs/r0.9.1/basic.html#union
>
> 2012/3/1 Haitao Yao <[EMAIL PROTECTED]>
>
>> Hi, all
>> How can I store multiple results using one store function?
>> for example: store Result1, Result2 into '/tmp/result' using
>> PigStorage(',');
>>
>> the default store function does not accept multiple parameters as
>> input.
>>
>> thanks
>>
>> Name: 姚海涛 (Haitao Yao)
>> Email: [EMAIL PROTECTED]
>> Sina Weibo: @haitao_yao