Dexin Wang 2011-03-08, 20:29
Is there a way to use STORE with a variable, or some other way to achieve what I need?
I have something like this:
grunt> DESCRIBE A;
A: {f1, f2, f3, ...}

grunt> DUMP A;
(v1, x2, x3, ...)
(v2, x4, x5, ...)
(v1, x6, x6, ...)
...
I do some processing, then group by f1, and would like to save the result in different directories for different values of f1, like this:
/result/f1/result_for_v1
/result/f2/result_for_v2
/result/f2/result_for_v2
...
I know I could use SPLIT, but I have 100+ unique values for f1, and the number of unique values varies each time I process. It would be nice if I didn't have to list 100 BY lines with SPLIT, and I certainly do not want to maintain the list of possible values for f1 in my Pig script.
Thanks! Dexin
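For illustration, the SPLIT approach described above might look roughly like the sketch below. The input path, the schema, the string type of f1, and the output paths are all assumptions made for this example, not details from the original post.

```pig
-- Hypothetical input path and schema; f1 is assumed to be a chararray.
A = LOAD '/input/data' AS (f1:chararray, f2:chararray, f3:chararray);

-- Every distinct value of f1 needs its own hard-coded branch ...
SPLIT A INTO A_v1 IF f1 == 'v1',
             A_v2 IF f1 == 'v2';

-- ... and its own STORE statement with a hard-coded (hypothetical) path.
STORE A_v1 INTO '/result/v1';
STORE A_v2 INTO '/result/v2';
```

With 100+ values that change between runs, both the branch list and the STORE list would have to be regenerated for every run, which is the maintenance burden the question is trying to avoid.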
-
Re: STORE with variable?
Xiaomeng Wan 2011-03-08, 20:34
You can use the MultiStorage UDF in piggybank.
Shawn
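A minimal sketch of what this suggestion could look like, assuming the piggybank class org.apache.pig.piggybank.storage.MultiStorage and its two-argument form (parent output directory, then the zero-based index of the field to split on). The jar location, the input path, and the schema are hypothetical. Whether this works depends on the Pig version in use.

```pig
-- Hypothetical location of the piggybank jar.
REGISTER /path/to/piggybank.jar;

-- Hypothetical input path and schema.
A = LOAD '/input/data' AS (f1:chararray, f2:chararray, f3:chararray);

-- '/result' is the parent output directory; '0' is the index of f1,
-- the field whose value selects the output subdirectory.
STORE A INTO '/result'
    USING org.apache.pig.piggybank.storage.MultiStorage('/result', '0');
```

If it works as intended, each distinct value of f1 should get its own subdirectory under /result, with no list of values kept in the script.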
-
Re: STORE with variable?
Dexin Wang 2011-03-08, 21:22
Awesome. Thanks, Shawn.
-
Re: STORE with variable?
Dexin Wang 2011-03-08, 22:04
Unfortunately, it doesn't work. It seems to be the same problem as in https://issues.apache.org/jira/browse/PIG-1547
-
Re: STORE with variable?
Xiaomeng Wan 2011-03-09, 17:18
Sorry to hear that. We used it in an old project. It works well with Pig 0.6.0.
Shawn
-
Re: STORE with variable?
Daniel Dai 2011-03-10, 20:21
You may try a custom partitioner.
http://pig.apache.org/docs/r0.8.0/piglatin_ref2.html#partitionby
https://issues.apache.org/jira/browse/PIG-282
Daniel
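A rough sketch of how the PARTITION BY clause might be used here. The class com.example.F1Partitioner, its jar, the input path, the schema, and the PARALLEL value are all hypothetical. The class would have to be written separately as a Hadoop partitioner (extending org.apache.hadoop.mapreduce.Partitioner) that sends each distinct f1 to its own reducer.

```pig
-- Hypothetical jar containing the custom partitioner class.
REGISTER /path/to/my-partitioner.jar;

-- Hypothetical input path and schema.
A = LOAD '/input/data' AS (f1:chararray, f2:chararray, f3:chararray);

-- Route each group key through the (hypothetical) custom partitioner.
-- PARALLEL sets the number of reducers; 100 is an arbitrary example value.
B = GROUP A BY f1 PARTITION BY com.example.F1Partitioner PARALLEL 100;

-- Flatten the groups back into the original tuples before storing.
C = FOREACH B GENERATE FLATTEN(A);

STORE C INTO '/result';
```

A partitioner only decides which reducer receives each key, so this yields one part file per reducer under /result rather than one named directory per value. Moving or renaming those files into per-value directories would be a separate step.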