Jonathan Coveney 2010-12-03, 21:03
Daniel Dai 2010-12-06, 18:55
Anze 2010-12-06, 20:09
Jonathan Coveney 2010-12-06, 21:05
Alan Gates 2010-12-06, 21:13
Anze 2010-12-07, 08:44
Jonathan Coveney 2010-12-07, 15:10
Anze 2010-12-07, 17:16
Dmitriy Ryaboy 2010-12-07, 17:49
Santhosh Srinivasan 2010-12-07, 18:11
Dmitriy Ryaboy 2010-12-08, 00:50
Anze 2010-12-08, 08:24
Santhosh Srinivasan 2010-12-08, 20:02
-
Easy question...difference between this::form and this.form?
Jonathan Coveney 2010-12-03, 21:03
It's very hard to search for this among the docs because it's so generic, so I thought I'd ask... I'm sure the answer is painfully easy.

Taking a look at this code that I found online, for example:

    --
    -- Read in a bag of tuples (timeseries for this example) and divide the
    -- numeric column by its maximum.
    --
    %default DATABAG 'data/timeseries.tsv'

    data = LOAD '$DATABAG' AS (month:chararray, count:int);
    accumulate = GROUP data ALL;
    calc_max = FOREACH accumulate GENERATE FLATTEN(data),
        MAX(data.count) AS max_count;
    normalize = FOREACH calc_max GENERATE data::month AS month,
        data::count AS count,
        (float)data::count / (float)max_count AS normed_count;
    DUMP normalize;

What purpose does data::month serve versus data.count?

Thanks
-
Re: Easy question...difference between this::form and this.form?
Daniel Dai 2010-12-06, 18:55
After JOIN, CROSS, or FOREACH ... FLATTEN, Pig automatically adds a "base_alias::" prefix to the field names. All other cases use ".".

Daniel
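To see the prefixing Daniel describes, here is a minimal sketch. The relation names, file names, and fields are hypothetical placeholders and do not come from the thread.

```pig
-- Hypothetical inputs; file names and fields are placeholders.
users  = LOAD 'users.tsv'  AS (id:int, name:chararray);
orders = LOAD 'orders.tsv' AS (id:int, user_id:int, total:double);

joined = JOIN users BY id, orders BY user_id;

-- The joined schema carries the source alias on each field:
-- users::id, users::name, orders::id, orders::user_id, orders::total
DESCRIBE joined;
```

Running DESCRIBE right after a JOIN, CROSS, or FLATTEN is usually the quickest way to check which names a later statement has to use.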
-
Re: Easy question...difference between this::form and this.form?
Anze 2010-12-06, 20:09
Sorry to hijack your question, Jonathan, but while we are at it... :)

Is there a way to tell Pig NOT to add "base_alias::"? Almost half my code consists of FOREACH... GENERATE statements that just remove these prefixes.

Thanks,

Anze
-
Re: Easy question...difference between this::form and this.form?
Jonathan Coveney 2010-12-06, 21:05
Hijack away. I would be curious as to the reason we need this as well.
-
Re: Easy question...difference between this::form and this.form?
Alan Gates 2010-12-06, 21:13
The reason it's needed is that ambiguities would result otherwise.

    A = load 'foo' as (x, y, z);
    B = load 'bar' as (w, x, y, z);
    C = join A by x, B by x;
    D = filter C by z > 0; -- which z?

As long as the name is not ambiguous, the :: is not required. So in the above example it would be perfectly legal to say

    D = filter C by w > 0;

Out of curiosity, why do you want to remove the :: names?

Alan.
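Spelled out, the two cases from Alan's example might look like the sketch below. It reuses his relations; 'foo' and 'bar' are just the placeholder paths from his message, and the alias E is added here for illustration.

```pig
A = LOAD 'foo' AS (x, y, z);
B = LOAD 'bar' AS (w, x, y, z);
C = JOIN A BY x, B BY x;

-- z exists on both sides of the join, so the source alias has to be spelled out.
D = FILTER C BY A::z > 0;

-- w exists only in B, so the short name is enough.
E = FILTER C BY w > 0;
```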
-
Re: Easy question...difference between this::form and this.form?
Anze 2010-12-07, 08:44
I understand the reason for this, it just seems like a drastic solution. :)

Ideally, Pig should be clever enough to detect ambiguity and deal with it, and leave the non-conflicting names intact. For instance:

    A = load 'foo' as (x, y, z);
    B = load 'bar' as (x, a, b, c);
    C = join A by x, B by x;
    DESCRIBE C;
    C: {A::x, y, z, B::x, a, b, c}

or even:

    C: {x, y, z, B::x, a, b, c}

or even a step further, in case of JOIN:

    C: {x, y, z, a, b, c}

(Since join *joins* by x, why would there be two? This doesn't always work for other operations, of course.)

Reasoning: at least in my cases the names are descriptive from the start, therefore there are almost no name conflicts. In the rare cases where there are conflicts, Pig can detect that and use the old syntax with "::", then let me deal with it.

I know this is a backwards-incompatible change and is not likely to be accepted, but still... :)

Anze
-
Re: Easy question...difference between this::form and this.form?
Jonathan Coveney 2010-12-07, 15:10
Would that even be much better? It seems like it'd be better to have it be consistent in prepending the whatever::, so that at least you have to be cognizant of it when you do the join. If it starts being too clever, then it's up to you to figure out when it does and doesn't do it, which might be annoying.
-
Re: Easy question...difference between this::form and this.form?
Anze 2010-12-07, 17:16
If one uses meaningful names then Pig would never use '::' anyway. The problem is when you use multiple joins in sequence, then '::' names get very annoying. But that's just my opinion. :)

Anze
-
Re: Easy question...difference between this::form and this.form?
Dmitriy Ryaboy 2010-12-07, 17:49
Consider self-joins, with regard to the meaningful name problem...

The SQL way to deal with this issue is essentially to keep the name of the parent relation around during parsing, and require that you explicitly provide the desired parent if column names are ambiguous. That's probably something that could be implemented now that we have the required metadata in the operators (I believe it wasn't there when the disambiguation design was implemented).

As for the difference between "::" and ".": the double-colon is just a string with no special meaning, it's simply part of the field name. The period is essentially a projection operator -- you are saying, "the thing to the left of the period is a tuple, and the thing to the right is a field in that tuple". (This works for bags as well, in which case it means the thing to the left of the period is a bag of tuples, and the thing to the right is a field in every tuple in the bag.)

-Dmitriy.
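Dmitriy's distinction can be seen in the script from the original question, where both forms appear two lines apart. The sketch below is a lightly trimmed copy of that script; the aliases grouped, with_max, and normalized are renamed here for readability and are not from the thread.

```pig
data = LOAD 'data/timeseries.tsv' AS (month:chararray, count:int);
grouped = GROUP data ALL;

-- Inside each grouped tuple, data is a bag. data.count projects the count
-- field out of every tuple in that bag, so MAX receives a bag of counts.
with_max = FOREACH grouped GENERATE FLATTEN(data), MAX(data.count) AS max_count;

-- After FLATTEN there is no bag left to project from. The fields are
-- top-level and their names are literally data::month and data::count.
normalized = FOREACH with_max GENERATE
    data::month AS month,
    (float)data::count / (float)max_count AS normed_count;

DUMP normalized;
```

In other words, "." in this script reaches into a bag, while "::" is just part of a field name that FLATTEN produced.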
-
RE: Easy question...difference between this::form and this.form?
Santhosh Srinivasan 2010-12-07, 18:11
> The sql way to deal with this issue is essentially to keep the name of the parent relation
> around during parsing, and require that you explicitly provide the desired parent if column
> names are ambiguous. That's probably something that could be implemented now that we have
> the required metadata in the operators (I believe it wasn't there when the disambiguation
> design was implemented).

Isn't that true today? Unambiguous columns can be referenced without the :: operator.

Santhosh
-
Re: Easy question...difference between this::form and this.form?
Dmitriy Ryaboy 2010-12-08, 00:50
It's sort of true -- but, iirc, it only goes one level deep, so once you do a second join, you are stuck with "::"s.
-
Re: Easy question...difference between this::form and this.form?Anze 2010-12-08, 08:24
I'm curious - is this a problem for others as well? Do you keep 'A::C::myId' or do you use FOREACH... GENERATE after each JOIN?

About possible workarounds:

Is it possible to write a UDF that would automatically strip 'X::' from the start of the names? For instance:

C: {A::x, A::y, B::x, B::v}
C = FLATTEN_NAMES(C, 'x');
C: {x, y, v}

('x' is the name of the column on which the JOIN was made, if it is the same in A and B)

Can something like this be done with UDFs? (I admit it's ugly, but... ;)

Another way would be to add an argument to the JOIN (& co.), telling it to use flat names and to fail with an error if the names are ambiguous:

C = JOIN A by x, B by x FLATTEN_NAMES;
C: {x, y, v}

Anze

On Wednesday 08 December 2010, Dmitriy Ryaboy wrote:
> it's sort of true -- but, iirc, only goes one level deep, so once you do a
> second join, you are stuck with "::"s
>
> On Tue, Dec 7, 2010 at 10:11 AM, Santhosh Srinivasan <sms@yahoo-inc.com> wrote:
> > Isn't that true today? Unambiguous columns can be referenced without the
> > :: operator.
> >
> > Santhosh
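For reference, the "FOREACH... GENERATE after each JOIN" approach mentioned above might look roughly like the sketch below. It uses only standard JOIN and FOREACH; the FLATTEN_NAMES idea is a proposal from this thread, not an existing Pig feature, so it does not appear here. The inputs 'foo' and 'bar' come from earlier examples in the thread, while 'baz', the column w, and the intermediate aliases C0 and E0 are made up for illustration.

```pig
-- Sketch only: rename fields right after each JOIN so that prefixes
-- do not pile up across a sequence of joins.
A = LOAD 'foo' AS (x, y);
B = LOAD 'bar' AS (x, v);

C0 = JOIN A BY x, B BY x;
-- C0: {A::x, A::y, B::x, B::v}   (types omitted)

-- Keep one copy of the join key and give every field a flat name.
C = FOREACH C0 GENERATE A::x AS x, A::y AS y, B::v AS v;

-- A later join starts again from flat names, so only one level of
-- prefix has to be handled at a time.
D = LOAD 'baz' AS (x, w);          -- hypothetical third input
E0 = JOIN C BY x, D BY x;
E = FOREACH E0 GENERATE C::x AS x, C::y AS y, C::v AS v, D::w AS w;
```

The cost is one extra FOREACH per join, which is exactly the verbosity being discussed here; the benefit is that no alias ever reaches the 'A::C::myId' form.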
-
RE: Easy question...difference between this::form and this.form?Santhosh Srinivasan 2010-12-08, 20:02
Unambiguous column names can be accessed as is, without the :: prefix. An example that demonstrates it follows:
grunt> a = load 'a' as (x, y, XX);
grunt> b = load 'b' as (x, y, YY);
grunt> c = load 'c' as (x,y, ZZ);
grunt> d = join a by $0, b by $0;
grunt> describe d;
d: {a::x: bytearray,a::y: bytearray,a::XX: bytearray,b::x: bytearray,b::y: bytearray,b::YY: bytearray}
grunt> e = join d by $0, c by $0;
grunt> describe e;
e: {d::a::x: bytearray,d::a::y: bytearray,d::a::XX: bytearray,d::b::x: bytearray,d::b::y: bytearray,d::b::YY: bytearray,c::x: bytearray,c::y: bytearray,c::ZZ: bytearray}
grunt> f = foreach e generate XX;
------------------------------^^
grunt> describe f;
f: {d::a::XX: bytearray}
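To tie this back to the original question, here is a small sketch contrasting the two notations as they were described in this thread: "::" is simply part of a field name that Pig generates after a join, while "." projects a field out of a tuple or bag. The relations 'a' and 'b' mirror the grunt session above; the aliases j, g and p are made up for illustration, and no output is shown because this was not run.

```pig
a = LOAD 'a' AS (x, y, XX);
b = LOAD 'b' AS (x, y, YY);
d = JOIN a BY x, b BY x;

-- After the join, x and y exist on both sides, so they must be
-- referenced by their full names a::x / b::y. XX is unambiguous
-- and can be referenced without a prefix.
j = FOREACH d GENERATE a::x, b::y, XX;

-- '.' is a projection: after GROUP, each row of g holds a bag named
-- a, and a.y pulls the y field out of every tuple in that bag.
g = GROUP a BY x;
p = FOREACH g GENERATE group, a.y;
```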