Re: Scalar problem
I was referring to item 1 below. I think making the cast mandatory, rather than merely allowed, is sufficient (phased in over time, obviously, so we don't break compatibility).

Alan.

On Apr 9, 2012, at 10:21 AM, Dmitriy Ryaboy wrote:

> Alan, which idea are you +1 on? I think (int) D is the current syntax.
>
> There are a couple of problems that people hit in the current scalar
> implementation, both of which I think can be fixed without introducing
> new syntax:
>
> 1) Require the cast, don't do it implicitly. This was actually in the
> design doc but didn't get implemented for some reason.
>
> 2) Throw an error on the frontend if the scalar relation is the
> relation being iterated on. Meaning:
>
> foreach foo generate (int) foo.id; -- this will cause the second "foo"
> to be interpreted as a scalar invocation, although clearly it's just a
> bug, and the programmer meant to say "generate (int) id"
>
> We can just detect this error case and throw during compilation.
>
> 3) Improve MR-side logging to make it clear that a relation is being
> loaded from the side, what the relation is, etc.
>
> I believe we have JIRAs open for all of these.
>
> D
>
> On Mon, Apr 9, 2012 at 10:15 AM, Alan Gates <[EMAIL PROTECTED]> wrote:
>> I'm +1 on this idea, since it's been a problem since the beginning.  Why not use regular casting notation though, rather than develop another notation?  That's what we discussed originally when we were deciding whether to require casting or do it silently.  So instead of D->a or SCALAR(D) it would be (int)D.
>>
>> Alan.
>>
>> On Apr 8, 2012, at 7:42 AM, Jonathan Coveney wrote:
>>
>>> I like this idea, and I think we should deprecate the old syntax, and we
>>> can discuss later when it'd get deleted (and when that would be worth it...
>>> if we have a new syntax, it seems pretty painless to have the other one
>>> float around for backwards compatibility, and if anyone uses it it's a sort
>>> of "caveat emptor").
>>>
>>> 2012/4/8 Aniket Mokashi <[EMAIL PROTECTED]>
>>>
>>>> Hi,
>>>>
>>>> I have noticed that early users of Pig often hit issues because of the
>>>> confusing syntax shared by scalars and projections. I think scalar syntax
>>>> should be made more explicit in order to avoid these problems. For
>>>> example: D = foreach C generate B->count; etc.
>>>> I am sure we might break some backward compatibility but we can at least
>>>> deprecate the syntax for a few versions and eventually move to new syntax.
>>>>
>>>> Thoughts?
>>>>
>>>> Thanks,
>>>> Aniket
>>>>
>>
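
For readers following the thread, here is a minimal Pig Latin sketch of the two cases discussed above: a scalar reference with an explicit cast (item 1) and the accidental scalar reference to the relation being iterated (item 2). The file name, relation names, and field names are hypothetical and chosen only for illustration. It also assumes the scalar relation holds exactly one row.

```pig
-- Hypothetical input: 'visits.tsv' with fields (id, hits); all names are illustrative.
visits = LOAD 'visits.tsv' AS (id:int, hits:long);

-- Collapse the relation to a single row holding a single value.
grouped = GROUP visits ALL;
totals = FOREACH grouped GENERATE SUM(visits.hits) AS total_hits;

-- Scalar use: totals is a different, single-row relation, and the cast is explicit.
with_total = FOREACH visits GENERATE id, hits, (long) totals.total_hits AS total_hits;

-- The mistake from item 2: inside a FOREACH over visits, visits.id is read as a
-- scalar reference to the whole relation, not as a projection of the current row.
-- bad = FOREACH visits GENERATE (int) visits.id;

-- What the programmer meant:
ids = FOREACH visits GENERATE (int) id;
```

The commented-out line is the pattern a compile-time check, as proposed in the thread, would reject.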