Daniel Dai 2011-01-14, 04:58
Dmitriy Ryaboy 2011-01-14, 06:54
Daniel Dai 2011-01-14, 20:11
Olga Natkovich 2011-01-14, 21:12
Julien Le Dem 2011-01-14, 22:01
Scott Carey 2011-01-14, 20:27
Dmitriy Ryaboy 2011-01-14, 21:34
Dmitriy Ryaboy 2011-01-14, 21:35
Alan Gates 2011-01-14, 22:00
Dmitriy Ryaboy 2011-01-14, 22:15
Maps are sometimes used to represent JSON or similar data structures.
The resulting Pig objects are Maps with String keys and values being either: String, Number, Map, Bag (and recursively).
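To make the shape concrete, here is a minimal Python sketch of that mapping. It is only a model, not Pig code: `to_pig_value` is a hypothetical helper, and two choices in it are assumptions of this sketch, namely that a bag is modeled as a list of one-field tuples and that JSON booleans are stringified.

```python
import json


def to_pig_value(value):
    """Map a parsed JSON value onto the shapes described above (model only)."""
    if isinstance(value, dict):
        # JSON object -> map with string keys, values converted recursively.
        return {str(k): to_pig_value(v) for k, v in value.items()}
    if isinstance(value, list):
        # JSON array -> bag, modeled here as a list of one-field tuples.
        return [(to_pig_value(v),) for v in value]
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so handle it before numbers.
        # Stringifying it is an arbitrary choice made for this sketch.
        return str(value).lower()
    if value is None or isinstance(value, (str, int, float)):
        # Strings and numbers pass through unchanged.
        return value
    raise TypeError(f"unsupported JSON value: {value!r}")


record = to_pig_value(
    json.loads('{"name": "a", "score": 1.5, "tags": ["x", "y"], "geo": {"lat": 2}}')
)
```

Note that `record` ends up with string, number, bag and map values under one map, which is exactly the heterogeneous case a single declared value type cannot describe.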
On 1/14/11 2:15 PM, "Dmitriy Ryaboy" <[EMAIL PROTECTED]> wrote:
fwiw most of our maps wind up being mixes of string->double and
string->string. Sometimes string->map and string->bag. Having non-string
keys would really help us, but I know that was pulled for a reason.
On Fri, Jan 14, 2011 at 2:00 PM, Alan Gates <[EMAIL PROTECTED]> wrote:
> I think the big win of static typing is that from examining the script
> alone you can know the output:
> A = load 'bla' using BinStorage();
> B = foreach A generate $0 + $1;
> With static typing $0 and $1 will both be viewed as bytearrays and thus
> will be cast to doubles, regardless of how BinStorage actually instantiated
> them. With dynamic types we cannot know the answers without knowing the
> data that is fed through.
> The downside of the static typing case is that we explicitly allow unknown
> types in maps:
> A = load 'bla' using AvroStorage(); -- assume bla has a schema of m:map
> -- and that m has two keys, k1 and k2,
> -- both with integer values
> B = foreach A generate m#'k1' + m#'k2';
> Using static types, B.$0 will be a double, even though the underlying types
> are ints. Users will not see that as intuitive even though the semantic is
> clear. In the dynamic model proposed by Daniel, B.$0 will be an int.
> We are mitigating this case by allowing typed maps (where the value type of
> the map is declarable) in 0.9. But maps with heterogeneous value types will
> still suffer from this issue.
> I vote for static types for several reasons:
> 1) I like being able to know the output of the script by examining the
> script alone. It provides a clear semantic that we can explain to users.
> 2) It's less of a maintenance cost, as the need to deal with dynamic type
> discovery is confined to the cast operator. If we go full out dynamic types
> every expression operator has to be able to manage dynamism for byte arrays.
> 3) In my experience almost all maps are string->string so once we allow
> typed maps I suspect people will start using them heavily.
> I'm not sure there's a performance gain either way, since in both cases we
> have to manage the case where we think something is a bytearray and it turns
> out to be something else.
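A rough Python model of the two semantics discussed above may help. It is not Pig's implementation: `add_static` and `add_dynamic` are hypothetical names, and the sample rows are made up for illustration. The static rule casts operands of undeclared type to double; the dynamic rule keeps whatever type the loader produced.

```python
def add_static(a, b):
    # Static rule as described above: an operand whose type is not declared
    # is treated as a bytearray and cast to double before the addition.
    return float(a) + float(b)


def add_dynamic(a, b):
    # Dynamic rule: use whatever type the loader actually produced.
    return a + b


# The map example: both values happen to be ints at run time.
m = {"k1": 1, "k2": 2}
static_result = add_static(m["k1"], m["k2"])    # always a double
dynamic_result = add_dynamic(m["k1"], m["k2"])  # stays an int

# Two rows whose fields differ in type, to show the per-row effect.
rows = [(1, 2), (1.1, 2.2)]
static_types = [type(add_static(a, b)).__name__ for a, b in rows]
dynamic_types = [type(add_dynamic(a, b)).__name__ for a, b in rows]
```

In this model `static_types` is the same for every row, so the output type can be read off the script alone, while `dynamic_types` changes with the data.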
> On Jan 14, 2011, at 1:34 PM, Dmitriy Ryaboy wrote:
>> Agreed with what Scott said about procedurally building schemas, and what
>> Olga said about static typing.
>> Daniel, I am not sure what you mean about run-time typing on a row by row
>> basis. Certainly winding up with columns that are sometimes doubles,
>> sometimes floats, and sometimes ints can only lead to unexpected bugs?
>> I know Yahoo went through a lot of pain with the LoadStore rework in 0.7
>> (heck I am still dealing with it), but seems like breaking compatibility
>> in a minor way in order to clean up semantics is ok given that we had a
>> "stable" version in between. I don't think conversion would be too
>> especially if declaring schemas is simplified.
>> We can just say that odd versions can break apis and even can't :).
>> On Fri, Jan 14, 2011 at 12:27 PM, Scott Carey <[EMAIL PROTECTED]> wrote:
>>> On 1/13/11 10:54 PM, "Dmitriy Ryaboy" <[EMAIL PROTECTED]> wrote:
>>>> How is runtime detection done? I worry that if 1.txt contains:
>>>> 1, 2
>>>> 1.1, 2.2
>>>> We get into a situation where addition of the fields in the first tuple
>>>> produces integers, and adding the fields of the second tuple produces
>>>> doubles. A more invasive but perhaps easier to reason about solution might be to
>>>> be stricter about types, and require bytearrays to be cast to whatever type
Julien Le Dem 2011-01-14, 21:57
Thejas M Nair 2011-01-14, 21:00
Olga Natkovich 2011-01-14, 21:16