Re: schema definition and subschema
Cheolsoo Park 2013-08-14, 16:43
Hope this is not too late.
>> I am wondering why LogicalFieldSchema contains a LogicalSchema
That's for nested tuple fields. For example, consider "(i:int,
t:tuple(j:int))". The field t is itself a tuple with its own list of field
schemas (here just j:int), so its LogicalFieldSchema needs a LogicalSchema
to hold them. Here is how you can verify it:
1) Debug Pig's Main class in Eclipse.
2) Set a breakpoint in the LogicalFieldSchema constructor.
3) Run "a = load '/dev/null' as (i:int, t:tuple(j:int));" in the Grunt shell.
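To make the nesting concrete without a debugger, here is a minimal sketch of the same idea. It does not use Pig's classes: SketchSchema and SketchFieldSchema are hypothetical stand-ins that model only the four members listed in the original question (alias, type, uid, schema). The byte type codes are assumed to match org.apache.pig.data.DataType, and the uid values are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for LogicalFieldSchema (not Pig's API).
class SketchFieldSchema {
    // Type codes assumed to match org.apache.pig.data.DataType.
    static final byte INTEGER = 10;
    static final byte TUPLE = 110;

    final String alias;
    final byte type;
    final long uid;
    final SketchSchema schema; // inner schema for nested fields; null for scalars

    SketchFieldSchema(String alias, byte type, long uid, SketchSchema schema) {
        this.alias = alias;
        this.type = type;
        this.uid = uid;
        this.schema = schema;
    }

    @Override
    public String toString() {
        String typeName = (type == TUPLE) ? "tuple" : (type == INTEGER) ? "int" : "unknown";
        // Recurse into the inner schema when the field has one.
        return schema == null ? alias + ":" + typeName
                              : alias + ":" + typeName + "(" + schema + ")";
    }
}

// Hypothetical stand-in for LogicalSchema: an ordered list of field schemas.
class SketchSchema {
    final List<SketchFieldSchema> fields = new ArrayList<>();

    SketchSchema add(SketchFieldSchema field) {
        fields.add(field);
        return this;
    }

    @Override
    public String toString() {
        return fields.stream().map(SketchFieldSchema::toString)
                     .collect(Collectors.joining(", "));
    }
}

public class NestedSchemaDemo {
    // Models the schema of: a = load '/dev/null' as (i:int, t:tuple(j:int));
    static SketchSchema example() {
        SketchSchema inner = new SketchSchema()
            .add(new SketchFieldSchema("j", SketchFieldSchema.INTEGER, 3, null));
        return new SketchSchema()
            .add(new SketchFieldSchema("i", SketchFieldSchema.INTEGER, 1, null))
            .add(new SketchFieldSchema("t", SketchFieldSchema.TUPLE, 2, inner));
    }

    public static void main(String[] args) {
        System.out.println("(" + example() + ")"); // prints (i:int, t:tuple(j:int))
    }
}
```

The scalar field i leaves its schema member null, while t carries a one-field inner schema; that inner schema is exactly the "subschema" the question asks about.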
On Thu, Aug 8, 2013 at 2:42 PM, Keren Ouaknine <[EMAIL PROTECTED]> wrote:
> A schema in Pig (LogicalSchema.java) is defined as an array list of
> LogicalFieldSchema whose class members are:
> - String alias
> - byte type
> - long uid
> - LogicalSchema schema
> I am wondering why LogicalFieldSchema contains a LogicalSchema member.
> My guess so far is that perhaps there's a subschema used by some operators?
> I tried to figure out which operators might be using it and categorized the
> main ones as follows:
> ==> SCHEMA IS DEFINED BY INPUT SCHEMA ONLY
> ORDER BY
> ==> SCHEMA IS DEFINED BY THE LIST OF "AS" IN THE FOREACH STATEMENT
> ==> IF SCHEMA CAN BE DEFINED (SAME LENGTH AND CASTABLE) OR UNKNOWN SCHEMA
> ==> SCHEMA IS DEFINED BY THE CONCATENATION OF THE TWO INPUT SCHEMAS (+
> ADDING THE ALIAS TO THE FIELD NAME x ==> A::x)
> *Are the two inputs here considered subschemas?*
> ==> SCHEMA: (key_to_order_by, bag)
> Keren Ouaknine
> Web: www.kereno.com
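The last two categories in the question can be contrasted with a short sketch. It assumes the concatenation case is JOIN and the (key, bag) case is GROUP/COGROUP, which the list above does not state explicitly. SchemaField, joinSchema and groupSchema are hypothetical names, not Pig's API. Concatenation only prefixes aliases (x in A becomes A::x) and adds no nesting, whereas the grouped output's bag field carries the whole input schema inside it. The sketch wraps the bag's fields in an inner tuple field, which is our understanding of how Pig represents bag schemas; check LogicalSchema in your Pig version before relying on that detail. Requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal field model: alias, type name, and inner fields
// (empty for scalars).
record SchemaField(String alias, String type, List<SchemaField> inner) {
    static SchemaField scalar(String alias, String type) {
        return new SchemaField(alias, type, List.of());
    }
}

public class OutputSchemaSketch {
    // Concatenation case (assumed JOIN): both inputs are appended in order and
    // each alias is prefixed with its relation name. No new nesting is created.
    static List<SchemaField> joinSchema(String leftName, List<SchemaField> left,
                                        String rightName, List<SchemaField> right) {
        List<SchemaField> out = new ArrayList<>();
        for (SchemaField f : left) {
            out.add(new SchemaField(leftName + "::" + f.alias(), f.type(), f.inner()));
        }
        for (SchemaField f : right) {
            out.add(new SchemaField(rightName + "::" + f.alias(), f.type(), f.inner()));
        }
        return out;
    }

    // (key, bag) case (assumed GROUP): the output is (group, <relation>:bag).
    // The bag holds one unnamed tuple field whose inner fields are the input
    // schema, so the bag's field schema genuinely contains a subschema.
    static List<SchemaField> groupSchema(String relName, SchemaField key,
                                         List<SchemaField> input) {
        SchemaField tuple = new SchemaField(null, "tuple", input);
        return List.of(new SchemaField("group", key.type(), List.of()),
                       new SchemaField(relName, "bag", List.of(tuple)));
    }

    public static void main(String[] args) {
        List<SchemaField> a = List.of(SchemaField.scalar("x", "int"),
                                      SchemaField.scalar("y", "chararray"));
        List<SchemaField> b = List.of(SchemaField.scalar("x", "int"));
        System.out.println(joinSchema("A", a, "B", b));
        System.out.println(groupSchema("A", a.get(0), a));
    }
}
```

In these terms, the two join inputs are not subschemas of the output; they are flattened into one list. Only the grouped output keeps an input schema nested inside a field.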