Pig, mail # user - schema definition and subschema


Re: schema definition and subschema
Cheolsoo Park 2013-08-14, 16:43
Hi Keren,

Hope this isn't too late.

>> I am wondering why is LogicalFieldSchema containing a LogicalSchema member?

That's for nested tuple fields. For example, consider "( i:int,
t:tuple(j:int) )". The field t:tuple needs to contain a list of field
schemas, so you need a LogicalSchema. Here is how you can verify it.

1) Debug Pig's Main class in Eclipse.
2) Set a breakpoint in the LogicalFieldSchema constructor.
3) Run "a = load '/dev/null' as (i:int, t:tuple(j:int));" in the Grunt shell.
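
If you just want to see the shape of the data structure without attaching a debugger, here is a minimal sketch of the recursion. Note that SchemaSketch, FieldSketch, and NestedSchemaDemo are hypothetical stand-ins I made up for illustration, not Pig's actual LogicalSchema and LogicalFieldSchema classes. The sketch also drops the uid member and uses a type name string where Pig stores a byte type code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for LogicalSchema: an ordered list of field schemas.
class SchemaSketch {
    final List<FieldSketch> fields = new ArrayList<>();

    // Appends a field and returns this schema so calls can be chained.
    SchemaSketch add(FieldSketch field) {
        fields.add(field);
        return this;
    }

    @Override
    public String toString() {
        List<String> parts = new ArrayList<>();
        for (FieldSketch f : fields) {
            parts.add(f.toString());
        }
        return "(" + String.join(", ", parts) + ")";
    }
}

// Hypothetical stand-in for LogicalFieldSchema.
class FieldSketch {
    final String alias;
    final String type;          // assumption: a readable name instead of Pig's byte type code
    final SchemaSketch schema;  // null for simple types, non-null for nested types such as tuple

    FieldSketch(String alias, String type, SchemaSketch schema) {
        this.alias = alias;
        this.type = type;
        this.schema = schema;
    }

    @Override
    public String toString() {
        String base = alias + ":" + type;
        // A nested field prints its own schema right after its type.
        return schema == null ? base : base + schema;
    }
}

class NestedSchemaDemo {
    // Builds the schema from the example above: (i:int, t:tuple(j:int)).
    static SchemaSketch example() {
        SchemaSketch inner = new SchemaSketch().add(new FieldSketch("j", "int", null));
        return new SchemaSketch()
                .add(new FieldSketch("i", "int", null))
                .add(new FieldSketch("t", "tuple", inner));
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

The field i has no nested schema, while the field t carries one holding j. That is the only reason the field class needs a schema member in this sketch.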

Thanks,
Cheolsoo
On Thu, Aug 8, 2013 at 2:42 PM, Keren Ouaknine <[EMAIL PROTECTED]> wrote:

> Hi,
>
> A schema in Pig (LogicalSchema.java) is defined as an array list of
> LogicalFieldSchema whose class members are:
> - String alias
> - byte type
> - long uid
> - LogicalSchema schema
>
> I am wondering why is LogicalFieldSchema containing a LogicalSchema member?
> My guess so far is that perhaps there's a subschema used by some operators?
> I tried to figure out which operators might be using it and categorized the
> main ones as follows:
>
> ==> SCHEMA IS DEFINED BY INPUT SCHEMA ONLY
> LOAD
> DISTINCT
> FILTER
> ORDER BY
> SPLIT
>
> ==> SCHEMA IS DEFINED BY THE LIST OF "AS" IN THE FOREACH STATEMENT
> FOREACH
>
> ==> IF SCHEMA CAN BE DEFINED (SAME LENGTH AND CASTABLE) OR UNKNOWN SCHEMA
> UNION
>
> ==> SCHEMA IS DEFINED BY THE CONCATENATION OF THE TWO INPUT SCHEMAS (+
> ADDING THE ALIAS TO THE FIELD NAME x ==> A::x)
> JOIN
> *Are the two inputs here considered subschemas?*
>
> ==> SCHEMA: (key_to_order_by, bag)
> GROUP
>
> Thanks,
> Keren
>
> --
> Keren Ouaknine
> Web: www.kereno.com
>