Pig >> mail # user >> schema definition and subschema


schema definition and subschema
Hi,

A schema in Pig (LogicalSchema.java) is defined as an array list of
LogicalFieldSchema whose class members are:
- String alias
- byte type
- long uid
- LogicalSchema schema
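
Spelled out as code, the structure above is recursive: a schema holds fields, and each field may in turn hold a schema. The following is only a minimal sketch of how I read it. The class names, type codes, and constructor are simplified stand-ins invented for this message, not the actual Pig source.

```java
import java.util.ArrayList;
import java.util.List;

class FieldSchemaSketch {
    // Stand-in for LogicalSchema: an ordered list of field schemas.
    static class Schema {
        final List<FieldSchema> fields = new ArrayList<>();

        Schema add(FieldSchema field) {
            fields.add(field);
            return this;
        }
    }

    // Stand-in for LogicalFieldSchema, with the four members listed above.
    static class FieldSchema {
        final String alias;
        final byte type;
        final long uid;
        final Schema schema; // the member in question; null when unused

        FieldSchema(String alias, byte type, long uid, Schema schema) {
            this.alias = alias;
            this.type = type;
            this.uid = uid;
            this.schema = schema;
        }
    }

    // Type codes invented for this sketch; Pig defines its own values.
    static final byte SIMPLE = 1;
    static final byte NESTED = 2;

    public static void main(String[] args) {
        // A plain field with no nested schema.
        FieldSchema x = new FieldSchema("x", SIMPLE, 1L, null);
        // A field whose own schema member describes further fields.
        FieldSchema outer = new FieldSchema("outer", NESTED, 2L, new Schema().add(x));

        Schema top = new Schema().add(outer);
        System.out.println(top.fields.size());                 // prints 1
        System.out.println(outer.schema.fields.get(0).alias);  // prints x
    }
}
```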

I am wondering why LogicalFieldSchema contains a LogicalSchema member.
My guess so far is that perhaps there's a subschema used by some operators?
I tried to figure out which operators might be using it and categorized the
main ones as follows:

==> SCHEMA IS DEFINED BY INPUT SCHEMA ONLY
LOAD
DISTINCT
FILTER
ORDER BY
SPLIT

==> SCHEMA IS DEFINED BY THE LIST OF "AS" IN THE FOREACH STATEMENT
FOREACH

==> SCHEMA IS DEFINED IF THE INPUT SCHEMAS ARE COMPATIBLE (SAME LENGTH AND
CASTABLE), OTHERWISE UNKNOWN SCHEMA
UNION

==> SCHEMA IS DEFINED BY THE CONCATENATION OF THE TWO INPUT SCHEMAS (+
ADDING THE ALIAS TO THE FIELD NAME x ==> A::x)
JOIN
*Are the two inputs here considered subschemas?*
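
To make the JOIN case concrete, here is a rough sketch of the concatenation I have in mind. It works on plain field names only; `qualify` and `joinSchema` are hypothetical helpers written for this message, not Pig methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class JoinSchemaSketch {
    // Prefix each field name with its relation alias, e.g. x ==> A::x.
    static List<String> qualify(String relation, List<String> fields) {
        List<String> out = new ArrayList<>();
        for (String field : fields) {
            out.add(relation + "::" + field);
        }
        return out;
    }

    // Output schema: the qualified left fields followed by the qualified right fields.
    static List<String> joinSchema(String left, List<String> leftFields,
                                   String right, List<String> rightFields) {
        List<String> out = qualify(left, leftFields);
        out.addAll(qualify(right, rightFields));
        return out;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("x", "y");
        List<String> b = Arrays.asList("x", "z");
        System.out.println(joinSchema("A", a, "B", b));
        // prints [A::x, A::y, B::x, B::z]
    }
}
```

In this reading the result is one flat schema, so no field needs a nested schema of its own.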

==> SCHEMA: (key_to_group_by, bag)
GROUP
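
GROUP looks to me like the case where a field would need a schema of its own, since the bag has to describe the tuples it holds. A minimal sketch under that assumption follows. The `Field` class and `groupSchema` helper are made up for illustration, types are plain strings, and the tuple layer that Pig places inside a bag is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GroupSchemaSketch {
    // Simplified field: alias, type name, and an optional nested schema.
    static class Field {
        final String alias;
        final String type;
        final List<Field> schema; // nested fields, or null when absent

        Field(String alias, String type, List<Field> schema) {
            this.alias = alias;
            this.type = type;
            this.schema = schema;
        }

        @Override
        public String toString() {
            return schema == null ? alias + ":" + type : alias + ":" + type + schema;
        }
    }

    // Output schema: (group key, bag named after the input relation).
    // The bag field carries the whole input schema as its nested schema.
    static List<Field> groupSchema(String relation, Field key, List<Field> input) {
        Field groupKey = new Field("group", key.type, key.schema);
        Field bag = new Field(relation, "bag", new ArrayList<>(input));
        return Arrays.asList(groupKey, bag);
    }

    public static void main(String[] args) {
        List<Field> a = Arrays.asList(
                new Field("x", "int", null),
                new Field("y", "chararray", null));
        System.out.println(groupSchema("A", a.get(0), a));
        // prints [group:int, A:bag[x:int, y:chararray]]
    }
}
```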

Thanks,
Keren

--
Keren Ouaknine
Web: www.kereno.com