Pig >> mail # user >> schema definition and subschema


schema definition and subschema
Hi,

A schema in Pig (LogicalSchema.java) is defined as an array list of
LogicalFieldSchema whose class members are:
- String alias
- byte type
- long uid
- LogicalSchema schema
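
To make the recursion concrete, here is a minimal sketch of the structure listed above. The classes FieldSketch and SchemaSketch are hypothetical stand-ins, not Pig's actual LogicalSchema or LogicalFieldSchema; the type is a String instead of a byte for readability, and uid is left out. It assumes the nested schema member describes the inner fields of a complex-typed field such as a tuple, which is one reading and not something confirmed in this message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for illustration only; these are NOT Pig's classes.
class FieldSketch {
    final String alias;        // field name, e.g. "x"
    final String type;         // Pig stores a byte code; a String keeps the sketch readable
    final SchemaSketch schema; // nested schema, or null for a simple field (assumption)

    FieldSketch(String alias, String type, SchemaSketch schema) {
        this.alias = alias;
        this.type = type;
        this.schema = schema;
    }

    // Render "alias:type", followed by the nested schema when there is one.
    String describe() {
        String inner = (schema == null) ? "" : schema.describe();
        return alias + ":" + type + inner;
    }
}

class SchemaSketch {
    final List<FieldSketch> fields = new ArrayList<>();

    SchemaSketch(FieldSketch... fs) {
        fields.addAll(Arrays.asList(fs));
    }

    // Render all fields as a parenthesized, comma-separated list.
    String describe() {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(fields.get(i).describe());
        }
        return sb.append(")").toString();
    }
}

class SchemaSketchDemo {
    // A two-field schema whose second field carries its own nested schema.
    static SchemaSketch example() {
        SchemaSketch inner = new SchemaSketch(
            new FieldSketch("x", "int", null),
            new FieldSketch("y", "chararray", null));
        return new SchemaSketch(
            new FieldSketch("id", "long", null),
            new FieldSketch("t", "tuple", inner));
    }

    public static void main(String[] args) {
        // prints (id:long, t:tuple(x:int, y:chararray))
        System.out.println(example().describe());
    }
}
```

In this sketch a field only needs the nested member when its own value has internal structure; simple fields leave it null.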

I am wondering why LogicalFieldSchema contains a LogicalSchema member.
My guess so far is that there is a subschema used by some operators.
I tried to figure out which operators might be using it and categorized the
main ones as follows:

==> SCHEMA IS DEFINED BY INPUT SCHEMA ONLY
LOAD
DISTINCT
FILTER
ORDER BY
SPLIT

==> SCHEMA IS DEFINED BY THE LIST OF "AS" IN THE FOREACH STATEMENT
FOREACH

==> MERGED SCHEMA IF THE INPUTS ARE COMPATIBLE (SAME LENGTH AND CASTABLE), OTHERWISE UNKNOWN SCHEMA
UNION

==> SCHEMA IS DEFINED BY THE CONCATENATION OF THE TWO INPUT SCHEMAS (+
ADDING THE ALIAS TO THE FIELD NAME x ==> A::x)
JOIN
*Are the two inputs here considered subschemas?*
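
The JOIN rule above (concatenate both input schemas, prefixing each field with its relation alias) can be sketched as follows. JoinSchemaSketch and its methods are hypothetical names for illustration, and fields are reduced to plain alias strings; this is not how Pig implements it internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: field schemas are reduced to their alias strings.
class JoinSchemaSketch {
    // Prefix every field alias of one input with that input's relation alias.
    static List<String> qualify(String relation, List<String> fieldAliases) {
        List<String> out = new ArrayList<>();
        for (String field : fieldAliases) {
            out.add(relation + "::" + field);
        }
        return out;
    }

    // Output schema = qualified left fields followed by qualified right fields.
    static List<String> joinSchema(String leftRel, List<String> left,
                                   String rightRel, List<String> right) {
        List<String> out = qualify(leftRel, left);
        out.addAll(qualify(rightRel, right));
        return out;
    }

    public static void main(String[] args) {
        // prints [A::x, A::y, B::x, B::z]
        System.out.println(joinSchema(
            "A", Arrays.asList("x", "y"),
            "B", Arrays.asList("x", "z")));
    }
}
```

Note that the result here is a single flat list: the two inputs contribute fields side by side, and the prefix keeps A::x and B::x distinct.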

==> SCHEMA: (key_to_group_by, bag)
GROUP
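
For comparison, a rough sketch of the GROUP output shape: a key field plus a bag whose contents follow the input schema. GroupSchemaSketch is a hypothetical helper; naming the key field "group" and the bag after the input relation reflects my understanding of Pig's convention, so treat both as assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: renders the shape of a GROUP result as a string.
class GroupSchemaSketch {
    // inputFields are already rendered as "alias:type" strings.
    static String groupSchema(String relation, String keyType, List<String> inputFields) {
        String inner = String.join(", ", inputFields);
        // The bag field carries the input schema as its nested schema.
        return "(group:" + keyType + ", " + relation + ":bag{(" + inner + ")})";
    }

    public static void main(String[] args) {
        // prints (group:int, A:bag{(x:int, y:chararray)})
        System.out.println(groupSchema("A", "int", Arrays.asList("x:int", "y:chararray")));
    }
}
```

Unlike the JOIN case, the bag field here cannot be described by an alias and a type alone, which is where a schema nested inside a field would be needed.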

Thanks,
Keren

--
Keren Ouaknine
Web: www.kereno.com