Pig, mail # dev - Early projection and lazy casting


Early projection and lazy casting
Jie Li 2011-12-03, 00:05
Hi all,

We just found that Pig 0.9.1 doesn't drop unnecessary fields as soon as
possible, which really hurts performance. This is despite
http://ofps.oreilly.com/titles/9781449302641/load_and_store_funcs.html#loadfunc_loaderpushdown
saying that "As part of its optimizations Pig analyzes Pig Latin scripts and
determines what fields in an input it needs at each step in the script. It
uses this information to aggressively drop fields it no longer needs."

We also found that Pig casts the data to the types declared in the schema
up front, which is usually unnecessary, as most of the fields are dropped
soon afterwards.

To work around both issues, we have to manually drop the unneeded fields and
remove the types from the schema, which is tedious work we would rather not
do by hand.
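
To make the workaround concrete, here is a minimal sketch of what we mean.
The file name, output path, column positions and field names are all made up
for illustration, and this is only one way to write it:

```pig
-- Hypothetical input: a tab-separated file with many columns,
-- of which only columns 0 and 3 are actually used by the script.

-- Load without declaring types, so nothing is cast at load time
-- (untyped fields stay as bytearray).
raw = LOAD 'events.tsv' USING PigStorage('\t');

-- Project by position right after the load, and cast only the
-- fields that survive the projection.
slim = FOREACH raw GENERATE (chararray)$0 AS user_id, (long)$3 AS nbytes;

-- The rest of the script only ever sees the two projected fields.
grouped = GROUP slim BY user_id;
totals  = FOREACH grouped GENERATE group AS user_id, SUM(slim.nbytes) AS total_nbytes;

STORE totals INTO 'totals_by_user';
```

Ideally the typed LOAD ... AS (...) form would behave like this on its own,
without us having to rewrite the script.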

Jie