Pig, mail # dev - Early projection and lazy casting


Re: Early projection and lazy casting
Jie Li 2011-12-03, 00:45
Why would joins prevent early projection? If anything, a join has the
greatest need for it.

Jie

On Fri, Dec 2, 2011 at 7:33 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:

> In what context? I always thought that it generally could, but that if you
> do joins it doesn't. Would be curious to know more from someone who
> knows...
>
> 2011/12/2 Jie Li <[EMAIL PROTECTED]>
>
> > Hi all,
> >
> > We just figured out that Pig 0.9.1 doesn't drop unnecessary fields
> > ASAP, which really affects performance, even though
> > http://ofps.oreilly.com/titles/9781449302641/load_and_store_funcs.html#loadfunc_loaderpushdown
> > said that "As part of its optimizations Pig analyzes Pig Latin scripts and
> > determines what fields in an input it needs at each step in the script. It
> > uses this information to aggressively drop fields it no longer needs."
> >
> > We also found that Pig casts the data into the types defined in the
> > schema, which is usually unnecessary, as most of the fields will soon be
> > dropped.
> >
> > To work around these, we have to manually drop those fields and remove
> > the types from the schema, which is really not interesting.
> >
> > Jie
> >
>
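
For reference, the manual workaround described in the original message (drop
unneeded fields by hand and leave the schema untyped) might look roughly like
the Pig Latin sketch below. The paths, relation names, and field names are all
hypothetical and do not come from the thread. Whether this actually helps
depends on the Pig version and the loader in use.

```pig
-- Hypothetical wide input; only user_id and amount are needed downstream.
-- No types are declared in the AS clause, so every field stays a bytearray
-- and nothing is cast at load time.
raw = LOAD 'input/events' USING PigStorage('\t')
      AS (user_id, ts, url, referrer, agent, amount);

-- Project by hand right after the load, and cast only the field that is
-- actually used as a number.
slim = FOREACH raw GENERATE user_id, (double)amount AS amount;

-- Hypothetical second input, also untyped and projected before the join.
users_raw = LOAD 'input/users' USING PigStorage('\t')
            AS (user_id, name, email, country);
users = FOREACH users_raw GENERATE user_id, country;

-- The join now shuffles only the fields that survive the projections.
joined = JOIN slim BY user_id, users BY user_id;

STORE joined INTO 'output/joined';
```

Projecting before the JOIN matters because every field still present at that
point is serialized and shuffled to the reducers, which is why a join arguably
benefits most from early projection.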