Pig >> mail # dev >> Early projection and lazy casting


Re: Early projection and lazy casting
Why do joins prevent early projection? If anything, a join has the greatest
need for it.

Jie

On Fri, Dec 2, 2011 at 7:33 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:

> In what context? I always thought that it generally could, but that if you
> do joins it doesn't. Would be curious to know more from someone who
> knows...
>
> 2011/12/2 Jie Li <[EMAIL PROTECTED]>
>
> > Hi all,
> >
> > We just found out that Pig 0.9.1 doesn't drop unnecessary fields as soon
> > as possible, which really hurts performance, even though
> >
> > http://ofps.oreilly.com/titles/9781449302641/load_and_store_funcs.html#loadfunc_loaderpushdown
> >
> > says that "As part of its optimizations Pig analyzes Pig Latin scripts and
> > determines what fields in an input it needs at each step in the script. It
> > uses this information to aggressively drop fields it no longer needs."
> >
> > We also found that Pig casts the data to the types declared in the schema,
> > which is usually unnecessary, as most of those fields will soon be dropped.
> >
> > To work around both problems, we have to manually drop those fields and
> > remove the types from the schema, which is really tedious.
> >
> > Jie
> >
>
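
The two ideas discussed in this thread, early projection and lazy casting, can be illustrated with a small Python sketch. This is not Pig's implementation and makes no claim about what Pig 0.9.1 actually does internally; the function names (`cast_then_project`, `project_then_cast`, `counting`), the sample rows, and the four-column schema are all hypothetical. It only shows why casting every field before dropping most of them does more work than dropping first and casting only what survives.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Cast = Callable[[str], object]


def counting(cast: Cast, counter: List[int]) -> Cast:
    """Wrap a cast function so each call increments counter[0]."""
    def wrapped(raw: str) -> object:
        counter[0] += 1
        return cast(raw)
    return wrapped


def cast_then_project(lines: Iterable[str], schema: Sequence[Cast],
                      keep: Sequence[int], delimiter: str = "\t") -> Iterator[Tuple]:
    """Eager variant: cast every field to its schema type, then drop columns."""
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        typed = [cast(raw) for cast, raw in zip(schema, fields)]
        yield tuple(typed[i] for i in keep)


def project_then_cast(lines: Iterable[str], schema: Sequence[Cast],
                      keep: Sequence[int], delimiter: str = "\t") -> Iterator[Tuple]:
    """Lazy variant: drop unneeded columns first, cast only the kept ones."""
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        yield tuple(schema[i](fields[i]) for i in keep)


# Hypothetical input: name, age, url, timestamp (tab-separated).
ROWS = [
    "alice\t30\thttp://a.example\t1322850780",
    "bob\t41\thttp://b.example\t1322850799",
]
KEEP = [0, 3]  # the script only needs name and timestamp

eager_casts = [0]
eager_schema = [counting(c, eager_casts) for c in (str, int, str, int)]
eager_result = list(cast_then_project(ROWS, eager_schema, KEEP))

lazy_casts = [0]
lazy_schema = [counting(c, lazy_casts) for c in (str, int, str, int)]
lazy_result = list(project_then_cast(ROWS, lazy_schema, KEEP))

print(eager_result == lazy_result)   # same output either way
print(eager_casts[0], lazy_casts[0])  # number of casts performed by each variant
```

Both variants produce the same tuples; the difference is only in how many casts are performed, which grows with the number of columns that are declared in the schema but never used.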