Re: Optimizations in pig
Thanks for your detailed explanation. I have a few questions, inline
below; please clarify them.

On Thu, Oct 4, 2012 at 4:59 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> Bucketing and partitioning are just a matter of setting the files up
> right. You can do that explicitly.

-- How can I do bucketing explicitly? I don't get your point here.
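
For illustration only, here is one way "setting the files up right" might look in Pig Latin. It is a sketch under assumptions, not something stated in the thread: the paths, the field names, the bucket count of 8, and the location of piggybank.jar are all hypothetical. It derives a bucket number from a numeric key and uses piggybank's MultiStorage to write one subdirectory per bucket value.

```pig
-- Assumption: piggybank.jar is available at this (hypothetical) path.
REGISTER piggybank.jar;

-- Hypothetical input: tab-separated (user_id, name) records.
users = LOAD 'input/users' USING PigStorage('\t')
        AS (user_id:long, name:chararray);

-- Derive a bucket number from the key; 8 buckets is an arbitrary choice.
bucketed = FOREACH users GENERATE (user_id % 8) AS bucket, user_id, name;

-- MultiStorage splits the output into subdirectories named after the value
-- of the field at the given index (here field 0, the bucket number).
STORE bucketed INTO 'output/users_bucketed'
    USING org.apache.pig.piggybank.storage.MultiStorage('output/users_bucketed', '0');
```

The same pattern gives partition-style layouts if the split field is a date or region column instead of a computed bucket. A later job can then load only the subdirectory it needs.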

> Pig also lets you push down any filtering and projection into the
> loader, as long as said loader is aware of how to deal with filters
> and projections. Using any such loader will give you the benefits.

-- Which loader are you talking about? Can you please elaborate on this?

> HCatLoader is one such implementation (and can use Hive's metastore to
> filter partitions).
> Optimized / custom stores and loads are supported via the StoreFunc
> and LoadFunc implementation

-- Can you please point me to some of the optimized store or load functions?

> -- write your own, or use one of the many
> existing ones. RCFile is supported via RCFileLoader in piggybank.
> There is extensive
> SequenceFile support (and some additional RCFile support) in the
> Elephant-Bird project from Twitter (disclaimer: that's my group's
> project).
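
As a rough sketch of the loader pushdown described above, a script using HCatLoader might look like the following. The table name `web.events` and the columns `dt`, `user_id`, and `url` are hypothetical, and `dt` is assumed to be a partition column of that table. The package name shown is the one used by HCatalog releases of this period; later releases moved the class to `org.apache.hive.hcatalog.pig`. The HCatalog and Hive jars are assumed to be on Pig's classpath.

```pig
-- Hypothetical Hive table 'web.events', partitioned by a string column dt.
events = LOAD 'web.events' USING org.apache.hcatalog.pig.HCatLoader();

-- A filter on the partition column placed right after the LOAD can be
-- handed to the loader, so only the matching partitions are read.
recent = FILTER events BY dt == '2012-10-04';

-- Projecting early tells Pig which columns are actually needed.
slim = FOREACH recent GENERATE user_id, url;

STORE slim INTO 'output/recent_events';
```

Keeping the FILTER and the projection as close to the LOAD as possible is what gives the loader a chance to skip data.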

> Indexing is a special case of filter pushdowns; not as well developed
> as Hive's, but the Elephant-Twin project can help if you aren't afraid
> of rolling up your sleeves. (same disclaimer).
> There are also multiple join and grouping strategies.
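
To make the join strategies concrete, here is a minimal sketch. The relation names, paths, and schemas are hypothetical; which strategy pays off depends on the data.

```pig
-- Hypothetical inputs: a large event log and a small lookup table.
big   = LOAD 'input/events' AS (user_id:long, url:chararray);
small = LOAD 'input/users'  AS (user_id:long, country:chararray);

-- Fragment-replicate join: every relation after the first is loaded into
-- memory, so it must be small. Runs map-side, with no reduce phase.
j_repl = JOIN big BY user_id, small BY user_id USING 'replicated';

-- Skewed join: for keys with very uneven record counts.
j_skew = JOIN big BY user_id, small BY user_id USING 'skewed';

-- Merge join: assumes both inputs are already sorted on the join key.
j_merge = JOIN big BY user_id, small BY user_id USING 'merge';
```

GROUP has a comparable option, `USING 'collected'`, which depends on how the data is laid out and on support from the loader.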
> Setting any properties can be achieved via "set property.name value;"

-- Generally, which properties do you override in the Pig grunt shell?
Which are the important ones to override?
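
As an illustration of the "set" syntax quoted above, a few settings that are commonly overridden are shown here. The values are arbitrary examples rather than recommendations, and the last property uses the Hadoop 1.x name, which is an assumption about the cluster version.

```pig
-- Default number of reducers for operators without an explicit PARALLEL.
set default_parallel 20;

-- A readable name for the jobs this script launches.
set job.name 'nightly-user-report';

-- Any Hadoop property can be passed through the same way, for example
-- compressing intermediate map output (Hadoop 1.x property name).
set mapred.compress.map.output true;
```

These lines work both in the grunt shell and at the top of a script.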


> D
> On Thu, Oct 4, 2012 at 4:35 PM, TianYi Zhu
> <[EMAIL PROTECTED]> wrote:
>> Hi Abhishek,
>> http://archive.cloudera.com/cdh4/cdh/4/pig/perf.html
>> http://ofps.oreilly.com/titles/9781449302641/making_pig_fly.html
>> On Fri, Oct 5, 2012 at 8:18 AM, Abhishek <[EMAIL PROTECTED]> wrote:
>>> Hi all,
>>> I am new to Pig.
>>> In Hive we can optimize the code by using:
>>> Indexing
>>> Bucketing
>>> Partitions
>>> Storing the file in different formats, such as RCFile or SequenceFile
>>> Overriding some properties in the Hive shell, by using
>>> set property.name=value;
>>> Can I override some default properties in the grunt shell?
>>> How can I use these optimizations in Pig?
>>> Regards
>>> Abhi