Abhishek 2012-10-04, 22:18
TianYi Zhu 2012-10-04, 23:35
Re: Optimizations in pig
Dmitriy Ryaboy 2012-10-04, 23:59
Bucketing and partitioning are just a matter of setting the files up right; you can do that explicitly.

Pig also lets you push down any filtering and projection into the loader, as long as said loader is aware of how to deal with filters and projections. Using any such loader will give you the benefits. HCatLoader is one such implementation (and can use Hive's metastore to filter partitions).

Optimized / custom stores and loads are supported via the StoreFunc and LoadFunc implementations: write your own, or use one of the many existing ones. RCFile is supported via RCFileLoader in piggybank. There is extensive SequenceFile support (and some additional RCFile support) in the Elephant-Bird project from Twitter (disclaimer: that's my group's project).

Indexing is a special case of filter pushdown; it is not as well developed as Hive's, but the Elephant-Twin project can help if you aren't afraid of rolling up your sleeves (same disclaimer).

There are also multiple join and grouping strategies.

Setting any property can be achieved via "set property.name value;"

D

On Thu, Oct 4, 2012 at 4:35 PM, TianYi Zhu <[EMAIL PROTECTED]> wrote:
> Hi Abhishek,
>
> http://archive.cloudera.com/cdh4/cdh/4/pig/perf.html
> http://ofps.oreilly.com/titles/9781449302641/making_pig_fly.html
>
> On Fri, Oct 5, 2012 at 8:18 AM, Abhishek <[EMAIL PROTECTED]> wrote:
>
>> Hi all,
>>
>> I am new to pig.
>>
>> In hive we can optimize the code by using
>>
>> Indexing
>> Bucketing
>> Partitions
>> Storing the file in different formats, such as RC file, sequence file
>>
>> Overriding some property in the hive shell.
>>
>> By using
>>
>> Set property name = value;
>>
>> Override some default property in grunt shell.
>>
>> How can I use optimizations in pig?
>>
>> Regards
>> Abhi
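
To make the reply above concrete, here is a minimal Pig Latin sketch of three of the points it raises: overriding properties with "set", letting a pushdown-aware loader prune partitions and columns, and picking a join strategy. The table name, paths, field names, and property values are all hypothetical, and the HCatLoader package name is an assumption that depends on the HCatalog release in use.

```pig
-- Hypothetical table, paths, and field names; adjust to your own data.

-- Override properties for this script ("set property.name value;").
SET default_parallel 20;
SET job.name 'pig-optimization-sketch';

-- HCatLoader reads a Hive table through the metastore.
-- Assumption: the class lives in org.apache.hcatalog.pig in older HCatalog
-- releases and in org.apache.hive.hcatalog.pig in later ones.
logs = LOAD 'web.logs' USING org.apache.hcatalog.pig.HCatLoader();

-- A filter on a partition column placed right after the LOAD can be pushed
-- down to the loader, so only the matching partitions are read.
recent = FILTER logs BY ds == '2012-10-04';

-- Projecting early lets a projection-aware loader skip unused columns.
slim = FOREACH recent GENERATE user_id, url;

-- A small lookup relation, loaded with an explicit schema.
users = LOAD '/data/users' USING PigStorage('\t')
        AS (user_id:long, country:chararray);

-- 'replicated' selects the fragment-replicate (map-side) join; the relation
-- listed last must be small enough to fit in memory.
joined = JOIN slim BY user_id, users BY user_id USING 'replicated';

by_country = GROUP joined BY users::country;
counts = FOREACH by_country GENERATE group AS country, COUNT(joined) AS hits;

STORE counts INTO '/output/hits_by_country';
```

Other join strategies are chosen the same way, for example USING 'skewed' or USING 'merge'; which one helps depends on the size and sort order of the inputs.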
abhishek dodda 2012-10-05, 01:04
abhishek dodda 2012-10-05, 01:05