Re: Pig storage and load functions and Cache
abhishek dodda 2012-10-16, 03:10
Thanks for the information.
Can you share your views on the query below?
Which of the above formats for loading or storing will optimize the
queries, considering I have text files?
On Mon, Oct 8, 2012 at 12:10 AM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> Pig has multi-query execution optimization built-in. If you compute
> multiple relations in your script that share parent relations, those
> parent relations will be computed only once. You don't have to do
> anything to make that happen.
> If you prefer to manage caching explicitly, you would have to implement it
> yourself, of course.
> There is some academic work on reusing parts of previous runs of the
> same script (potentially on overlapping, but not identical datasets);
> the papers to read are:
> Nectar: http://research.microsoft.com/apps/pubs/default.aspx?id=131525
> ReStore: http://vldb.org/pvldb/vol5/p586_imanelghandour_vldb2012.pdf
> There are a lot of papers on iterative MapReduce; I am sure if you
> start with ReStore citations and/or Google Scholar, you'll find some.
> None of that has made it into Pig yet; I believe a general compute
> caching framework would be very useful, and look forward to someone
> taking up that challenge.
> On Fri, Oct 5, 2012 at 2:51 PM, Abhishek <[EMAIL PROTECTED]> wrote:
>> Which of the above formats for loading or storing will optimize the queries?
>> Can a cache be used anywhere in Pig? How can a cache be useful in Pig?