Pig, mail # user - Pig storage and load functions and Cache


Re: Pig storage and load functions and Cache
abhishek dodda 2012-10-16, 03:10
Hi Dmitriy,

Thanks for the information.

Can you share your views on the question below?

BinStorage()
PigDump()
PigStorage()
TextLoader()

Which of the above formats should I load from or store in to optimize
my queries, considering that my input is text files?

Regards
Abhi

On Mon, Oct 8, 2012 at 12:10 AM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> Pig has multi-query execution optimization built-in. If you compute
> multiple relations in your script that share parent relations, those
> parent relations will be computed only once. You don't have to do
> anything to make that happen.
>
> If you prefer to handle your own caching, you would have to handle it
> yourself, of course.
>
> There is some academic work on reusing parts of previous runs of the
> same script (potentially on overlapping, but not identical datasets);
> the papers to read are:
> Nectar: http://research.microsoft.com/apps/pubs/default.aspx?id=131525
> ReStore: http://vldb.org/pvldb/vol5/p586_imanelghandour_vldb2012.pdf
>
> There are a lot of papers on iterative MapReduce; I am sure that if you
> start with the ReStore citations and/or Google Scholar, you'll find some.
>
> None of that has made it into Pig yet; I believe a general compute
> caching framework would be very useful, and look forward to someone
> taking up that challenge.
>
> D
>
> On Fri, Oct 5, 2012 at 2:51 PM, Abhishek <[EMAIL PROTECTED]> wrote:
>> BinStorage()
>> PigDump()
>> PigStorage()
>> TextLoader()
>>
>> Which of the above formats should I load from or store in to optimize
>> the queries?
>>
>> Can a cache be used anywhere in Pig? How can a cache be useful in Pig?
>>
>> Regards
>> Abhi
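
The multi-query execution Dmitriy describes above can be illustrated with
a small Pig Latin script. This is only a sketch: the input path, output
paths, field names, and the 1024-byte threshold are all made up for
illustration and do not come from this thread. Two outputs share the
parent relations "logs" and "big"; when the script is run as a whole
(batch mode), Pig should plan both STORE statements together so the
shared parents are computed once rather than once per output.

```pig
-- Hypothetical tab-separated text input and schema (illustration only).
logs = LOAD 'input/logs.txt' USING PigStorage('\t')
       AS (user:chararray, url:chararray, bytes:long);

-- Shared parent relation: both branches below are derived from it.
big = FILTER logs BY bytes > 1024;

-- Branch 1: total bytes per user.
by_user     = GROUP big BY user;
user_totals = FOREACH by_user GENERATE group AS user, SUM(big.bytes) AS total_bytes;

-- Branch 2: number of hits per URL.
by_url     = GROUP big BY url;
url_counts = FOREACH by_url GENERATE group AS url, COUNT(big) AS hits;

-- Two outputs in one script; no explicit caching step is written anywhere.
STORE user_totals INTO 'output/user_totals' USING PigStorage('\t');
STORE url_counts  INTO 'output/url_counts'  USING PigStorage('\t');
```

Nothing in the script asks for the sharing; as noted in the reply, it is
something Pig does on its own.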