Re: Review Request 14274: PIG-2672 Optimize the use of DistributedCache


> On Sept. 25, 2013, 12:13 a.m., Rohini Palaniswamy wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java, line 1492
> > <https://reviews.apache.org/r/14274/diff/1/?file=355174#file355174line1492>
> >
> >     If it is an HDFS path, use it as is and do not ship it to the jar cache. That will also save time and hash checks.

Currently, PigServer#registerJar localizes all the jars, so this would need some more refactoring before we can do it. I will try to solve this in a separate JIRA.

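
A minimal sketch of the check Rohini suggests, assuming a jar's location can be inspected as a java.net.URI before it is localized (which, per the reply above, PigServer#registerJar does not allow today). The class JarShippingPolicy and its method are hypothetical, not part of Pig: a path with an hdfs scheme is referenced in place, and anything else is treated as client-local and copied to the jar cache.

```java
import java.net.URI;

/**
 * Hypothetical helper (not part of Pig): decides whether a registered jar
 * must be copied into the jar cache or can be referenced where it already is.
 */
public final class JarShippingPolicy {

    private JarShippingPolicy() {}

    /** Returns true if the jar must be copied into the jar cache. */
    public static boolean shouldCopyToJarCache(URI jar) {
        String scheme = jar.getScheme();
        // No scheme (plain local path), file:, or any other non-HDFS scheme:
        // treat it as client-local and copy it into the cache.
        // An hdfs: path is already on the cluster, so it is used as is,
        // which skips both the upload and the hash check.
        return scheme == null || !scheme.equalsIgnoreCase("hdfs");
    }

    public static void main(String[] args) {
        String[] samples = {
            "hdfs://namenode:8020/user/pig/udfs/udf.jar",
            "file:///home/pig/udf.jar",
            "/home/pig/udf.jar"
        };
        for (String s : samples) {
            System.out.println(s + " -> copy to jar cache: "
                    + shouldCopyToJarCache(URI.create(s)));
        }
    }
}
```

Treating unknown schemes as "copy" is the conservative choice here; whether other cluster file systems could also be referenced in place is left open.
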
> On Sept. 25, 2013, 12:13 a.m., Rohini Palaniswamy wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java, line 1495
> > <https://reviews.apache.org/r/14274/diff/1/?file=355174#file355174line1495>
> >
> >     Since the name of the file on HDFS differs from the actual file name, create a symlink with the actual file name. Some users might depend on the actual file name.
>
> Rohini Palaniswamy wrote:
>     One case I see is Python scripts (Jython UDFs), which do imports based on the file name. The same would apply to the other scripting languages we support. It would be good to run the full unit and e2e tests with your patch before committing.
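
A minimal sketch of the symlink idea, assuming Hadoop's DistributedCache convention that the fragment of a cache-file URI (the part after #) names the symlink created in the task's working directory. Only the URI is built here, with java.net.URI; registering it with the cache is not shown, and the helper name CacheSymlinks.withLinkName and the example paths are hypothetical. On older Hadoop releases symlink creation may also need to be enabled in the job configuration.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper (not part of Pig or Hadoop). */
public final class CacheSymlinks {

    private CacheSymlinks() {}

    /**
     * Returns the cached file's URI with the original file name as its
     * fragment, so the task sees the file under that name via a symlink.
     */
    public static URI withLinkName(URI cachedFile, String originalName) {
        try {
            // Keep scheme, authority, path and query; only the fragment changes.
            return new URI(cachedFile.getScheme(), cachedFile.getAuthority(),
                    cachedFile.getPath(), cachedFile.getQuery(), originalName);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build cache URI", e);
        }
    }

    public static void main(String[] args) {
        // The cached name is a content hash, but a Jython UDF expects myudfs.py.
        URI cached = URI.create("hdfs://namenode:8020/tmp/jarcache/3f2a9c.py");
        System.out.println(withLinkName(cached, "myudfs.py"));
    }
}
```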

Maybe I should avoid renaming the files and just put them under /a/b/c/abcdefsha1/udf.jar.

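
A minimal sketch of that layout, assuming the directory name is the lowercase hex SHA-1 of the jar's bytes and /a/b/c stands for the configured cache root. JarCacheLayout and its methods are hypothetical names. Because the file keeps its original name, a Jython UDF imported by file name would still resolve without a symlink.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: content-addressed jar cache paths. */
public final class JarCacheLayout {

    private JarCacheLayout() {}

    /** Streams the file through SHA-1 and returns the lowercase hex digest. */
    public static String sha1Hex(Path file) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-1.
            throw new IllegalStateException(e);
        }
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Builds cacheRoot/sha1/originalName; the jar keeps its own file name. */
    public static String cachePath(String cacheRoot, Path localJar) throws IOException {
        return cacheRoot + "/" + sha1Hex(localJar) + "/" + localJar.getFileName();
    }

    public static void main(String[] args) throws IOException {
        Path jar = Files.createTempFile("udf", ".jar");
        Files.write(jar, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(cachePath("/a/b/c", jar));
        Files.delete(jar);
    }
}
```

Identical jars map to the same directory, so a second upload can be skipped, while two different jars that share a name still land in different directories.
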
> On Sept. 25, 2013, 12:13 a.m., Rohini Palaniswamy wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java, line 1509
> > <https://reviews.apache.org/r/14274/diff/1/?file=355174#file355174line1509>
> >
> >     First do a file size comparison before calculating the checksum, for better efficiency.

A size check would require stat calls to the NameNode; computing the checksum locally should be quicker than that.
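
A minimal sketch of the size-before-checksum ordering Rohini proposes, with all names hypothetical. The remote size is passed as a LongSupplier so that its cost is explicit: if it comes from an HDFS stat served by the NameNode, as Aniket points out, it may cost more than hashing a small local jar, so the ordering only pays off when the size is cheap to obtain.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical helper: decides whether a cached copy matches a local jar. */
public final class CacheHitCheck {

    private CacheHitCheck() {}

    public static boolean sameContent(long localSize, LongSupplier remoteSize,
                                      Supplier<String> localSha1,
                                      Supplier<String> remoteSha1) {
        // Cheap rejection first: files of different sizes cannot match,
        // so the checksum is never computed in that case.
        if (localSize != remoteSize.getAsLong()) {
            return false;
        }
        // Sizes agree, so fall back to the more expensive checksum comparison.
        return localSha1.get().equals(remoteSha1.get());
    }

    public static void main(String[] args) {
        AtomicInteger hashes = new AtomicInteger();
        Supplier<String> local = () -> {
            hashes.incrementAndGet();
            return "aaf4c6";
        };
        boolean hit = sameContent(5L, () -> 7L, local, () -> "aaf4c6");
        System.out.println("hit=" + hit + ", hashes computed=" + hashes.get());
    }
}
```
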
- Aniket
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14274/#review26364
-----------------------------------------------------------
On Sept. 21, 2013, 1:21 a.m., Aniket Mokashi wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14274/
> -----------------------------------------------------------
>
> (Updated Sept. 21, 2013, 1:21 a.m.)
>
>
> Review request for pig, Cheolsoo Park, DanielWX DanielWX, Dmitriy Ryaboy, Julien Le Dem, and Rohini Palaniswamy.
>
>
> Bugs: PIG-2672
>     https://issues.apache.org/jira/browse/PIG-2672
>
>
> Repository: pig
>
>
> Description
> -------
>
> Added the jar.cache.location option.
>
>
> Diffs
> -----
>
>   trunk/src/org/apache/pig/PigConstants.java 1525188
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1525188
>   trunk/src/org/apache/pig/impl/PigContext.java 1525188
>   trunk/src/org/apache/pig/impl/io/FileLocalizer.java 1525188
>   trunk/test/org/apache/pig/test/TestJobControlCompiler.java 1525188
>
> Diff: https://reviews.apache.org/r/14274/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Aniket Mokashi
>
>