Pig user mailing list - Re: Why does Pig not use default resources from the Configuration object?


Re: Why does Pig not use default resources from the Configuration object?
Prashant Kommireddi 2013-04-15, 18:58
Hi Bhooshan,

This makes more sense now. I think overriding fs implementation should go
into core-site.xml, but it would be useful to be able to add resources if
you have a bunch of other properties.

Would you like to submit a patch? It should be based on a Pig property that
specifies the additional resource names (myfs-site.xml, in your case).
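
As a rough sketch of what such a patch might do (the property name
pig.additional.hadoop.resources below is only a placeholder, and the exact
hook point in Pig's job setup is left open):

    import org.apache.hadoop.conf.Configuration;

    // Read a Pig property that lists extra resource files and add each one
    // to the Configuration (and hence the JobConf built from it) before the
    // job is submitted. Resources are resolved from the classpath, just like
    // core-site.xml.
    public final class AdditionalResources {

        // Placeholder property name; a real patch would pick the final name.
        public static final String PROP = "pig.additional.hadoop.resources";

        public static void addTo(Configuration conf) {
            String names = conf.get(PROP);
            if (names == null || names.trim().isEmpty()) {
                return; // no extra resources requested
            }
            for (String name : names.split(",")) {
                conf.addResource(name.trim());
            }
        }

        private AdditionalResources() {
        }
    }

With something like this wired into the JobConf construction, setting
pig.additional.hadoop.resources=myfs-site.xml would pull your file into the
job configuration alongside the resources Pig already loads.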

-Prashant
On Mon, Apr 15, 2013 at 10:35 AM, Bhooshan Mogal
<[EMAIL PROTECTED]> wrote:

> Hi Prashant,
>
>
> Yes, I am running in MapReduce mode. Let me give you the steps in the
> scenario that I am trying to test -
>
> 1. I have my own implementation of org.apache.hadoop.fs.FileSystem for a
> filesystem I am trying to implement - Let's call it MyFileSystem.class.
> This filesystem uses the scheme myfs:// for its URIs.
> 2. I have set fs.myfs.impl to MyFileSystem.class in core-site.xml and made
> the class available through a jar file that is part of HADOOP_CLASSPATH (or
> PIG_CLASSPATH).
> 3. In MyFileSystem.class, I have a static block as -
> static {
>     Configuration.addDefaultResource("myfs-default.xml");
>     Configuration.addDefaultResource("myfs-site.xml");
> }
> Both these files are in the classpath. To be safe, I have also added
> myfs-site.xml in the constructor of MyFileSystem as
> conf.addResource("myfs-site.xml"), so that it is part of both the default
> resources as well as the non-default resources in the Configuration object.
> 4. I am trying to access the filesystem in my pig script as -
> A = LOAD 'myfs://myhost.com:8999/testdata' USING PigStorage(':') AS
> (name:chararray, age:int); -- loading data
> B = FOREACH A GENERATE name;
> STORE B INTO 'myfs://myhost.com:8999/testoutput';
> 5. The execution seems to start correctly, and MyFileSystem.class is
> invoked correctly. In MyFileSystem.class, I can also see that myfs-site.xml
> is loaded and the properties defined in it are available.
> 6. However, when Pig tries to submit the job, it cannot find these
> properties and the job submission fails.
> 7. If I move all the properties defined in myfs-site.xml to core-site.xml,
> the job gets submitted successfully, and it even succeeds. However, this is
> not ideal, as I do not want to clutter core-site.xml with all of the
> properties for a separate filesystem.
> 8. As I said earlier, upon taking a closer look at the Pig code, I saw
> that while creating the JobConf object for a job, Pig adds very specific
> resources to the job object and ignores the resources that may have already
> been added (e.g. myfs-site.xml) to the Configuration object.
> 9. I have tested this with native map-reduce code as well as hive, and
> this approach of having a separate config file for MyFileSystem works fine
> in both those cases.
>
> So, to summarize, I am looking for a way to ask Pig to load parameters
> from my own config file before submitting a job.
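
A side note on the behaviour in steps 5-8: whether a property is visible
depends on which resources the particular Configuration object has loaded.
Here is a minimal illustration of the Configuration semantics involved,
assuming a myfs-site.xml on the classpath that defines a property called
my.fs.property (both names are placeholders):

    import org.apache.hadoop.conf.Configuration;

    public class ConfResourceDemo {
        public static void main(String[] args) {
            // What MyFileSystem's static block does: register myfs-site.xml
            // as a default resource.
            Configuration.addDefaultResource("myfs-site.xml");

            // A Configuration that loads defaults picks up core-default.xml,
            // core-site.xml and the registered default resources, so the
            // property is visible here.
            Configuration withDefaults = new Configuration();
            System.out.println(withDefaults.get("my.fs.property"));

            // A Configuration built without defaults only sees resources
            // added to it explicitly.
            Configuration explicitOnly = new Configuration(false);
            System.out.println(explicitOnly.get("my.fs.property")); // null

            explicitOnly.addResource("myfs-site.xml");
            System.out.println(explicitOnly.get("my.fs.property")); // set
        }
    }

If the configuration Pig hands to the job is assembled only from a fixed set
of resources, properties that live solely in myfs-site.xml never reach it,
which matches the behaviour described above; moving them into core-site.xml,
or having Pig add extra resources as suggested above, makes them part of
that set.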
>
> Thanks,
> -
> Bhooshan.
>
>
>
> On Fri, Apr 12, 2013 at 9:57 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>
>> +User group
>>
>> Hi Bhooshan,
>>
>> By default you should be running in MapReduce mode unless specified
>> otherwise. Are you creating a PigServer object to run your jobs? Can you
>> provide your code here?
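
For reference, running Pig embedded through PigServer looks roughly like the
sketch below; the fs.myfs.impl value and class name are placeholders
standing in for the MyFileSystem setup described in this thread:

    import java.util.Properties;

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class EmbeddedPigExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // When embedding Pig, properties that would otherwise live in a
            // site file can be handed to it directly (placeholder value).
            props.setProperty("fs.myfs.impl", "com.example.MyFileSystem");

            PigServer pig = new PigServer(ExecType.MAPREDUCE, props);
            pig.registerQuery("A = LOAD 'myfs://myhost.com:8999/testdata' "
                    + "USING PigStorage(':') AS (name:chararray, age:int);");
            pig.registerQuery("B = FOREACH A GENERATE name;");
            pig.store("B", "myfs://myhost.com:8999/testoutput");
        }
    }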
>>
>> Sent from my iPhone
>>
>> On Apr 12, 2013, at 6:23 PM, Bhooshan Mogal <[EMAIL PROTECTED]>
>> wrote:
>>
>>  Apologies for the premature send. I may have some more information.
>> After I applied the patch and set "pig.use.overriden.hadoop.configs=true",
>> I saw an NPE (stack trace below) and a message saying Pig was running in
>> exectype local -
>>
>> 2013-04-13 07:37:13,758 [main] INFO
>> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting
>> to hadoop file system at: local
>> 2013-04-13 07:37:13,760 [main] WARN  org.apache.hadoop.conf.Configuration
>> - mapred.used.genericoptionsparser is deprecated. Instead, use
>> mapreduce.client.genericoptionsparser.used
>> 2013-04-13 07:37:14,162 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>> ERROR 1200: Pig script failed to parse:
Further replies in this thread:
Bhooshan Mogal 2013-04-15, 23:37
Prashant Kommireddi 2013-04-15, 23:43
Bhooshan Mogal 2013-04-16, 00:34
Prashant Kommireddi 2013-04-16, 00:57
Bhooshan Mogal 2013-05-29, 23:55
Bhooshan Mogal 2013-06-21, 03:31