Pig, mail # user - Re: Why does Pig not use default resources from the Configuration object?


Re: Why does Pig not use default resources from the Configuration object?
Prashant Kommireddi 2013-04-16, 00:57
Pig actually does not add core-site.xml or hadoop-site.xml explicitly;
it merely looks for these resources to be present on the classpath.

JobConf is the interface describing MR specifics to Hadoop, and Pig uses it
to define jobs for execution. It loads mapred*.xml. It also extends
Configuration and uses the properties that Configuration loads.
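
For reference, a minimal sketch of the core-site.xml entry discussed in this
thread. The thread only names the class MyFileSystem.class, so the fully
qualified name com.example.MyFileSystem below is a hypothetical placeholder.
Because core-site.xml is picked up from the classpath, a property placed
there should be visible when the job is submitted:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Maps the myfs:// URI scheme to its FileSystem implementation. -->
  <property>
    <name>fs.myfs.impl</name>
    <!-- Hypothetical class name; substitute the real implementation class. -->
    <value>com.example.MyFileSystem</value>
  </property>
</configuration>
```

The jar containing the class still has to be on HADOOP_CLASSPATH (or
PIG_CLASSPATH), as described in the quoted message.
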
On Mon, Apr 15, 2013 at 5:34 PM, Bhooshan Mogal <[EMAIL PROTECTED]>wrote:

> Thanks! Quick question before starting this though. Since resources are
> added to the Configuration object in various classes in hadoop
> (Configuration.java adds core-*.xml, HDFSConfiguration.java adds
> hdfs-*.xml), why does Pig create a new JobConf object with selected
> resources before submitting a job and not reuse the Configuration object
> that may have been created earlier? Trying to understand why Pig adds
> core-site.xml, hdfs-site.xml, yarn-site.xml again.
>
>
> On Mon, Apr 15, 2013 at 4:43 PM, Prashant Kommireddi <[EMAIL PROTECTED]>wrote:
>
>> Sounds good. Here is a doc on contributing a patch (for some pointers):
>> https://cwiki.apache.org/confluence/display/PIG/HowToContribute
>>
>>
>> On Mon, Apr 15, 2013 at 4:37 PM, Bhooshan Mogal <[EMAIL PROTECTED]> wrote:
>>
>>> Hey Prashant,
>>>
>>> Yup, I can take a stab at it. This is the first time I am looking at Pig
>>> code, so I might take some time to get started. Will get back to you if I
>>> have questions in the meantime. And yes, I will write it so it reads a pig
>>> property.
>>>
>>> -
>>> Bhooshan.
>>>
>>>
>>> On Mon, Apr 15, 2013 at 11:58 AM, Prashant Kommireddi <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>> Hi Bhooshan,
>>>>
>>>> This makes more sense now. I think overriding fs implementation should
>>>> go into core-site.xml, but it would be useful to be able to add
>>>> resources if you have a bunch of other properties.
>>>>
>>>> Would you like to submit a patch? It should be based on a Pig property
>>>> that lists the additional resource names (myfs-site.xml in your case).
>>>>
>>>> -Prashant
>>>>
>>>>
>>>> On Mon, Apr 15, 2013 at 10:35 AM, Bhooshan Mogal <
>>>> [EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi Prashant,
>>>>>
>>>>>
>>>>> Yes, I am running in MapReduce mode. Let me give you the steps in the
>>>>> scenario that I am trying to test -
>>>>>
>>>>> 1. I have my own implementation of org.apache.hadoop.fs.FileSystem for
>>>>> a filesystem I am trying to implement - Let's call it MyFileSystem.class.
>>>>> This filesystem uses the scheme myfs:// for its URIs
>>>>> 2. I have set fs.myfs.impl to MyFileSystem.class in core-site.xml and
>>>>> made the class available through a jar file that is part of
>>>>> HADOOP_CLASSPATH (or PIG_CLASSPATH).
>>>>> 3. In MyFileSystem.class, I have a static block as -
>>>>> static {
>>>>>     Configuration.addDefaultResource("myfs-default.xml");
>>>>>     Configuration.addDefaultResource("myfs-site.xml");
>>>>> }
>>>>> Both these files are in the classpath. To be safe, I have also added
>>>>> myfs-site.xml in the constructor of MyFileSystem via
>>>>> conf.addResource("myfs-site.xml"), so that it is part of both the default
>>>>> resources and the non-default resources in the Configuration object.
>>>>> 4. I am trying to access the filesystem in my pig script as -
>>>>> A = LOAD 'myfs://myhost.com:8999/testdata' USING PigStorage(':') AS
>>>>> (name:chararray, age:int); -- loading data
>>>>> B = FOREACH A GENERATE name;
>>>>> store B into 'myfs://myhost.com:8999/testoutput';
>>>>> 5. The execution seems to start correctly, and MyFileSystem.class is
>>>>> invoked correctly. In MyFileSystem.class, I can also see that myfs-site.xml
>>>>> is loaded and the properties defined in it are available.
>>>>> 6. However, when Pig tries to submit the job, it cannot find these
>>>>> properties and the job fails to submit successfully.
>>>>> 7. If I move all the properties defined in myfs-site.xml to
>>>>> core-site.xml, the job gets submitted successfully, and it even succeeds.
>>>>> However, this is not ideal as I do not want to proliferate core-site.xml