Pig >> mail # user >> Re: Why does Pig not use default resources from the Configuration object?


Prashant Kommireddi 2013-04-13, 04:57
Bhooshan Mogal 2013-04-15, 17:35
Re: Why does Pig not use default resources from the Configuration object?
Hi Bhooshan,

This makes more sense now. I think the fs implementation override should go
into core-site.xml, but it would be useful to be able to add extra resources
when you have a bunch of other properties.

Would you like to submit a patch? It could be driven by a Pig property that
specifies the additional resource names (myfs-site.xml, in your case).
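A minimal sketch of how such a property might be consumed (assumption: the name pig.extra.conf.resources and the helper below are hypothetical, not existing Pig code); each parsed name would then be handed to Configuration.addResource() before the JobConf is built:

```java
// Sketch only: "pig.extra.conf.resources" is a hypothetical property
// name, not an existing Pig setting.
import java.util.ArrayList;
import java.util.List;

public class ExtraResources {
    // Split a comma-separated resource list, e.g. the value of a
    // hypothetical property like pig.extra.conf.resources.
    public static List<String> parseResourceNames(String value) {
        List<String> names = new ArrayList<>();
        if (value == null) return names;
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) names.add(trimmed);
        }
        return names;
    }

    public static void main(String[] args) {
        // Each name would then be passed to Configuration.addResource()
        // before the JobConf is assembled.
        System.out.println(parseResourceNames("myfs-default.xml, myfs-site.xml"));
        // prints [myfs-default.xml, myfs-site.xml]
    }
}
```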

-Prashant
On Mon, Apr 15, 2013 at 10:35 AM, Bhooshan Mogal
<[EMAIL PROTECTED]> wrote:

> Hi Prashant,
>
>
> Yes, I am running in MapReduce mode. Let me give you the steps in the
> scenario that I am trying to test -
>
> 1. I have my own implementation of org.apache.hadoop.fs.FileSystem for a
> filesystem I am building - let's call it MyFileSystem.class.
> This filesystem uses the scheme myfs:// for its URIs.
> 2. I have set fs.myfs.impl to MyFileSystem.class in core-site.xml and made
> the class available through a jar file that is part of HADOOP_CLASSPATH (or
> PIG_CLASSPATH).
> 3. In MyFileSystem.class, I have a static block as -
> static {
>     Configuration.addDefaultResource("myfs-default.xml");
>     Configuration.addDefaultResource("myfs-site.xml");
> }
> Both these files are in the classpath. To be safe, I have also added
> myfs-site.xml in the constructor of MyFileSystem as
> conf.addResource("myfs-site.xml"), so that it is part of both the default
> resources and the non-default resources in the Configuration object.
> 4. I am trying to access the filesystem in my pig script as -
> A = LOAD 'myfs://myhost.com:8999/testdata' USING PigStorage(':') AS
> (name:chararray, age:int); -- loading data
> B = FOREACH A GENERATE name;
> STORE B INTO 'myfs://myhost.com:8999/testoutput';
> 5. The execution seems to start correctly, and MyFileSystem.class is
> invoked correctly. In MyFileSystem.class, I can also see that myfs-site.xml
> is loaded and the properties defined in it are available.
> 6. However, when Pig tries to submit the job, it cannot find these
> properties and the job fails to submit successfully.
> 7. If I move all the properties defined in myfs-site.xml to core-site.xml,
> the job gets submitted successfully, and it even succeeds. However, this is
> not ideal, as I do not want to clutter core-site.xml with all of the
> properties for a separate filesystem.
> 8. As I said earlier, upon taking a closer look at the Pig code, I saw
> that while creating the JobConf object for a job, Pig adds only a specific
> set of resources to the job object, and ignores resources that may have
> already been added (e.g. myfs-site.xml) to the Configuration object.
> 9. I have tested this with native map-reduce code as well as hive, and
> this approach of having a separate config file for MyFileSystem works fine
> in both those cases.
>
> So, to summarize, I am looking for a way to ask Pig to load parameters
> from my own config file before submitting a job.
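The failure mode in steps 5-8 above can be illustrated with a toy model of Hadoop's Configuration resource handling (a deliberate simplification; the real class parses XML resource files). "Default" resources are global and inherited by every new instance, while addResource() affects one instance only, so a framework that rebuilds a job configuration from its own fixed resource list drops the per-instance extras. (The static-block registration can also be missed for load-order reasons: it runs only when MyFileSystem is first loaded, which may happen after the job configuration was built.)

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for org.apache.hadoop.conf.Configuration (simplified;
// the real class reads XML files rather than tracking names).
class ToyConf {
    static final List<String> defaults = new ArrayList<>();
    final List<String> resources = new ArrayList<>();

    ToyConf() { resources.addAll(defaults); }           // inherit globals

    static void addDefaultResource(String name) { defaults.add(name); }
    void addResource(String name) { resources.add(name); }  // this instance only
}

public class ResourceDemo {
    public static void main(String[] args) {
        ToyConf.addDefaultResource("core-site.xml");

        ToyConf mine = new ToyConf();
        mine.addResource("myfs-site.xml");   // instance-level extra

        // A framework that builds a fresh conf for the job sees only
        // the defaults: the instance-level extras are gone.
        ToyConf jobConf = new ToyConf();
        System.out.println(mine.resources);    // [core-site.xml, myfs-site.xml]
        System.out.println(jobConf.resources); // [core-site.xml]
    }
}
```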
>
> Thanks,
> -
> Bhooshan.
>
>
>
> On Fri, Apr 12, 2013 at 9:57 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>
>> +User group
>>
>> Hi Bhooshan,
>>
>> By default you should be running in MapReduce mode unless specified
>> otherwise. Are you creating a PigServer object to run your jobs? Can you
>> provide your code here?
>>
>> Sent from my iPhone
>>
>> On Apr 12, 2013, at 6:23 PM, Bhooshan Mogal <[EMAIL PROTECTED]>
>> wrote:
>>
>>  Apologies for the premature send. I may have some more information.
>> After I applied the patch and set "pig.use.overriden.hadoop.configs=true",
>> I saw an NPE (stacktrace below) and a message saying pig was running in
>> exectype local -
>>
>> 2013-04-13 07:37:13,758 [main] INFO
>> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting
>> to hadoop file system at: local
>> 2013-04-13 07:37:13,760 [main] WARN  org.apache.hadoop.conf.Configuration
>> - mapred.used.genericoptionsparser is deprecated. Instead, use
>> mapreduce.client.genericoptionsparser.used
>> 2013-04-13 07:37:14,162 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>> ERROR 1200: Pig script failed to parse:
Bhooshan Mogal 2013-04-15, 23:37
Prashant Kommireddi 2013-04-15, 23:43
Bhooshan Mogal 2013-04-16, 00:34
Prashant Kommireddi 2013-04-16, 00:57
Bhooshan Mogal 2013-05-29, 23:55
Bhooshan Mogal 2013-06-21, 03:31