Pig, mail # user - Pig-cassandra Scritps and Oozie


Re: Pig-cassandra Scritps and Oozie
Jeremy Hanna 2013-11-28, 15:59
I believe that when I set up Oozie with the setup script (the one where you specify the Hadoop version and so on), I also added extra jars there: the Cassandra jars and some of their dependencies, plus cassandra.yaml, cassandra-env.sh and possibly the topology properties file.  Then, with the configuration outlined on the Cassandra wiki page that you posted, I just used the built-in Pig support and it worked fine.  You might try a simple test case that reads from and writes to Cassandra, and look for errors either in the job setup (the one-mapper job that Oozie creates to initialize the job) or in the job itself.

The specific jars from Cassandra that I added as additional jars were:
cassandra-all
cassandra-thrift
guava
high-scale-lib
libthrift
log4j
snakeyaml
commons-io
plus cassandra.yaml, cassandra-env.sh, and the cassandra-topology.properties file (if you are using the property file snitch)

I reference those jars in the LIBEXT_JARS environment variable and then execute:
bin/oozie-setup.sh prepare-war -jars $LIBEXT_JARS -extjs ./ext-2.2.zip
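
As a rough sketch of the step above, LIBEXT_JARS could be built from a directory that holds the jars listed earlier. This assumes that -jars accepts a colon-separated list of jar paths, which is how older oozie-setup.sh usage text describes it; check the usage output of your Oozie version. The join_jars helper and the /opt/oozie-extra-jars path are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: join every *.jar in a directory into one
# colon-separated string (assumed to be the form -jars expects).
join_jars() {
    dir=$1
    list=""
    for jar in "$dir"/*.jar; do
        # Skip the unexpanded pattern when the directory holds no jars.
        [ -e "$jar" ] || continue
        # Prepend a ':' separator only when the list is already non-empty.
        list="${list:+$list:}$jar"
    done
    printf '%s\n' "$list"
}

# Example usage (the directory is an assumption; adjust to your install):
# LIBEXT_JARS=$(join_jars /opt/oozie-extra-jars)
# bin/oozie-setup.sh prepare-war -jars "$LIBEXT_JARS" -extjs ./ext-2.2.zip
```

Quoting "$LIBEXT_JARS" keeps the list intact if any path contains spaces.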

Hopefully that helps,

Jeremy

On 28 Nov 2013, at 15:31, Miguel Angel Martin junquera <[EMAIL PROTECTED]> wrote:

> Hi Jeremy,
>
> I have not tested it yet; so far I have only run the Pig examples from the
> Oozie project, without Cassandra.
>
> *pig-cassandra* adds the Cassandra Pig library jars to the PIG_CLASSPATH
> environment variable and then calls the original *pig* shell script at
> PIG_HOME/bin/pig. Up to now, I have launched Pig scripts with pig_cassandra
> directly.
>
> I do not know, and have not seen, how Oozie launches Pig; I suppose that
> Oozie launches PIG_HOME/bin/pig.
>
> If you are using this configuration and your Pig scripts that use Cassandra
> work fine, I suppose the trick is to put the Cassandra jars and their
> dependencies, plus any other UDFs or libraries used in the Pig scripts, in
> the Oozie sharelib or in the lib folder of the job.
>
>
> On the other hand, I do not know whether I have to configure something like
> this:
>
> http://wiki.apache.org/cassandra/HadoopSupport#Oozie
>
> I am using Cassandra 1.2.10, Oozie 4.0.0 and Pig 0.11.1.
>
> I will try these options and see if they work.
>
> Thanks in advance
>
> 2013/11/28 Jeremy Hanna <[EMAIL PROTECTED]>
>
>> If I remember correctly, when I configured Pig, Cassandra, and Oozie to
>> work together, I just used vanilla Pig but gave it the jars it needed.
>>
>> What problem are you experiencing that prevents you from doing this?
>>
>> Jeremy
>>
>> On 28 Nov 2013, at 12:56, Miguel Angel Martin junquera <
>> [EMAIL PROTECTED]> wrote:
>>
>>> Hi all,
>>>
>>> What is the best way to integrate the Cassandra Pig extension with Oozie?
>>>
>>> Can Oozie be configured to use pig-cassandra instead of pig?
>>>
>>> Some ideas I am considering are:
>>>
>>> launching a shell job that runs ./pig-cassandra script.pig,
>>> or changing the values of environment variables,
>>> or changing the original pig script to include the pig-cassandra code, etc.
>>>
>>> Thanks and regards
>>
>>