Pig >> mail # user >> Pig-cassandra Scripts and Oozie


Re: Pig-cassandra Scripts and Oozie
I believe that when I set up Oozie with the setup script (the one where you specify the Hadoop version and so on), I also added extra jars there: the Cassandra jars and some of their dependencies, along with cassandra.yaml, cassandra-env.sh, and possibly the topology properties file. Then, with the configuration outlined on the Cassandra wiki page that you posted, I just used the built-in Pig support and it worked fine. You might try a simple test case that reads from and writes to Cassandra, and look for errors either in the job setup (the single-mapper job that Oozie creates to initialize the job) or in the job itself.

The specific jars from Cassandra that I added as additional jars were:
cassandra-all
cassandra-thrift
guava
high-scale-lib
libthrift
log4j
snakeyaml
commons-io
plus the cassandra.yaml, cassandra-env.sh, and cassandra-topology.properties files (the last one only if you use the property file snitch)

I reference those jars in the LIBEXT_JARS environment variable and then execute:
bin/oozie-setup.sh prepare-war -jars $LIBEXT_JARS -extjs ./ext-2.2.zip
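
As a rough sketch of how LIBEXT_JARS might be assembled before running the command above: the helper name build_libext_jars and the CASSANDRA_LIB path are hypothetical, and the sketch assumes that -jars accepts a colon-separated list of jar paths, which should be checked against the usage text of your Oozie version. Jar file names differ between distributions, so the prefixes are matched loosely. The configuration files (cassandra.yaml and friends) are not covered here.

```shell
# Sketch only: build_libext_jars and CASSANDRA_LIB are illustrative names,
# not part of Oozie or Cassandra.

# Print a colon-separated list of every jar in the given directory whose
# file name starts with one of the given prefixes.
build_libext_jars() {
  lib_dir="$1"
  shift
  result=""
  for prefix in "$@"; do
    for jar in "$lib_dir"/"$prefix"*.jar; do
      # An unmatched glob stays literal, so skip paths that do not exist.
      [ -e "$jar" ] || continue
      result="${result:+$result:}$jar"
    done
  done
  printf '%s\n' "$result"
}

# Example usage (hypothetical directory; adjust prefixes to your jar names):
#   CASSANDRA_LIB=/path/to/cassandra/lib
#   LIBEXT_JARS=$(build_libext_jars "$CASSANDRA_LIB" guava high-scale-lib log4j commons-io)
#   bin/oozie-setup.sh prepare-war -jars "$LIBEXT_JARS" -extjs ./ext-2.2.zip
```

Prefixes with no matching jar are silently skipped, so it is worth echoing LIBEXT_JARS once to confirm that nothing expected is missing.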

Hopefully that helps,

Jeremy

On 28 Nov 2013, at 15:31, Miguel Angel Martin junquera <[EMAIL PROTECTED]> wrote:

> hi Jeremy,
>
> I have not tried it yet; so far I have only run the Pig examples from the
> Oozie project, without Cassandra.
>
> *pig-cassandra* adds the Cassandra Pig library jars to the PIG_CLASSPATH
> environment variable and then calls the original *pig* shell script at
> PIG_HOME/bin/pig. Up to now, I have launched my Pig scripts with
> pig_cassandra directly.
>
> I do not know, and have not checked, how Oozie launches Pig; I suppose
> that it runs PIG_HOME/bin/pig.
>
> If you are using this configuration and your Pig scripts that use Cassandra
> work fine, I suppose the trick is to put the Cassandra jar dependencies,
> and any other UDFs or libraries used by the Pig scripts, in the Oozie
> sharelib or in the lib folder of the job.
>
>
> On the other hand, I do not know whether I have to configure something
> like this:
>
> http://wiki.apache.org/cassandra/HadoopSupport#Oozie
>
> I am using Cassandra 1.2.10, Oozie 4.0.0 and Pig 0.11.1.
>
> I will try these options and see if they work.
>
> Thanks in advance
>
> 2013/11/28 Jeremy Hanna <[EMAIL PROTECTED]>
>
>> If I remember correctly, when I configured Pig, Cassandra, and Oozie to
>> work together, I just used vanilla Pig but gave it the jars it needed.
>>
>> What problem are you running into that prevents you from doing this?
>>
>> Jeremy
>>
>> On 28 Nov 2013, at 12:56, Miguel Angel Martin junquera <
>> [EMAIL PROTECTED]> wrote:
>>
>>> hi all;
>>>
>>> What is the best way to integrate the Cassandra Pig extension with Oozie?
>>>
>>> Can Oozie be configured to use pig-cassandra instead of pig?
>>>
>>> Some ideas I am considering are:
>>>
>>> launching a Shell job that runs ./pig-cassandra script.pig,
>>> or changing the values of environment variables,
>>> or modifying the original pig script to include the pig-cassandra code, etc.
>>>
>>> Thanks and regards
>>
>>