Pig, mail # user - Pig + Cassandra Example !!


Re: Pig + Cassandra Example !!
Dan DeCapria, CivicScienc... 2013-03-18, 17:24
It looks like Cassandra still needs to be configured to run with Hadoop,
which takes a couple of steps. I usually run a bootstrap like the one below
on the Hadoop master and on every slave, so that each node has the Cassandra
jars on its classpath and Pig knows which Cassandra address, port, and
partitioner to use.

# Install Cassandra on every node
echo "deb http://debian.datastax.com/community stable main" > /tmp/cassandra.sources.list
sudo mv /tmp/cassandra.sources.list /etc/apt/sources.list.d/cassandra.sources.list
curl -L http://debian.datastax.com/debian/repo_key | sudo apt-key add -
sudo apt-get update
sudo apt-get install -y cassandra
sudo /etc/init.d/cassandra stop

# Put the Cassandra jars on Hadoop's classpath. Single quotes keep
# $HADOOP_CLASSPATH literal, so it is expanded when the env file is sourced
# rather than when this line is written.
echo 'HADOOP_CLASSPATH=/usr/share/cassandra/*:/usr/share/cassandra/lib/*:$HADOOP_CLASSPATH' >> /home/hadoop/conf/hadoop-user-env.sh

# Tell Pig how to reach Cassandra (replace MYIPGOESHERE with a Cassandra node's IP)
echo "PIG_INITIAL_ADDRESS=MYIPGOESHERE" >> /home/hadoop/conf/hadoop-user-env.sh
echo "PIG_RPC_PORT=9160" >> /home/hadoop/conf/hadoop-user-env.sh
echo "PIG_ROOT_LOGGER=DEBUG,console" >> /home/hadoop/conf/hadoop-user-env.sh
echo "PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner" >> /home/hadoop/conf/hadoop-user-env.sh
echo "PIG_WIDEROW_INPUT=true" >> /home/hadoop/conf/hadoop-user-env.sh

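To confirm that the classpath setting took effect on a node, a small check like the following may help. This is only a sketch: check_classpath is a hypothetical helper (not part of Hadoop, Pig, or Cassandra), and it assumes a POSIX shell. It walks a colon-separated classpath and reports, for each entry, whether a dir/* wildcard has any jars behind it or whether a plain path exists. An entry with no jars behind it is one plausible cause of the ERROR 1070 quoted further down.

```shell
#!/bin/sh
# Hypothetical helper, not part of Hadoop, Pig, or Cassandra.
# Prints one status line per classpath entry and returns 1 if any entry
# does not resolve to something on disk.
check_classpath() {
  cc_rest=$1
  cc_status=0
  while [ -n "$cc_rest" ]; do
    cc_entry=${cc_rest%%:*}            # text before the first ':'
    case "$cc_rest" in
      *:*) cc_rest=${cc_rest#*:} ;;    # drop that entry and its ':'
      *)   cc_rest= ;;                 # last entry consumed
    esac
    [ -n "$cc_entry" ] || continue     # skip empty entries such as '::'
    case "$cc_entry" in
      */\*)
        # Java wildcard entry 'dir/*': usable only if dir holds at least one jar.
        cc_dir=${cc_entry%/\*}
        set -- "$cc_dir"/*.jar
        if [ -e "$1" ]; then
          echo "ok $cc_entry"
        else
          echo "no-jars $cc_entry"
          cc_status=1
        fi
        ;;
      *)
        # Plain entry: a single jar file or a directory of classes.
        if [ -e "$cc_entry" ]; then
          echo "ok $cc_entry"
        else
          echo "missing $cc_entry"
          cc_status=1
        fi
        ;;
    esac
  done
  return "$cc_status"
}

# Example (run on each node after sourcing the env file):
#   check_classpath "$HADOOP_CLASSPATH"
```

A nonzero exit status means at least one entry needs attention. The check only looks at the filesystem; it does not verify that any jar actually contains org.apache.cassandra.hadoop.pig.CassandraStorage.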

On Mon, Mar 18, 2013 at 12:57 PM, Mohammed Abdelkhalek <
[EMAIL PROTECTED]> wrote:

> I have an error in the first line of the code:
> grunt> rows = LOAD 'cassandra://Keyspace1/Users' USING
> org.apache.cassandra.hadoop.pig.CassandraStorage() AS (key, columns: bag
> {T: tuple(name, value)});
>
> ERROR 1070: Could not resolve
> org.apache.cassandra.hadoop.pig.CassandraStorage using imports: [,
> org.apache.pig.builtin., org.apache.pig.impl.builtin.]
>
> Perhaps I should add some jars or something so that Pig can find
> CassandraStorage?
>
>
> 2013/3/18 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
>
> > Try something simple, in interactive mode, such as:
> >
> > grunt> rows = LOAD 'cassandra://Keyspace1/Users' USING
> > org.apache.cassandra.hadoop.pig.CassandraStorage() AS (key, columns: bag
> > {T: tuple(name, value)});
> > grunt> cols = FOREACH rows GENERATE flatten(columns);
> > grunt> ILLUSTRATE cols;
> >
> > Check that 'cols' has the correct form before proceeding, and that Pig
> > is reading the data from Cassandra through Hadoop correctly.
> >
> > -Dan
> >
> > On Mon, Mar 18, 2013 at 12:20 PM, Mohammed Abdelkhalek <
> > [EMAIL PROTECTED]> wrote:
> >
> > > How ?
> > >
> > >
> > > 2013/3/18 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
> > >
> > > > Try fully qualifying CassandraStorage() as
> > > > org.apache.cassandra.hadoop.pig.CassandraStorage().
> > > >
> > > > -Dan
> > > >
> > > > On Mon, Mar 18, 2013 at 11:56 AM, Mohammed Abdelkhalek <
> > > > [EMAIL PROTECTED]> wrote:
> > > >
> > > > > Thank you for replying,
> > > > > In fact, I'm trying to run this script:
> > > > > grunt> rows = LOAD 'cassandra://Keyspace1/Users' USING CassandraStorage()
> > > > > AS (key, columns: bag {T: tuple(name, value)});
> > > > > grunt> cols = FOREACH rows GENERATE flatten(columns);
> > > > > grunt> colnames = FOREACH cols GENERATE $0;
> > > > > grunt> namegroups = GROUP colnames BY (chararray) $0;
> > > > > grunt> namecounts = FOREACH namegroups GENERATE COUNT($1), group;
> > > > > grunt> orderednames = ORDER namecounts BY $0;
> > > > > grunt> topnames = LIMIT orderednames 50;
> > > > > grunt> dump topnames;
> > > > >
> > > > > but I'm getting this error:
> > > > > ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve
> > > > > CassandraStorage using imports: [, org.apache.pig.builtin.,
> > > > > org.apache.pig.impl.builtin.]
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > 2013/3/18 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
> > > > > >
> > > > > > Storing to Cassandra requires a key->column->value data structure
> > > > > > from pig. Here is one possible approach, requiring a udf to handle
> > > > > > the pig formatting interchange to cassandra:
> > > > > >
> > > > > > -- sample pig script

Dan DeCapria
CivicScience, Inc.
Senior Informatics / DM / ML / BI Specialist