Pig >> mail # user >> Pig + Cassandra Example !!


Re: Pig + Cassandra Example !!
It appears that you need to configure Cassandra to run with Hadoop, and
there are a couple of things you will need to do. I usually bootstrap the
following on my Hadoop master and slaves, so that each node has the correct
dependencies and Pig has the connection settings (IP address, RPC port) for
Cassandra.

# install cassandra everywhere
echo "deb http://debian.datastax.com/community stable main" > /tmp/cassandra.sources.list
sudo mv /tmp/cassandra.sources.list /etc/apt/sources.list.d/cassandra.sources.list
curl -L http://debian.datastax.com/debian/repo_key | sudo apt-key add -
sudo apt-get update
sudo apt-get install -y cassandra
sudo /etc/init.d/cassandra stop

# point Hadoop and Pig at Cassandra (replace MYIPGOESHERE with a Cassandra node's IP)
echo "HADOOP_CLASSPATH=/usr/share/cassandra/*:/usr/share/cassandra/lib/*:$HADOOP_CLASSPATH" >> /home/hadoop/conf/hadoop-user-env.sh
echo "PIG_INITIAL_ADDRESS=MYIPGOESHERE" >> /home/hadoop/conf/hadoop-user-env.sh
echo "PIG_RPC_PORT=9160" >> /home/hadoop/conf/hadoop-user-env.sh
echo "PIG_ROOT_LOGGER=DEBUG,console" >> /home/hadoop/conf/hadoop-user-env.sh
echo "PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner" >> /home/hadoop/conf/hadoop-user-env.sh
echo "PIG_WIDEROW_INPUT=true" >> /home/hadoop/conf/hadoop-user-env.sh
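
Because each setting is appended with `>>`, running the bootstrap a second time on the same node would leave duplicate lines in the env file. A minimal sketch of one way to guard against that follows. The `append_once` helper and the `ENV_FILE` variable are hypothetical names introduced here for illustration, not part of the original script, and the demo writes to a temporary file rather than to /home/hadoop/conf/hadoop-user-env.sh.

```shell
#!/usr/bin/env bash
# append_once LINE FILE
# Hypothetical helper: append LINE to FILE only if that exact line is not
# already present, so that re-running a bootstrap does not duplicate settings.
append_once() {
    local line="$1" file="$2"
    touch "$file"
    # -x matches whole lines only, -F treats the line as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Assumption: a scratch file is used for the demo; set ENV_FILE to the real
# env file path to apply the settings there instead.
ENV_FILE="${ENV_FILE:-$(mktemp)}"

append_once "PIG_RPC_PORT=9160" "$ENV_FILE"
append_once "PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner" "$ENV_FILE"
append_once "PIG_WIDEROW_INPUT=true" "$ENV_FILE"
```

Calling `append_once` again with the same arguments leaves the file unchanged.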

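ERROR 1070 in the quoted messages below means Pig could not find the named class on its classpath. As a rough sanity check before launching grunt, one could confirm that some jar in the Cassandra install directory actually contains the class. This is only a sketch: `find_class_jar` is a hypothetical helper, it assumes `unzip` is installed, and it assumes the jars live in /usr/share/cassandra as in the classpath setting above.

```shell
#!/usr/bin/env bash
# find_class_jar DIR CLASS
# Hypothetical helper: print the first jar in DIR that contains CLASS
# (given as a fully qualified name); return 1 if no jar contains it.
find_class_jar() {
    local dir="$1" class="$2"
    local entry="${class//.//}.class"   # org.foo.Bar -> org/foo/Bar.class
    local jar
    for jar in "$dir"/*.jar; do
        [ -f "$jar" ] || continue       # skips the unexpanded pattern when no jars exist
        if unzip -l "$jar" 2>/dev/null | grep -qF -- "$entry"; then
            printf '%s\n' "$jar"
            return 0
        fi
    done
    return 1
}

# Assumption: Cassandra's jars are installed under /usr/share/cassandra.
find_class_jar /usr/share/cassandra org.apache.cassandra.hadoop.pig.CassandraStorage \
    || echo "CassandraStorage not found in /usr/share/cassandra/*.jar" >&2
```

If no jar is reported, the classpath setting cannot help and the Cassandra install itself is the first thing to check.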

On Mon, Mar 18, 2013 at 12:57 PM, Mohammed Abdelkhalek <
[EMAIL PROTECTED]> wrote:

> I have an error in the first line of the code:
> grunt> rows = LOAD 'cassandra://Keyspace1/Users' USING
> org.apache.cassandra.hadoop.pig.CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
>
> ERROR 1070: Could not resolve
> org.apache.cassandra.hadoop.pig.CassandraStorage using imports: [,
> org.apache.pig.builtin., org.apache.pig.impl.builtin.]
>
> Perhaps I should add some jars or something to reference CassandraStorage?
>
>
> 2013/3/18 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
>
> > Try something simple, in interactive mode, such as:
> >
> > grunt> rows = LOAD 'cassandra://Keyspace1/Users' USING
> > org.apache.cassandra.hadoop.pig.CassandraStorage() AS (key, columns: bag
> > {T: tuple(name, value)});
> > grunt> cols = FOREACH rows GENERATE flatten(columns);
> > grunt> ILLUSTRATE cols;
> >
> > Check that 'cols' is of the correct form before proceeding, and that data is
> > being accessed by Pig through Hadoop to Cassandra correctly.
> >
> > -Dan
> >
> > On Mon, Mar 18, 2013 at 12:20 PM, Mohammed Abdelkhalek <
> > [EMAIL PROTECTED]> wrote:
> >
> > > How ?
> > >
> > >
> > > 2013/3/18 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
> > >
> > > > Try fully qualifying CassandraStorage() as
> > > > org.apache.cassandra.hadoop.pig.CassandraStorage().
> > > >
> > > > -Dan
> > > >
> > > > On Mon, Mar 18, 2013 at 11:56 AM, Mohammed Abdelkhalek <
> > > > [EMAIL PROTECTED]> wrote:
> > > >
> > > > > Thank you for replying,
> > > > > In fact, I'm trying to run this script:
> > > > > grunt> rows = LOAD 'cassandra://Keyspace1/Users' USING CassandraStorage()
> > > > > AS (key, columns: bag {T: tuple(name, value)});
> > > > > grunt> cols = FOREACH rows GENERATE flatten(columns);
> > > > > grunt> colnames = FOREACH cols GENERATE $0;
> > > > > grunt> namegroups = GROUP colnames BY (chararray) $0;
> > > > > grunt> namecounts = FOREACH namegroups GENERATE COUNT($1), group;
> > > > > grunt> orderednames = ORDER namecounts BY $0;
> > > > > grunt> topnames = LIMIT orderednames 50;
> > > > > grunt> dump topnames;
> > > > >
> > > > > but I'm getting this error:
> > > > > ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve
> > > > > CassandraStorage using imports: [, org.apache.pig.builtin.,
> > > > > org.apache.pig.impl.builtin.]
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > 2013/3/18 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
> > > > >
> > > > > > Storing to Cassandra requires a key->column->value data structure
> > > > > > from Pig. Here is one possible approach, requiring a UDF to handle
> > > > > > the Pig formatting interchange to Cassandra:
> > > > > >
> > > > > > -- sample pig script

Dan DeCapria
CivicScience, Inc.
Senior Informatics / DM / ML / BI Specialist