Pig user mailing list >> Pig + Cassandra Example !!


Re: Pig + Cassandra Example !!
Try something simple, in interactive mode, such as:

grunt> rows = LOAD 'cassandra://Keyspace1/Users'
           USING org.apache.cassandra.hadoop.pig.CassandraStorage()
           AS (key, columns: bag {T: tuple(name, value)});
grunt> cols = FOREACH rows GENERATE flatten(columns);
grunt> ILLUSTRATE cols;

Check that 'cols' is of the correct form before proceeding, and that data is
being accessed by Pig through Hadoop to Cassandra correctly.

-Dan
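
A quick aside, not part of the reply above: DESCRIBE and a LIMIT-ed DUMP are two
more lightweight checks at this point. With no types declared in the LOAD schema,
DESCRIBE should report something close to
cols: {columns::name: bytearray,columns::value: bytearray}; the alias 'few' below
is just an illustrative name.

grunt> DESCRIBE cols;
grunt> few = LIMIT cols 5;
grunt> DUMP few;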

On Mon, Mar 18, 2013 at 12:20 PM, Mohammed Abdelkhalek <
[EMAIL PROTECTED]> wrote:

> How ?
>
>
> 2013/3/18 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
>
> > Try fully qualifying CassandraStorage() to
> > org.apache.cassandra.hadoop.pig.CassandraStorage().
> >
> > -Dan
> >
> > On Mon, Mar 18, 2013 at 11:56 AM, Mohammed Abdelkhalek <
> > [EMAIL PROTECTED]> wrote:
> >
> > > Thank you for replying,
> > > In fact, I'm trying to run this script:
> > > grunt> rows = LOAD 'cassandra://Keyspace1/Users' USING CassandraStorage()
> > >            AS (key, columns: bag {T: tuple(name, value)});
> > > grunt> cols = FOREACH rows GENERATE flatten(columns);
> > > grunt> colnames = FOREACH cols GENERATE $0;
> > > grunt> namegroups = GROUP colnames BY (chararray) $0;
> > > grunt> namecounts = FOREACH namegroups GENERATE COUNT($1), group;
> > > grunt> orderednames = ORDER namecounts BY $0;
> > > grunt> topnames = LIMIT orderednames 50;
> > > grunt> dump topnames;
> > >
> > > but I'm having this error:
> > > ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve
> > > CassandraStorage using imports: [, org.apache.pig.builtin.,
> > > org.apache.pig.impl.builtin.]
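
An aside on the error quoted above: ERROR 1070 just means Pig cannot resolve the
bare name CassandraStorage against its default import list, so either the fully
qualified name (as Dan suggests above) or a DEFINE alias will work, provided the
Cassandra jars are on Pig's classpath. A minimal sketch, with placeholder jar
paths to adjust for your install:

grunt> REGISTER /path/to/apache-cassandra.jar;
grunt> REGISTER /path/to/cassandra-thrift.jar;
grunt> DEFINE CassandraStorage org.apache.cassandra.hadoop.pig.CassandraStorage();
grunt> rows = LOAD 'cassandra://Keyspace1/Users' USING CassandraStorage()
           AS (key, columns: bag {T: tuple(name, value)});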
> > >
> > >
> > >
> > >
> > > 2013/3/18 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
> > >
> > > > Storing to Cassandra requires a key->column->value data structure
> > > > from Pig. Here is one possible approach, requiring a UDF to handle
> > > > the formatting interchange from Pig to Cassandra:
> > > >
> > > > -- sample pig script
> > > > A = LOAD 'foo' USING PigStorage()
> > > >     AS (key:chararray, name:chararray, value:chararray);
> > > > B = FOREACH A GENERATE FLATTEN(com.to.udf.ToCassandraUDF(
> > > >     TOTUPLE('PotentialKeyManipulation/', $0, '/toSomethingElse'),
> > > >     TOTUPLE($1), $2));
> > > > STORE B INTO 'cassandra://cassandraNamespace/myColumnFamilyName'
> > > >     USING org.apache.cassandra.hadoop.pig.CassandraStorage();
> > > >
> > > > -- sample toCassandraUDF
> > > > package com.to.udf;
> > > >
> > > > import java.io.IOException;
> > > >
> > > > import org.apache.pig.EvalFunc;
> > > > import org.apache.pig.data.BagFactory;
> > > > import org.apache.pig.data.DataBag;
> > > > import org.apache.pig.data.DataByteArray;
> > > > import org.apache.pig.data.Tuple;
> > > > import org.apache.pig.data.TupleFactory;
> > > > import org.apache.pig.impl.logicalLayer.schema.Schema;
> > > >
> > > > public class ToCassandraUDF extends EvalFunc<Tuple> {
> > > >
> > > >     public Tuple exec(Tuple input) throws IOException {
> > > >         Tuple row = TupleFactory.getInstance().newTuple(2);
> > > >
> > > >         // Concatenate the first argument's fields into the row key.
> > > >         Tuple keyParts = (Tuple) input.get(0);
> > > >         StringBuffer buf = new StringBuffer();
> > > >         for (Object o : keyParts.getAll()) {
> > > >             if (o != null) {
> > > >                 buf.append(o.toString());
> > > >             } else {
> > > >                 buf.append("null");
> > > >             }
> > > >         }
> > > >         String key = buf.toString();
> > > >
> > > >         // Concatenate the second argument's fields into the column name.
> > > >         Tuple columnParts = (Tuple) input.get(1);
> > > >         buf = new StringBuffer();
> > > >         for (Object o : columnParts.getAll()) {
> > > >             if (o != null) {
> > > >                 buf.append(o.toString());
> > > >             } else {
> > > >                 buf.append("null");
> > > >             }
> > > >         }
> > > >         String columnName = buf.toString();
> > > >
> > > >         // The optional third argument becomes the column value.
> > > >         byte[] columnValueBytes;
> > > >         if (input.size() > 2) {
> > > >             Object value = input.get(2);
> > > >             columnValueBytes = value.toString().getBytes();
> > > >         } else {
> > > >             columnValueBytes = new byte[0];
> > > >         }
> > > >
> > > >         // Build the (name, value) column tuple and wrap it in a bag.
> > > >         Tuple column = TupleFactory.getInstance().newTuple(2);
> > > >         column.set(0, new DataByteArray(columnName.getBytes()));
> > > >         column.set(1, new DataByteArray(columnValueBytes));
> > > >
> > > >         DataBag bagOfColumns = BagFactory.getInstance().newDefaultBag();
> > > >         bagOfColumns.add(column);
> > > >
> > > >         // Return (key, {(name, value)}) for STORE ... USING CassandraStorage().
> > > >         row.set(0, key);
> > > >         row.set(1, bagOfColumns);
> > > >
> > > >         return row;
> > > >     }
> > > >
> > > >     public Schema outputSchema(Schema input) {
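
The archived message is cut off at outputSchema. What follows is only a possible
completion, not Dan's original code: it declares the output as a single tuple of
(key: chararray, columns: {(name, value)}), matching what exec() builds, and it
additionally assumes imports of org.apache.pig.data.DataType and
org.apache.pig.impl.logicalLayer.FrontendException.

    public Schema outputSchema(Schema input) {
        try {
            // inner tuple: (name, value)
            Schema columnTuple = new Schema();
            columnTuple.add(new Schema.FieldSchema("name", DataType.BYTEARRAY));
            columnTuple.add(new Schema.FieldSchema("value", DataType.BYTEARRAY));

            // bag of column tuples
            Schema bagSchema = new Schema(
                    new Schema.FieldSchema("column", columnTuple, DataType.TUPLE));

            // full row: (key, columns)
            Schema rowSchema = new Schema();
            rowSchema.add(new Schema.FieldSchema("key", DataType.CHARARRAY));
            rowSchema.add(new Schema.FieldSchema("columns", bagSchema, DataType.BAG));

            return new Schema(new Schema.FieldSchema(null, rowSchema, DataType.TUPLE));
        } catch (FrontendException e) {
            throw new RuntimeException(e);
        }
    }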