Pig >> mail # user >> SUM


That changes things entirely. There's some weirdness in the way data is
read from Cassandra. Have you applied the latest patches (e.g.
https://issues.apache.org/jira/browse/CASSANDRA-2387)?

See also some UDFs for working with Cassandra data that Jeremy Hanna
(@jeromatron) wrote:

https://github.com/jeromatron/pygmalion

Best of luck!

--jacob
@thedatachef
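
A minimal sketch of the "declare the output schema explicitly in the UDF" advice from this thread, written as a Python (Jython) UDF that also casts every field itself instead of handing raw bytes on to SUM. The assumptions here are not from the thread: that the Cassandra column names match the schema field names, and that each value arrives as bytes or text holding a decimal number. The fallback decorator exists only so the file runs outside Pig, whose Jython engine is expected to supply outputSchema itself.

```python
# Sketch of a Pig Python (Jython) UDF that casts every field explicitly.
# Assumptions (not from the thread): column names match the schema fields,
# and each value arrives as bytes or text holding a decimal number.

try:
    outputSchema  # expected to be supplied by Pig when it registers the script
except NameError:
    def outputSchema(schema):
        """Stand-in decorator so the file also runs outside Pig."""
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


def _text(value):
    """Decode raw bytes to text; leave any other value untouched."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode("utf-8")
    return value


@outputSchema("y:bag{t:tuple(domain:chararray, spam:int, size:long, time:float)}")
def toTuple(columns):
    """Turn a bag of (name, value) pairs into a bag holding one typed tuple."""
    row = dict((_text(name), _text(value)) for name, value in columns)
    return [(
        str(row["domain"]),
        int(row["spam"]),
        int(row["size"]),    # declared as long in the schema above
        float(row["time"]),
    )]


if __name__ == "__main__":
    sample = [(b"domain", b"drm"), (b"spam", b"0"),
              (b"size", b"464868"), (b"time", b"1.5")]
    print(toTuple(sample))
```

If the file were saved under a hypothetical name such as udfs.py and registered in Pig under the alias myUDF, it would be called the same way as in the quoted script, myUDF.toTuple($1). Whether the real column values decode this way depends on how they were written to Cassandra.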

On Sun, 2011-04-24 at 18:31 +0200, pob wrote:
> Maybe I forgot to mention one more thing: the rows are read from Cassandra.
>
> rows = LOAD 'cassandra://emailArchive/messagesMetaData' USING
> CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
>
> I have no idea how to write the AS clause for a bag in a foreach.
>
>
> P.
>
> 2011/4/24 Jacob Perkins <[EMAIL PROTECTED]>
>
> > Strange, that looks right to me. What happens if you try the 'AS'
> > statement anyhow?
> >
> > --jacob
> > @thedatachef
> >
> > On Sun, 2011-04-24 at 18:22 +0200, pob wrote:
> > > Hello,
> > >
> > > pom = foreach rows generate myUDF.toTuple($1); -- reading data
> > > describe pom
> > > pom: {y: {t: (domain: chararray,spam: int,size: long,time: float)}}
> > >
> > > data = foreach pom generate flatten($0);
> > > grunt> describe data;
> > > data: {y::domain: chararray,y::spam: int,y::size: long,y::time: float}
> > >
> > >
> > > > I think they are cast fine, right?
> > > >
> > > > The UDF is a Python one with this decorator:
> > > > @outputSchema("y:bag{t:tuple(domain:chararray, spam:int, size:long, time:float)}")
> > >
> > > Thanks
> > >
> > >
> > >
> > > 2011/4/24 Jacob Perkins <[EMAIL PROTECTED]>
> > >
> > > > You're getting a 'ClassCastException' because the contents of the bags
> > > > are DataByteArray and not long (or cannot be cast to long). I suspect
> > > > that you're generating the contents of the bag in some way from a UDF,
> > > > no?
> > > >
> > > > You need to either declare the output schema explicitly in the UDF or
> > > > just use the 'AS' statement. For example, say you have a UDF that sums
> > > > two numbers:
> > > >
> > > > > data   = LOAD 'foobar' AS (a:int, b:int);
> > > > summed = FOREACH data GENERATE MyFancySummingUDF(a,b) AS (sum:int);
> > > > DUMP summed;
> > > >
> > > > --jacob
> > > > @thedatachef
> > > >
> > > > On Sun, 2011-04-24 at 18:02 +0200, pob wrote:
> > > > > x = foreach g2 generate group, data.(size);
> > > > > dump x;
> > > > >
> > > > > ((drm,0),{(464868)})
> > > > > ((drm,1),{(464868)})
> > > > > ((snezz,0),{(8073),(8073)})
> > > > >
> > > > > but:
> > > > > x = foreach g2 generate group, SUM(data.size);
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > 2011-04-24 18:02:18,910 [Thread-793] WARN
> > > > >  org.apache.hadoop.mapred.LocalJobRunner - job_local_0038
> > > > > org.apache.pig.backend.executionengine.ExecException: ERROR 2106:
> > Error
> > > > > while computing sum in Initial
> > > > > at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:87)
> > > > > at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:65)
> > > > > at
> > > > >
> > > >
> > org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
> > > > > at
> > > > >
> > > >
> > org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:273)
> > > > > at
> > > > >
> > > >
> > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:343)
> > > > > at
> > > > >
> > > >
> > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
> > > > > at
> > > > >
> > > >
> > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
> > > > > at
> > > > >
> > > >
> > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
> > > > > at
> > > > >
> > > >
> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)