Pig >> mail # user >> prep for cassandra storage from pig


William Oberman 2011-06-15, 18:17
Re: prep for cassandra storage from pig
Rather than staying stuck, I wrote a custom function: TupleToBagOfTuple. I'm
curious whether I could have avoided this, though.
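
To make the difference in nesting concrete, here is a minimal sketch in plain
Python (not Pig). It only models the shapes reported in the quoted message
below: a Pig tuple is modeled as a Python tuple and a bag as a Python list.
The names to_bag_as_observed and tuple_to_bag_of_tuple are hypothetical
stand-ins, and the sample count of 3 is made up for illustration. This is not
the actual TOBAG or TupleToBagOfTuple implementation.

```python
# Plain-Python model of the data shapes in this thread (not Pig code).
# Assumption: a Pig tuple is modeled as a Python tuple, a Pig bag as a list.


def to_bag_as_observed(*fields):
    """Model of the TOBAG result reported in the thread: every argument
    is wrapped in its own tuple before it is placed in the bag."""
    return [(field,) for field in fields]


def tuple_to_bag_of_tuple(t):
    """Hypothetical stand-in for TupleToBagOfTuple: place the tuple
    itself in the bag, without an extra wrapping tuple."""
    return [t]


# Illustrative (day, count) pair; the values are invented.
day_count = ("day1", 3)

nested = to_bag_as_observed(day_count)   # shape {((day1, count))}
flat = tuple_to_bag_of_tuple(day_count)  # shape {(day1, count)}

print(nested)  # [(('day1', 3),)]
print(flat)    # [('day1', 3)]
```

The second shape, a bag whose elements are the (day, count) tuples themselves,
is the (key, {tuple}) form the quoted message says Cassandra expects.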

On Wed, Jun 15, 2011 at 2:17 PM, William Oberman
<[EMAIL PROTECTED]> wrote:

> I think I'm stuck on typing issues trying to store data in cassandra.  To
> verify, cassandra wants (key, {tuples})
>
> My pig script is fairly brief:
> raw = LOAD 'cassandra://test_in/test_cf' USING CassandraStorage() AS
> (key:chararray, columns:bag {column:tuple (name, value)});
> --columns == timeUUID -> JSON
> rows = FOREACH raw GENERATE key, FLATTEN(columns);
> alias_target_day = FOREACH rows {
>     --I wrote a specialized parser that does exactly what I need
>     observation_map = com.civicscience.pig.ParseObservation($2);
>     GENERATE $0 as alias, observation_map#'_fqt' as target,
> observation_map#'_day' as day;
> };
> grouping = GROUP alias_target_day BY ((chararray)target,(chararray)day);
> X = FOREACH grouping GENERATE group.$0 as target, TOTUPLE(group.$1,
> COUNT($1)) as day_count;
>
> This gets me:
> (targetA, (day1, count))
> (targetA, (day2, count))
> (targetB, (day1, count))
> ....
>
> But, cassandra wants the 2nd item to be a bag.  So, I tried:
> X = FOREACH grouping GENERATE group.$0 as target, TOBAG(TOTUPLE(group.$1,
> COUNT($1))) as day_count;
>
> But this results in:
> (targetA, {((day1, count))})
> (targetA, {((day2, count))})
> (targetB, {((day1, count))})
> It's hard to see, but the 2nd item now has a nested tuple as the first
> value, which is still bad.
>
> How do I get (key, {tuple})?  I wasn't sure where to post this (pig or
> cassandra), so I'm posting to the pig list too.
>
> will
>

--
Will Oberman
Civic Science, Inc.
3030 Penn Avenue., First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
(E) [EMAIL PROTECTED]
Jeremy Hanna 2011-06-15, 19:04
William Oberman 2011-06-15, 19:08
William Oberman 2011-06-15, 19:10
Jeremy Hanna 2011-06-15, 19:25