Pig, mail # user - how can i get the column value? Need help!.. cassandra 1.28 and pig 0.11.1


Miguel Angel Martin junqu... 2013-08-22, 10:51
Hi all,
I'm testing the new CqlStorage() with Cassandra 1.2.8 and Pig 0.11.1.
I am using the sample test data from:
http://frommyworkshop.blogspot.com.es/2013/07/hadoop-map-reduce-with-cassandra.html

I can load and dump the data correctly with this script:

rows = LOAD
'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING
CqlStorage();

dump rows;
describe rows;

Results:

((id,6),(age,30),(title,QA))

((id,5),(age,30),(title,QA))

rows: {id: chararray,age: int,title: chararray}
But I cannot get the column values.
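
To make the goal concrete, here is a minimal sketch in Python (not Pig) of the extraction I am after. It assumes each dumped row is a tuple of (name, value) pairs, as in the output above. The rows are typed in by hand from that output with all values kept as strings, and the helper name column_values is hypothetical.

```python
# Rows copied by hand from the dump output: each row is a tuple of
# (column name, column value) pairs. Values are kept as strings here.
rows = [
    (("id", "6"), ("age", "30"), ("title", "QA")),
    (("id", "5"), ("age", "30"), ("title", "QA")),
]


def column_values(row):
    """Return only the values of a row, dropping the column names."""
    return tuple(value for _name, value in row)


values = [column_values(row) for row in rows]
for v in values:
    print(v)
```

In Pig terms, this is what I would like dump to show: one tuple of values per row, without the column names.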

I tried defining another schema in the LOAD statement, as I used to do with
CassandraStorage():

http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-and-Pig-how-to-get-column-values-td5641158.html
Example:

rows = LOAD
'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING
CqlStorage() AS (columns: bag {T: tuple(name, value)});
and I get this error:

2013-08-22 12:24:45,426 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1031: Incompatable schema: left is
"columns:bag{T:tuple(name:bytearray,value:bytearray)}", right is
"id:chararray,age:int,title:chararray"
I tried using FLATTEN, SUBSTRING, and SPLIT, but I have not gotten good
results.

Example:
   - When I flatten, I get a set of tuples like:

(title,QA)

(title,QA)

2013-08-22 12:42:20,673 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
input paths to process : 1

A: {title: chararray}

but I cannot get the value QA.

SUBSTRING only works on the name, title, not on the value.

Example:

B = FOREACH A GENERATE SUBSTRING(title,2,5);

dump B;
describe B;

Results:

(tle)
(tle)
B: {chararray}
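
The (tle) result suggests SUBSTRING is being applied to the string title rather than to QA. A quick check in Python (only an illustration, assuming Pig's SUBSTRING takes a zero-based start index and an exclusive stop index, the same way Python slicing does) gives the same characters:

```python
# Python slicing uses a zero-based start and an exclusive stop,
# which is assumed here to mirror SUBSTRING(title, 2, 5).
name = "title"
value = "QA"

from_name = name[2:5]    # characters at positions 2, 3 and 4 of the name
from_value = value[2:5]  # the value has only two characters

print(repr(from_name))
print(repr(from_value))
```

So the output I see matches a substring of the column name, not of the column value.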
I tried the same things Eric Lee described in the other mail and got the same results:
 Anyways, what I really want is the column value, not the name. Is there a
way to do that? I listed all of the failed attempts I made below.

   - colnames = FOREACH cols GENERATE $1; but I was told $1 was out of bounds.
   - casted = FOREACH cols GENERATE (tuple(chararray, chararray))$0; but
   all I got back were empty tuples.
   - values = FOREACH cols GENERATE $0.$1; but I got an error telling me
   a data byte array can't be cast to a tuple.
I would appreciate any help.
Regards