Pig >> mail # dev >> flattening a map generated by Pigmix


flattening a map generated by Pigmix
Hi,

PigMix generates a map called page_info (see the end of this email for links), which I am
flattening in a script as follows:

register pigperf.jar;
A1 = load '/data/pigmix/page_views' using
 org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
as (user, action, timespent, query_term, ip_addr, timestamp, estimated_revenue,
page_info, page_links);
A = foreach A1 generate user, RANDOM() as rval:double, action, timespent,
query_term, ip_addr, timestamp, estimated_revenue, page_info, page_links;
B = foreach A generate user, rval, action, timespent, query_term, ip_addr,
timestamp, estimated_revenue;
*C = foreach A generate user, rval, flatten((map[])page_info);*
D = foreach A generate user, rval, flatten((bag{tuple(map[])})page_links);
store B into 'PV_master1' using PigStorage('\t');
store C into 'PV_page_info1' using PigStorage('\t');
store D into 'PV_page_links1' using PigStorage('\t');

I get an error while flattening the map as follows:
2013-12-22 01:17:21,554 [pool-1-thread-1] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: A1[3,5],A[7,4],B[8,4],C[9,4],D[10,4] C:  R:
2013-12-22 01:17:21,594 [Thread-3] INFO org.apache.hadoop.mapred.LocalJobRunner - Map task executor complete.
2013-12-22 01:17:21,596 [Thread-3] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local909428065_0001
*java.lang.Exception: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: C: Store(file:///media/work/EDBT/data/tony_s/PV_page_info1:PigStorage(' ')) - scope-52 Operator Key: scope-52): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POCast (Name: Cast[map:[]] - scope-49 Operator Key: scope-49) children: [[POProject (Name: Project[bytearray][8] - scope-48 Operator Key: scope-48) children: null at []]] at []]: java.lang.UnsupportedOperationException*
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: C: Store(file:///media/work/EDBT/data/tony_s/PV_page_info1:PigStorage(' ')) - scope-52 Operator Key: scope-52): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POCast (Name: Cast[map:[]] - scope-49 Operator Key: scope-49) children: [[POProject (Name: Project[bytearray][8] - scope-48 Operator Key: scope-48) children: null at []]] at []]: java.lang.UnsupportedOperationException
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:289)

The PigMix page says no null values are generated for page_info (see the
table below). I also looked into the data, but the random generator doesn't
pretty-print its output :)

Any ideas what this cast error means?
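
To narrow this down, a minimal sketch (not yet run) would be to apply the cast on its own, without flatten and without the other stores. The aliases S and T are made up for this test; everything else is copied from the script above. If this fails with the same UnsupportedOperationException, the cast itself is the problem; if it succeeds, the flatten on the map would be the next thing to look at.

```pig
register pigperf.jar;
-- Same load statement as in the script above.
A1 = load '/data/pigmix/page_views' using
 org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
as (user, action, timespent, query_term, ip_addr, timestamp, estimated_revenue,
page_info, page_links);
-- Keep only a few rows so the test runs quickly (S is a hypothetical alias).
S = limit A1 5;
-- Cast only, no flatten (T is a hypothetical alias).
T = foreach S generate user, (map[])page_info as page_info;
-- Print the schema Pig infers, then the rows themselves.
describe T;
dump T;
```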

Thanks,
Keren

Pigmix: https://cwiki.apache.org/confluence/display/PIG/PigMix
DataGenerator: http://wiki.apache.org/pig/DataGeneratorHadoop
Name               Type         Average Length  Cardinality  Distribution  Percent Null
user               string       20              1.6M         zipf          7
timestamp          long         X               86400        uniform       0
timespent          int          X               20           zipf          0
query_term         string       10              1.8M         zipf          20
page_links         bag of maps  50              X            zipf          20
*page_info*        *map*        *15*            *X*          *zipf*        *0*
ip_addr            long         X               1M           zipf          0
estimated_revenue  double       X               100k         zipf          5
action             int          X               2            uniform       0


--
Keren Ouaknine
www.kereno.com