Pig, mail # dev - flattening a map generated by Pigmix


flattening a map generated by Pigmix
Keren Ouaknine 2013-12-22, 10:36
Hi,

Pigmix generates a map called page_info (see the end of this email for links),
which I am flattening in a script as follows:

register pigperf.jar;
A1 = load '/data/pigmix/page_views' using
 org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
as (user, action, timespent, query_term, ip_addr, timestamp, estimated_revenue,
page_info, page_links);
A = foreach A1 generate user, RANDOM() as rval:double, action, timespent,
query_term, ip_addr, timestamp, estimated_revenue, page_info, page_links;
B = foreach A generate user, rval, action, timespent, query_term, ip_addr,
timestamp, estimated_revenue;
C = foreach A generate user, rval, flatten((map[])page_info);
D = foreach A generate user, rval, flatten((bag{tuple(map[])})page_links);
store B into 'PV_master1' using PigStorage('\t');
store C into 'PV_page_info1' using PigStorage('\t');
store D into 'PV_page_links1' using PigStorage('\t');

I get an error while flattening the map as follows:
2013-12-22 01:17:21,554 [pool-1-thread-1] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: A1[3,5],A[7,4],B[8,4],C[9,4],D[10,4] C:  R:
2013-12-22 01:17:21,594 [Thread-3] INFO org.apache.hadoop.mapred.LocalJobRunner - Map task executor complete.
2013-12-22 01:17:21,596 [Thread-3] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local909428065_0001
java.lang.Exception: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: C: Store(file:///media/work/EDBT/data/tony_s/PV_page_info1:PigStorage(' ')) - scope-52 Operator Key: scope-52): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POCast (Name: Cast[map:[]] - scope-49 Operator Key: scope-49) children: [[POProject (Name: Project[bytearray][8] - scope-48 Operator Key: scope-48) children: null at []]] at []]: java.lang.UnsupportedOperationException
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: C: Store(file:///media/work/EDBT/data/tony_s/PV_page_info1:PigStorage(' ')) - scope-52 Operator Key: scope-52): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POCast (Name: Cast[map:[]] - scope-49 Operator Key: scope-49) children: [[POProject (Name: Project[bytearray][8] - scope-48 Operator Key: scope-48) children: null at []]] at []]: java.lang.UnsupportedOperationException
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:289)
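
The trace names POCast on Project[bytearray][8] as the failing operator, and
field 8 of A is page_info. A minimal sketch that separates the cast from the
flatten, to see which of the two triggers the exception (the alias C0 and the
output path PV_page_info_cast_only are placeholders, not part of the original
script, and A is trimmed to the fields needed here):

```pig
register pigperf.jar;
A1 = load '/data/pigmix/page_views' using
 org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
as (user, action, timespent, query_term, ip_addr, timestamp, estimated_revenue,
page_info, page_links);
A = foreach A1 generate user, RANDOM() as rval:double, page_info;
-- cast only, no flatten: if this store fails the same way, the problem
-- lies in the bytearray-to-map cast rather than in flatten
C0 = foreach A generate user, rval, (map[])page_info as page_info;
store C0 into 'PV_page_info_cast_only' using PigStorage('\t');
```

If this job succeeds, the cast itself is fine and the flatten is the next
thing to look at.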

The Pigmix page says no null values are generated for page_info (see the
table below). I also looked into the data, but the random generator doesn't
pretty-print its output :)
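
For reference, a rough sketch of how a few records could be inspected through
the loader instead of from the raw file. The aliases P and S are made up for
this example, and it assumes the loader itself reads the records without
error when no cast is applied:

```pig
register pigperf.jar;
A1 = load '/data/pigmix/page_views' using
 org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
as (user, action, timespent, query_term, ip_addr, timestamp, estimated_revenue,
page_info, page_links);
-- keep only the fields of interest, with no cast applied
P = foreach A1 generate user, page_info;
-- take a handful of records so the dump stays readable
S = limit P 5;
-- print the schema Pig has inferred, then the sampled records
describe S;
dump S;
```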

Any ideas what this cast error means?

Thanks,
Keren

Pigmix: https://cwiki.apache.org/confluence/display/PIG/PigMix
DataGenerator: http://wiki.apache.org/pig/DataGeneratorHadoop

Name               Type         Average Length  Cardinality  Distribution  Percent Null
user               string       20              1.6M         zipf          7
timestamp          long         X               86400        uniform       0
timespent          int          X               20           zipf          0
query_term         string       10              1.8M         zipf          20
page_links         bag of maps  50              X            zipf          20
*page_info*        map          15              X            zipf          0
ip_addr            long         X               1M           zipf          0
estimated_revenue  double       X               100k         zipf          5
action             int          X               2            uniform       0

--
Keren Ouaknine
www.kereno.com