Pig, mail # user - Error When Sorting


Keric Donnelly 2011-03-11, 16:35
Re: Error When Sorting
Thejas M Nair 2011-03-11, 21:44
For some reason Pig fails to find the sample files created in the sampling MR job of the order-by.
You seem to be running in local mode; is this error seen in map-reduce mode as well?
-Thejas
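
For readers following the thread, here is a rough illustration of why the sort depends on those sample files. An order-by runs a sampling pass first, and the partitioner of the sort pass then reads the sample output to choose its split points (the stack trace below fails in WeightedRangePartitioner.setConf for exactly that reason). The Python sketch is not Pig's implementation: the function names, the JSON sample format, and the evenly spaced cut points are all assumptions made for illustration.

```python
import bisect
import json
import os
import tempfile


def write_sample(values, path, num_partitions):
    """Sampling pass (hypothetical format): store cut points from a sorted sample."""
    ordered = sorted(values)
    # Pick num_partitions - 1 evenly spaced cut points from the sample.
    cuts = [ordered[len(ordered) * i // num_partitions]
            for i in range(1, num_partitions)]
    with open(path, "w") as f:
        json.dump(cuts, f)


def load_partitioner(path):
    """Sort pass: build a partition function from the sample file.

    Raises FileNotFoundError if the sample file is missing, which mirrors
    the "Input path does not exist" failure in the thread.
    """
    if not os.path.exists(path):
        raise FileNotFoundError(f"Input path does not exist: {path}")
    with open(path) as f:
        cuts = json.load(f)
    # A key is assigned to the partition given by how many cut points are <= key.
    return lambda key: bisect.bisect_right(cuts, key)


def demo():
    """Run both passes on a few made-up session counts and return partition ids."""
    counts = [2, 1, 1, 1, 1, 4, 2, 2, 1, 4, 3, 1]
    with tempfile.TemporaryDirectory() as tmp:
        sample_path = os.path.join(tmp, "pigsample_demo")
        write_sample(counts, sample_path, num_partitions=3)
        partition = load_partitioner(sample_path)
        return [partition(c) for c in counts]
```

If the second pass looks for the sample in a different place than the first pass wrote it, load_partitioner fails before any record is sorted, which is consistent with the error being raised while the partitioner is set up.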

On 3/11/11 8:35 AM, "Keric Donnelly" <[EMAIL PROTECTED]> wrote:

I've been playing with Pig this week and I'm running into an issue that
seems like it should be trivial. I'm basically reading data from HBase
and performing a count of sessions associated with a cookie.

I'm running Pig 0.8.

My script looks like the following:

raw = LOAD 'hbase://sport_user'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
      'session:*', '-loadKey true')
      AS (id:bytearray, session_map:map[]);

-- Convert maps to bags
B = FOREACH raw GENERATE id, mapToBag(session_map) AS session_bag;

--dump B;
-- Count the number of sessions
C = FOREACH B GENERATE id,
        COUNT(session_bag) as sess_count;

describe C ;
dump C ;

This works fine. When I dump "C" I see the cg cookie and the number of sessions.
For example:
(ANON_Cg+5EUka4wFOAAAAtRg,2)
(ANON_Cg+5EUknSmmLAAAA5CU,1)
(ANON_Cg+5EUlHWwwNAAAALQQ,1)
(ANON_Cg+5EUlSDOIJAAAAygw,1)
(ANON_Cg+5EUlgDESHAAAAWQ0,1)
(ANON_Cg+5EUli1UHBAAAA/xg,4)
(ANON_Cg+5EUmSc3sPAAAAsg4,2)
(ANON_Cg+5EUmo6i8PAAAAwxo,2)
(ANON_Cg+5EUn2X6HOAAAAWSM,1)
(ANON_Cg+5EUn5PmRCAQAA1xA,4)
(ANON_Cg+5EUnUT9+NAAAA0RE,3)
(ANON_Cg+5EUnjSD0BAAAACx0,1)
(ANON_Cg+5EUoJF82PAAAAkgI,1)
(ANON_Cg+5EUoWJW9GAAAAcx4,1)
(ANON_Cg+5EUorklmHAAAAxRk,1)
(ANON_Cg+5EUp1bXGFAAAAPwA,1)
(ANON_Cg+5EUp55I5OAAAAmR4,2)
(ANON_Cg+5EUp9XkHFAAAAYQ8,2)
(ANON_Cg+5EUpK/koEAAAAcRs,3)
(ANON_Cg+5EUpd/aDJAAAABBw,3)

If I then do a descending sort on the alias "C", I get an error when I dump it:
D = ORDER C BY sess_count DESC ;
dump D ;
2011-03-10 16:10:59,325 [Thread-57] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0004
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/Users/keric/Documents/workspace/_Java/cnwk-hadoop/pigsample_368958259_1299791458629
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:139)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:527)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/Users/keric/Documents/workspace/_Java/cnwk-hadoop/pigsample_368958259_1299791458629
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
    at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153)
    at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:112)
    ... 6 more

Any thoughts?
Thanks
Keric
Keric Donnelly 2011-03-14, 13:16