Pig >> mail # user >> Error When Sorting

Re: Error When Sorting
For some reason Pig fails to find the sample files created in the sampling MR job of the order-by.
You seem to be running in local mode. Is this error seen in map-reduce mode as well?
-Thejas
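
A minimal sketch of the local versus map-reduce check suggested above, assuming the script is saved under the hypothetical name sort_sessions.pig. Pig's -x flag selects the execution mode, so the same script can be run once per mode to see whether the missing pigsample_* path shows up in both. The helper function is illustrative only; it prints the command instead of running it when pig or the script file is not available.

```shell
# Hypothetical file name; substitute the path of your own Pig script.
PIG_SCRIPT="sort_sessions.pig"

# Run the script in the given execution mode ("local" or "mapreduce").
# Falls back to printing the command if pig or the script is missing.
run_in_mode() {
  mode="$1"
  if command -v pig >/dev/null 2>&1 && [ -f "$PIG_SCRIPT" ]; then
    pig -x "$mode" "$PIG_SCRIPT"
  else
    echo "would run: pig -x $mode $PIG_SCRIPT"
  fi
}

run_in_mode local       # the mode the LocalJobRunner log lines point to
run_in_mode mapreduce   # the mode to compare against
```

If the error only appears with -x local, that would point at how the sample file path is resolved on the local file system rather than at the script itself.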

On 3/11/11 8:35 AM, "Keric Donnelly" <[EMAIL PROTECTED]> wrote:

I've been playing with Pig this week and I'm running into an issue that
seems like it should be trivial. I'm basically reading data from HBase
and performing a count of sessions associated with a cookie.

I'm running on Pig 0.8.

My script looks like the following:

raw = LOAD 'hbase://sport_user'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
      'session:*', '-loadKey true')
      AS (id:bytearray, session_map:map[]);

-- Convert maps to bags
B = FOREACH raw GENERATE id, mapToBag(session_map) AS session_bag;
--dump B;

-- Count the number of sessions
C = FOREACH B GENERATE id,
        COUNT(session_bag) AS sess_count;

describe C;
dump C;

This works fine. When I dump C, I see the cg cookie and the number of sessions.
For example:
(ANON_Cg+5EUka4wFOAAAAtRg,2)
(ANON_Cg+5EUknSmmLAAAA5CU,1)
(ANON_Cg+5EUlHWwwNAAAALQQ,1)
(ANON_Cg+5EUlSDOIJAAAAygw,1)
(ANON_Cg+5EUlgDESHAAAAWQ0,1)
(ANON_Cg+5EUli1UHBAAAA/xg,4)
(ANON_Cg+5EUmSc3sPAAAAsg4,2)
(ANON_Cg+5EUmo6i8PAAAAwxo,2)
(ANON_Cg+5EUn2X6HOAAAAWSM,1)
(ANON_Cg+5EUn5PmRCAQAA1xA,4)
(ANON_Cg+5EUnUT9+NAAAA0RE,3)
(ANON_Cg+5EUnjSD0BAAAACx0,1)
(ANON_Cg+5EUoJF82PAAAAkgI,1)
(ANON_Cg+5EUoWJW9GAAAAcx4,1)
(ANON_Cg+5EUorklmHAAAAxRk,1)
(ANON_Cg+5EUp1bXGFAAAAPwA,1)
(ANON_Cg+5EUp55I5OAAAAmR4,2)
(ANON_Cg+5EUp9XkHFAAAAYQ8,2)
(ANON_Cg+5EUpK/koEAAAAcRs,3)
(ANON_Cg+5EUpd/aDJAAAABBw,3)

If I then do a descending sort on the alias C, I get an error when I dump it:
D = ORDER C BY sess_count DESC ;
dump D ;

2011-03-10 16:10:59,325 [Thread-57] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0004
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/Users/keric/Documents/workspace/_Java/cnwk-hadoop/pigsample_368958259_1299791458629
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:139)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:527)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/Users/keric/Documents/workspace/_Java/cnwk-hadoop/pigsample_368958259_1299791458629
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
    at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153)
    at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:112)
    ... 6 more

Any thoughts?
Thanks
Keric