

Re: strange problem with count and distinct subscribers
Which version are you using? I wonder whether PIG-3466 fixes your error:
https://issues.apache.org/jira/browse/PIG-3466

You can reproduce the error only when loading more data, and you also see an
intermittent type cast error. My guess is that you ran into the race condition
that PIG-3466 fixed, and that your bag was corrupted, resulting in the type
cast error.
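
If upgrading is not an option, one thing you could try is computing the
distinct subscriber count with a top-level DISTINCT instead of the nested one,
and joining it back to the other aggregates. This is only a sketch that I have
not run against your data. It assumes the nested DISTINCT is what triggers the
failure, which is unconfirmed, and the aliases totals, nonNullKeys,
subscriberKeys, uniqueKeys, keyGroups, subscriberCounts, joined and result are
names I made up for illustration:

```pig
-- Assumes datesList is defined exactly as in the original script.

-- Call count and total amount per (day, callType); no nested DISTINCT here.
datesGroups = GROUP datesList BY (day, callType);
totals = FOREACH datesGroups GENERATE
    FLATTEN(group) AS (day, callType),
    COUNT(datesList) AS calls,
    SUM(datesList.amount) AS totalAmount;

-- Distinct subscribers per (day, callType), using a top-level DISTINCT.
-- Null keys are filtered out because COUNT in the original script skips them.
nonNullKeys = FILTER datesList BY subscriberKey IS NOT NULL;
subscriberKeys = FOREACH nonNullKeys GENERATE day, callType, subscriberKey;
uniqueKeys = DISTINCT subscriberKeys;
keyGroups = GROUP uniqueKeys BY (day, callType);
subscriberCounts = FOREACH keyGroups GENERATE
    FLATTEN(group) AS (day, callType),
    COUNT(uniqueKeys) AS subscribers;

-- Combine both results. This is an inner join, so a group whose subscriber
-- keys are all null would be dropped rather than reported with a count of 0.
joined = JOIN totals BY (day, callType), subscriberCounts BY (day, callType);
result = FOREACH joined GENERATE
    totals::day, totals::callType, totals::calls, totals::totalAmount,
    subscriberCounts::subscribers;
DUMP result;
```

This costs an extra job for the DISTINCT and the JOIN, so treat it as a way to
check whether the nested DISTINCT is the culprit rather than as a permanent
replacement.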

On Mon, Nov 18, 2013 at 6:30 AM, Noam Lavie <[EMAIL PROTECTED]> wrote:

> Hi,
> I'm trying to run the following Pig script. Its main purpose is to read
> inputs that contain info about phone calls; the script is supposed to count
> the different types of calls and the distinct subscribers that made them:
>
> SET default_parallel 40;
> allFiles = LOAD
> 'maprfs:///analytics/data/consumers/mapred/facts/done/FACT_VOICE_GE_Analytics9_1/20131114/'
> USING PigStorage(',');
> allFilesFiltered = FILTER allFiles BY $11 MATCHES '.*On.*' AND $4 > 0;
> datesList = FOREACH allFilesFiltered GENERATE SUBSTRING($0, 0, 10) AS day,
> $11 AS callType, $4 AS amount, $1 AS subscriberKey;
> datesGroups = GROUP datesList BY (day, callType);
> datesGroupsAmount = foreach datesGroups {
>     unique_seubscriber = DISTINCT datesList.subscriberKey;
>     GENERATE group.day, group.callType, COUNT(datesList),
> SUM(datesList.amount), COUNT(unique_seubscriber);
> };
> dump datesGroupsAmount;
>
> The problem is with unique_seubscriber: the COUNT over the DISTINCT
> doesn't work. The strange thing is that if I run the script separately on each
> subfolder's input, the run succeeds for each part, but if I give it
> the whole input folder at once, it fails and I get the following error:
> ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open
> iterator for alias datesGroupsAmount
>
> Another error that I get from time to time (when I make small changes to
> the script) is:
> ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open
> iterator for alias datesGroupsAmount. Backend error : java.lang.Boolean
> cannot be cast to org.apache.pig.data.Tuple (maybe there is a connection
> between the two errors?)
>
> Here is the log file:
>
> Pig Stack Trace
> ---------------
> ERROR 1066: Unable to open iterator for alias datesGroupsAmount
>
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias datesGroupsAmount
>                 at org.apache.pig.PigServer.openIterator(PigServer.java:836)
>                 at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
>                 at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
>                 at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
>                 at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
>                 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>                 at org.apache.pig.Main.run(Main.java:604)
>                 at org.apache.pig.Main.main(Main.java:157)
>                 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>                 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>                 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>                 at java.lang.reflect.Method.invoke(Method.java:601)
>                 at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
> Caused by: java.io.IOException: Job terminated with anomalous status FAILED
>                 at org.apache.pig.PigServer.openIterator(PigServer.java:828)
>                 ... 12 more
>
>
> Any help will be appreciated.
> Thanks,
> Noam