Pig >> mail # user >> strange problem with count and distinct subscribers


Re: strange problem with count and distinct subscribers
Which version are you using? I am wondering whether PIG-3466 fixes your
error:
https://issues.apache.org/jira/browse/PIG-3466

You can reproduce the error only when loading more data, and you also see
an intermittent type cast error. My guess is that you ran into the race
condition that PIG-3466 fixed, and that your bag is corrupted, which results
in the type cast error.
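
To check on a small sample what the quoted script is expected to produce, independent of Pig, a minimal Python sketch follows. It is not from the original thread: the function name `summarize` and the sample rows are hypothetical, and it only approximates Pig semantics (a plain comma split like PigStorage(','), amounts parsed as floats, and '.*On.*' treated as a substring test).

```python
from collections import defaultdict


def summarize(lines):
    """Per (day, callType): row count, amount sum, distinct subscriber count.

    Mirrors the intent of the quoted Pig script; field positions
    ($0, $1, $4, $11) are taken from that script.
    """
    counts = defaultdict(int)
    sums = defaultdict(float)
    subscribers = defaultdict(set)

    for line in lines:
        row = line.rstrip("\n").split(",")
        if len(row) < 12:
            continue  # not enough fields to reach $11
        try:
            amount = float(row[4])
        except ValueError:
            continue  # an unparsable amount cannot pass the "$4 > 0" filter
        call_type = row[11]
        if "On" not in call_type or amount <= 0:
            continue  # FILTER ... BY $11 MATCHES '.*On.*' AND $4 > 0

        key = (row[0][:10], call_type)  # SUBSTRING($0, 0, 10), callType
        counts[key] += 1
        sums[key] += amount
        subscribers[key].add(row[1])

    return {k: (counts[k], sums[k], len(subscribers[k])) for k in counts}


if __name__ == "__main__":
    # Hypothetical sample rows with 12 comma-separated fields each.
    sample = [
        "2013-11-14 10:00:00,subA,x,x,5,x,x,x,x,x,x,OnNet",
        "2013-11-14 11:00:00,subA,x,x,3,x,x,x,x,x,x,OnNet",
        "2013-11-14 12:00:00,subB,x,x,2,x,x,x,x,x,x,OnNet",
    ]
    for key, value in summarize(sample).items():
        print(key, value)
```

Comparing this output with the Pig result for one small subfolder can help tell a logic problem in the script apart from a backend failure that only appears with more data.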

On Mon, Nov 18, 2013 at 6:30 AM, Noam Lavie <[EMAIL PROTECTED]> wrote:

> Hi,
> I'm trying to run the following Pig script (its main purpose is to read
> inputs that contain info about phone calls; the script is supposed to count
> the different types of calls and the different subscribers that made them):
>
> SET default_parallel 40;
> allFiles = LOAD
> 'maprfs:///analytics/data/consumers/mapred/facts/done/FACT_VOICE_GE_Analytics9_1/20131114/'
> USING PigStorage(',');
> allFilesFiltered = FILTER allFiles BY $11 MATCHES '.*On.*' AND $4 > 0;
> datesList = FOREACH allFilesFiltered GENERATE SUBSTRING($0, 0, 10) AS day,
> $11 AS callType, $4 AS amount, $1 AS subscriberKey;
> datesGroups = GROUP datesList BY (day, callType);
> datesGroupsAmount = foreach datesGroups {
>     unique_seubscriber = DISTINCT datesList.subscriberKey;
>     GENERATE group.day, group.callType, COUNT(datesList),
> SUM(datesList.amount), COUNT(unique_seubscriber);
> };
> dump datesGroupsAmount;
>
> The problem is with unique_seubscriber: the count and distinct
> don't work. The strange thing is that if I run the script separately for each
> subfolder's input, the run succeeds for each part, but if I give it
> the whole input folder together, it fails and I get the following error:
> ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open
> iterator for alias datesGroupsAmount
>
> Another error that I get from time to time (if I make small changes to
> the script) is:
> ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open
> iterator for alias datesGroupsAmount. Backend error : java.lang.Boolean
> cannot be cast to org.apache.pig.data.Tuple (maybe there is a connection
> between the two errors?)
>
> Here is the log file:
>
> Pig Stack Trace
> ---------------
> ERROR 1066: Unable to open iterator for alias datesGroupsAmount
>
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias datesGroupsAmount
>         at org.apache.pig.PigServer.openIterator(PigServer.java:836)
>         at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>         at org.apache.pig.Main.run(Main.java:604)
>         at org.apache.pig.Main.main(Main.java:157)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:601)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
> Caused by: java.io.IOException: Job terminated with anomalous status FAILED
>         at org.apache.pig.PigServer.openIterator(PigServer.java:828)
>         ... 12 more
>
>
> Any help will be appreciated.
> Thanks,
> Noam