Pig >> mail # user >> More on issue with local vs mapreduce mode


RE: More on issue with local vs mapreduce mode
Dear Serega,
I am now using log4j to debug my UDF. Here is what I found: for some reason, in mapreduce mode my exec function does not appear to get called. The log message in the constructor is printed to the console, but the log message in the exec function is not. In local mode, all the log messages appear on the console.

public CustomFilter() {
    log.info("Hello World!");
}

public Tuple exec(Tuple input) throws IOException {
    log.info("Into the exec function");
    // rest of the code
}
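A likely explanation (based on how Pig runs UDFs in general, not on anything specific to this script): Pig instantiates the UDF on the client while it builds and serializes the execution plan, which is why the constructor's message reaches the local console in both modes. exec, on the other hand, runs inside the map tasks on the cluster, so its log4j output lands in the per-task-attempt logs, not on the client console. On Hadoop 1.x those logs are reachable through the JobTracker web UI; the attempt ids for a job (using the job id from the stats below) can be listed with:

```
# Sketch (Hadoop 1.x CLI; requires a running cluster): list completed map
# task attempts for the job, then open each attempt's syslog/stdout in the
# JobTracker web UI to find the exec-side log4j output.
hadoop job -list-attempt-ids job_201311011343_0042 map completed
```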
> From: [EMAIL PROTECTED]
> Date: Wed, 6 Nov 2013 21:19:10 +0400
> Subject: Re: More on issue with local vs mapreduce mode
> To: [EMAIL PROTECTED]
>
> You get 4 empty tuples.
> Maybe your UDF parser.customFilter(key,'AAAAA') works differently? Maybe
> you are using an old version?
> You can add a print statement to the UDF and see what it accepts and what
> it produces.
>
>
> 2013/11/6 Sameer Tilak <[EMAIL PROTECTED]>
>
> > Dear Serega,
> >
> > When I run the script in local mode, I get the correct output stored in
> > the AU/part-m-00000 file. However, when I run it in mapreduce mode (with
> > input and output on HDFS), the file /scratch/AU/part-m-00000 is 4 bytes
> > and there is nothing in it.
> >
> > I am not sure whether the AU relation somehow does not get materialized
> > correctly or whether the problem happens during the store stage.
> >
> > Here are some of the console messages:
> >
> > HadoopVersion  PigVersion  UserId   StartedAt            FinishedAt           Features
> > 1.0.3          0.11.1      p529444  2013-11-06 07:14:15  2013-11-06 07:14:40  UNKNOWN
> >
> > Success!
> >
> > Job Stats (time in seconds):
> > JobId                  Maps  Reduces  MaxMapTime  MinMapTime  AvgMapTime  MedianMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  MedianReduceTime  Alias  Feature               Outputs
> > job_201311011343_0042  1     0        6           6           6           6              0              0              0              0                 A,AU   MULTI_QUERY,MAP_ONLY  /scratch/A,/scratch/AU,
> >
> > Input(s):
> > Successfully read 4 records (311082 bytes) from: "/scratch/file.seq"
> >
> > Output(s):
> > Successfully stored 4 records (231 bytes) in: "/scratch/A"
> > Successfully stored 4 records (3 bytes) in: "/scratch/AU"
> >
> > Counters:
> > Total records written : 8
> > Total bytes written : 234
> > Spillable Memory Manager spill count : 0
> > Total bags proactively spilled: 0
> > Total records proactively spilled: 0
> >
> > Job DAG:
> > job_201311011343_0042
> >
> >
> > 2013-11-06 07:14:40,352 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
> >
> > Here is the o/p of following commands:
> >
> > hadoop --config $HADOOP_CONF_DIR fs -ls /scratch/AU/part-m-00000
> >
> > -rw-r--r--   1 username groupname          4 2013-11-06 07:14
> > /scratch/AU/part-m-00000
> >
> > I am not sure whether DUMP will give me the correct result, but when I
> > replaced STORE with DUMP AU in mapreduce mode, I get AU as:
> > ()
> > ()
> > ()
> > ()
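The 4-byte part file is consistent with the DUMP output above: when PigStorage stores an empty tuple it writes no field data, only the record terminator, so four empty tuples come out as four newlines. A quick local sketch of that arithmetic (file name borrowed from the listing below):

```shell
# Four empty records, each stored as a bare newline, make a 4-byte file.
printf '\n\n\n\n' > part-m-00000
wc -c < part-m-00000   # byte count is 4
```

So the store stage is working; it is the UDF that is producing empty tuples on the cluster.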
> >
> > > From: [EMAIL PROTECTED]
> > > Date: Wed, 6 Nov 2013 11:19:03 +0400
> > > Subject: Re: More on issue with local vs mapreduce mode
> > > To: [EMAIL PROTECTED]
> > >
> > > "The same script does not work in the mapreduce mode. "
> > > What does it mean?
> > >
> > >
> > > 2013/11/6 Sameer Tilak <[EMAIL PROTECTED]>
> > >
> > > > Hello,
> > > >
> > > > My script works perfectly in local mode. The same script does not
> > > > work in mapreduce mode. For local mode, the output is saved in the
> > > > current directory, whereas for mapreduce mode I use the /scratch
> > > > directory on HDFS.
> > > >
> > > > Local mode:
> > > >
> > > > A = LOAD 'file.seq' USING SequenceFileLoader AS (key: chararray, value: chararray);
> > > > DESCRIBE A;
> > > > STORE A INTO 'A';
> > > >
> > > > AU = FOREACH A GENERATE FLATTEN(parser.customFilter(key,'AAAAA'));
> > > > STORE AU INTO 'AU';
> > > >
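Serega's "old version" guess above is the usual culprit when a UDF behaves differently only on the cluster: local mode loads the class from the local classpath, while mapreduce mode ships whatever jar the script REGISTERed, which may be stale. A sketch of how to check (the jar path and script name here are assumptions, not from the thread):

```
# Make sure the script's REGISTER line points at the freshly built jar, e.g.
#   REGISTER '/path/to/parser-udf.jar';
# then run the same script in both execution modes and compare:
pig -x local script.pig
pig -x mapreduce script.pig
```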