Pig >> mail # user >> Weird bug of REPLACE


MiaoMiao 2012-08-14, 05:03
Cheolsoo Park 2012-08-18, 00:05
MiaoMiao 2012-09-04, 07:37
Re: Re: Weird bug of REPLACE
Opened a JIRA to better clarify the docs here:
https://issues.apache.org/jira/browse/PIG-2905

On Tue, Sep 4, 2012 at 12:37 AM, MiaoMiao <[EMAIL PROTECTED]> wrote:

> Pity the documentation for REPLACE doesn't mention regex at all. Thank
> you so much for your reply; knowing what's going on is such
> a relief. Now I can trust myself with Pig a little more.
>
>
> On Sat, 18 Aug 2012 at 00:05:29 AM, Cheolsoo Park <[EMAIL PROTECTED]>
> wrote:
> > Hi,
>
> > If you look at the source code of REPLACE, what it does is basically:
>
> > String source = "[02/Aug/2012:05:01:17";
> > String target = "[";
> > String replaceWith = "";
> > return source.replaceAll(target, replaceWith);
>
>
> > Note that Java's String.replaceAll() takes a regular expression as its
> > 1st parameter (i.e. target, the 2nd argument of REPLACE), and "[" is a
> > special character in a regex. To match it literally, you have to escape
> > it, so in your Pig script, you should do:
>
> > REPLACE(date,'\\[','')
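The same behaviour can be checked outside Pig. The following is a minimal Java sketch, assuming (as described above) that REPLACE passes its second argument straight to String.replaceAll(). The class name ReplaceBracketDemo and the helper regexReplace are made up for illustration and are not part of Pig. It shows the unescaped "[" being rejected as a regex, the escaped form working, and two standard-library ways to match the target literally.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class ReplaceBracketDemo {
    // Hypothetical helper: mirrors what the thread says REPLACE does internally.
    static String regexReplace(String source, String target, String replaceWith) {
        return source.replaceAll(target, replaceWith);
    }

    public static void main(String[] args) {
        String date = "[02/Aug/2012:05:01:17";

        // 1. An unescaped "[" opens a character class that is never closed,
        //    so the pattern cannot be compiled and an exception is thrown.
        try {
            regexReplace(date, "[", "");
        } catch (PatternSyntaxException e) {
            System.out.println("unescaped '[' rejected: " + e.getDescription());
        }

        // 2. Escaped: the Java literal "\\[" is the regex \[ and matches a literal bracket.
        System.out.println(regexReplace(date, "\\[", ""));

        // 3. Pattern.quote() escapes an arbitrary string for literal matching.
        System.out.println(regexReplace(date, Pattern.quote("["), ""));

        // 4. String.replace() never interprets its target as a regex.
        System.out.println(date.replace("[", ""));
    }
}
```

Cases 2 to 4 each print 02/Aug/2012:05:01:17. Only the escaping in case 2 carries over directly to a Pig script; cases 3 and 4 are Java-side alternatives.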
>
> > Now regarding the result that you're seeing, it looks like whatever
> > exception is thrown inside REPLACE is swallowed rather than making the job
> > fail, and null is returned:
>
> >         try {
> >             ...
> >         } catch (Exception e) {
> >             warn("Failed to process input; error - " + e.getMessage(),
> >                 PigWarning.UDF_WARNING_1);
> >             return null;
> >         }
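The effect of that catch block can be reproduced in plain Java. The sketch below is not the Pig source: safeReplace and the warnings counter are hypothetical stand-ins for the UDF body and for Pig's warn() call. It illustrates how an invalid pattern ends up as a null result plus a counted warning rather than a failed job, which would explain the empty date field in the dump output.

```java
class SwallowedExceptionDemo {
    // Hypothetical stand-in for Pig's warning counter.
    static int warnings = 0;

    // Same shape as the try/catch quoted above: any exception is logged and turned into null.
    static String safeReplace(String source, String target, String replaceWith) {
        try {
            return source.replaceAll(target, replaceWith);
        } catch (Exception e) {
            warnings++;
            System.err.println("Failed to process input; error - " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        // "[" is not a valid regex, so replaceAll() throws and the caller only sees null.
        String result = safeReplace("[02/Aug/2012:05:01:17", "[", "");
        System.out.println("result = " + result + ", warnings = " + warnings);
    }
}
```

Running it prints "result = null, warnings = 1" on standard output; the error detail goes only to standard error, much like the warning that appears only at the end of the job status.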
>
>
> > But I do see the following message at the end of the job status:
>
> > 2012-08-17 16:51:25,061 [main] WARN
> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > - Encountered Warning UDF_WARNING_1 1 time(s)
>
> > I must admit that this is not very visible though.
>
> > Thanks,
> > Cheolsoo
>
> > On Mon, Aug 13, 2012 at 10:03 PM, MiaoMiao <[EMAIL PROTECTED]> wrote:
>
> > > I used Pig for an ETL job, but ran into a strange bug in the
> > > built-in REPLACE function.
> > >
> > > After I replaced '[' with '' in '[02/Aug/2012:05:01:17', the whole
> > > string just went blank.
> > >
> > > Here is some info that may help debug.
> > >
> > > My pig version is: Apache Pig version 0.11.0-SNAPSHOT (r1364475)
> > > compiled Jul 23 2012, 10:30:53
> > >
> > > The original text file:
> > > ip.ip.ip.ip - - [02/Aug/2012:05:01:17 -0600] "GET
> > > /player.php/sid/XNDM0Njk3MjEy/v.swf HTTP/1.1" 302 26
> > >
> > > The whole Pig script is:
> > > read = load '/home/test/apacheLog'
> > > using PigStorage(' ')
> > > as (
> > >           ip:chararray
> > >         , identity:chararray
> > >         , name:chararray
> > >         , date:chararray
> > >         , timezone:chararray
> > >         , method:chararray
> > >         , path:chararray
> > >         , protocol:chararray
> > >         , status:chararray
> > >         , size:chararray
> > > );
> > > dump read;
> > > --(ip.ip.ip.ip,-,-,[02/Aug/2012:05:01:17,-0600],"GET,/player.php/sid/XNDM0Njk3MjEy/v.swf,HTTP/1.1",302,26)
> > > data = foreach read generate
> > >           ip
> > >         , REPLACE(date,'[','')
> > >         , REPLACE(timezone,']','')
> > >         , REPLACE(method,'"','')
> > >         , path
> > >         , REPLACE(protocol,'"','')
> > >         , status
> > >         , size;
> > > describe data;
> > > --data: {ip: chararray,date: chararray,timezone: chararray,method:
> > > chararray,path: chararray,protocol: chararray,status: chararray,size:
> > > chararray}
> > > dump data;
> > > --(ip.ip.ip.ip,,-0600,GET,/player.php/sid/XNDM0Njk3MjEy/v.swf,HTTP/1.1,302,26)
> > >
>

--
*Note that I'm no longer using my Yahoo! email address. Please email me at
[EMAIL PROTECTED] going forward.*
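
For reference, the cleanup that the original script was aiming for can be sketched in plain Java. This is an illustration only, not a replacement for the Pig script: the class name ApacheLogCleanupDemo and the method clean are hypothetical, the field positions assume the space-delimited layout from the load statement in the thread, and String.replace() is used because it treats its target literally, so no escaping is needed.

```java
import java.util.StringJoiner;

class ApacheLogCleanupDemo {
    // Splits one access-log line on spaces and strips the bracket and quote
    // characters from the same fields the Pig script passes to REPLACE.
    static String clean(String line) {
        String[] f = line.split(" ");
        StringJoiner out = new StringJoiner(",", "(", ")");
        out.add(f[0]);                        // ip
        out.add(f[3].replace("[", ""));       // date
        out.add(f[4].replace("]", ""));       // timezone
        out.add(f[5].replace("\"", ""));      // method
        out.add(f[6]);                        // path
        out.add(f[7].replace("\"", ""));      // protocol
        out.add(f[8]);                        // status
        out.add(f[9]);                        // size
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "ip.ip.ip.ip - - [02/Aug/2012:05:01:17 -0600] "
                + "\"GET /player.php/sid/XNDM0Njk3MjEy/v.swf HTTP/1.1\" 302 26";
        System.out.println(clean(line));
    }
}
```

For the sample line from the thread this prints (ip.ip.ip.ip,02/Aug/2012:05:01:17,-0600,GET,/player.php/sid/XNDM0Njk3MjEy/v.swf,HTTP/1.1,302,26), which is the tuple the Pig script should produce once the date field uses REPLACE(date,'\\[','').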