Pig >> mail # user >> Weird bug of REPLACE


MiaoMiao 2012-08-14, 05:03
Re: Weird bug of REPLACE
Hi,

If you look at the source code of REPLACE, what it does is basically:

String source = "[02/Aug/2012:05:01:17";
String target = "[";
String replaceWith = "";
return source.replaceAll(target, replaceWith);

Note that Java's String.replaceAll() treats its first argument (the target,
i.e. the 2nd parameter of REPLACE) as a regular expression, and "[" is a
regex metacharacter that opens a character class. To match it literally,
you have to escape it, so in your Pig script, you should write:

REPLACE(date,'\\[','')
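To see the difference outside of Pig, here is a minimal standalone Java sketch. The class and method names are made up for illustration and are not part of Pig; it only exercises String.replaceAll() and Pattern.quote() from the JDK. It shows that the escaped pattern strips the bracket, that Pattern.quote() is another way to get a literal match, and that the unescaped "[" is rejected as an invalid regex.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of Pig.
class ReplaceBracketDemo {

    // Escaped target: the Java literal "\\[" is the regex \[ (a literal bracket).
    static String stripEscaped(String source) {
        return source.replaceAll("\\[", "");
    }

    // Alternative: Pattern.quote() turns any string into a literal pattern.
    static String stripQuoted(String source) {
        return source.replaceAll(Pattern.quote("["), "");
    }

    // Returns true if the unescaped "[" is rejected as an invalid regex.
    static boolean unescapedFails(String source) {
        try {
            source.replaceAll("[", "");
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String source = "[02/Aug/2012:05:01:17";
        System.out.println(stripEscaped(source));
        System.out.println(stripQuoted(source));
        System.out.println(unescapedFails(source));
    }
}
```
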

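Now regarding the result that you're seeing: an unescaped "[" is not a valid
regular expression, so replaceAll() throws a PatternSyntaxException. It looks
like whatever exception is thrown inside REPLACE is swallowed rather than
failing the job, and null is returned instead: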

        try {
            ...
        } catch (Exception e) {
            warn("Failed to process input; error - " + e.getMessage(),
                    PigWarning.UDF_WARNING_1);
            return null;
        }

But I do see the following message at the end of the job status:

2012-08-17 16:51:25,061 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning UDF_WARNING_1 1 time(s)

I must admit that this is not very visible though.
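
To make the swallow-and-return-null behaviour concrete, here is a rough Java sketch. It is not Pig's actual REPLACE code: the class, the replaceOrNull() method, and the warnings list are stand-ins I made up to mimic the try/catch quoted above, with the list playing the role of Pig's warning counter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; it only imitates the try/catch pattern, not Pig's API.
class SwallowedExceptionDemo {

    // Stand-in for Pig's warning aggregation (assumption, not the real mechanism).
    static final List<String> warnings = new ArrayList<>();

    // Any exception from replaceAll() becomes a recorded warning plus a null result.
    static String replaceOrNull(String source, String target, String replaceWith) {
        try {
            return source.replaceAll(target, replaceWith);
        } catch (Exception e) {
            warnings.add("Failed to process input; error - " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        String source = "[02/Aug/2012:05:01:17";
        // Unescaped target: invalid regex, so the caller only ever sees null.
        System.out.println(replaceOrNull(source, "[", ""));
        // Escaped target: the bracket is removed as intended.
        System.out.println(replaceOrNull(source, "\\[", ""));
        // The only trace of the failure is the recorded warning.
        System.out.println(warnings.size());
    }
}
```

A null chararray is printed as an empty field by dump, which would match the blank date column in your output.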

Thanks,
Cheolsoo

On Mon, Aug 13, 2012 at 10:03 PM, MiaoMiao <[EMAIL PROTECTED]> wrote:

> I used Pig for an ETL job, but ran into a strange bug in the built-in
> REPLACE function.
>
> After I replace '[' with '' in '[02/Aug/2012:05:01:17', the whole
> string goes blank.
>
> Here is some information that may help with debugging.
>
> My pig version is: Apache Pig version 0.11.0-SNAPSHOT (r1364475)
> compiled Jul 23 2012, 10:30:53
>
> The original text file:
> ip.ip.ip.ip - - [02/Aug/2012:05:01:17 -0600] "GET
> /player.php/sid/XNDM0Njk3MjEy/v.swf HTTP/1.1" 302 26
>
> The whole pig script is :
> read = load '/home/test/apacheLog'
> using PigStorage(' ')
> as (
>           ip:chararray
>         , indentity:chararray
>         , name:chararray
>         , date:chararray
>         , timezone:chararray
>         , method:chararray
>         , path:chararray
>         , protocol:chararray
>         , status:chararray
>         , size:chararray
> );
> dump read;
>
> --(ip.ip.ip.ip,-,-,[02/Aug/2012:05:01:17,-0600],"GET,/player.php/sid/XNDM0Njk3MjEy/v.swf,HTTP/1.1",302,26)
> data = foreach read generate
>           ip
>         , REPLACE(date,'[','')
>         , REPLACE(timezone,']','')
>         , REPLACE(method,'"','')
>         , path
>         , REPLACE(protocol,'"','')
>         , status
>         , size;
> describe data;
> --data: {ip: chararray,date: chararray,timezone: chararray,method: chararray,path: chararray,protocol: chararray,status: chararray,size: chararray}
> dump data;
>
> --(ip.ip.ip.ip,,-0600,GET,/player.php/sid/XNDM0Njk3MjEy/v.swf,HTTP/1.1,302,26)
>
+
MiaoMiao 2012-09-04, 07:37
+
Bill Graham 2012-09-04, 18:03