

Re: STRSPLIT problems (or UDF shortcoming?)
From what I can tell, this does seem like a bug.  Switching to positional
specifiers seems to work around the issue:

TEST = FOREACH MOVEMENT GENERATE $3;
POSA = FOREACH TEST GENERATE STRSPLIT($0, '/');

Possibly some casting is being applied in one case (positional specifiers)
but not the other?
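
If that guess is right, an explicit cast might sidestep the problem as well.
This is only an untested sketch, assuming the same MOVEMENT relation as in the
quoted session below; I have not confirmed that it behaves differently on
0.10.0:

```pig
-- Untested sketch: cast the untyped (bytearray) field explicitly, rather
-- than only declaring its type in the AS clause.
TEST = FOREACH MOVEMENT GENERATE (chararray) $3 AS startpos;
POSA = FOREACH TEST GENERATE STRSPLIT(startpos, '/');
DUMP POSA;
```

Running DESCRIBE TEST; on both variants would show whether the declared
schema differs between them.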

Norbert

On Thu, May 17, 2012 at 7:39 PM, Nerius Landys <[EMAIL PROTECTED]> wrote:

> > We ended up using 0.10 on EMR and it's been working fine so far...
>
> OK, a bit of bad news.  0.10 did not fix my problem.
> I'll recap the entire situation.
> HADOOP_HOME is set to hadoop-0.20.205.0, Pig version is now pig-0.10.0.
>
> File 'bin-proto-4' is:
>
> Meta    1234567890      foo     34
> Movement        1234567890      Rambetter       1/1     2/3
> Movement        1234567890      Freddyman       10/1    10/2
>
> (with tab delimiters)
>
> grunt> A = LOAD 'bin-proto-4';
> grunt> MOVEMENT = FILTER A BY (chararray) $0 == 'Movement';
> grunt> TEST = FOREACH MOVEMENT GENERATE $3 AS startpos:chararray;
> grunt> POSA = FOREACH TEST GENERATE STRSPLIT(startpos,'/');
> grunt> DUMP POSA;
> ()
> ()
>
> grunt> DUMP TEST;
> (1/1)
> (10/1)
>
> Ran this on my local machine just now.
>