Pig, mail # user - schema of pig flatten


Huo Zhu 2012-09-04, 11:16
Re: schema of pig flatten
Russell Jurney 2012-09-04, 14:04
You must cast explicitly:

b = foreach a generate (int)foo as foo:int;

Russell Jurney
twitter.com/rjurney
[EMAIL PROTECTED]
datasyndrome.com

On Sep 4, 2012, at 4:17 AM, Huo Zhu <[EMAIL PROTECTED]> wrote:

> I recently ran into this problem at work. It concerns Pig's FLATTEN, and
> here is a simple example that reproduces it.
>
> Two input files:
>
> === file1 ===
> 1_a
> 2_b
> 4_d
>
> === file2 (tab-separated) ===
> 1 a
> 2 b
> 3 c
>
> I tried three scripts in Pig 0.9 and Pig 0.10. Two of them throw exceptions:
>
> Pig script 1:
>
> a = load 'file1' as (str:chararray);
> b = load 'file2' as (num:int, ch:chararray);
> a1 = foreach a generate flatten(STRSPLIT(str,'_',2)) as (num:int, ch:chararray);
> c = join a1 by num, b by num;
> dump c;   -- exception java.lang.String cannot be cast to java.lang.Integer
>
> Pig script 2:
>
> a = load 'file1' as (str:chararray);
> b = load 'file2' as (num:int, ch:chararray);
> a1 = foreach a generate flatten(STRSPLIT(str,'_',2)) as (num:int, ch:chararray);
> a2 = foreach a1 generate (int)num as num, ch as ch;
> c = join a2 by num, b by num;
> dump c;   -- exception java.lang.String cannot be cast to java.lang.Integer
>
> Pig script 3:
>
> a = load 'file1' as (str:chararray);
> b = load 'file2' as (num:int, ch:chararray);
> a1 = foreach a generate flatten(STRSPLIT(str,'_',2));
> a2 = foreach a1 generate (int)$0 as num, $1 as ch;
> c = join a2 by num, b by num;
> dump c;   -- works, prints the expected join result
>
> Could somebody explain why scripts 1 and 2 fail but script 3 succeeds?
> Thanks!
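The behavior in the question is consistent with this explanation: STRSPLIT returns its pieces as strings, the `as (num:int, ch:chararray)` clause after FLATTEN only relabels the schema without converting the stored values, and `(int)num` in script 2 is skipped because the schema already claims `num` is an int. In script 3 the flattened fields carry no declared type, so `(int)$0` performs a real conversion. The Python sketch below is a simplified model of that reasoning, not Pig's implementation. The `Field` pair and the helper names (`strsplit`, `declare`, `cast`, `join_key`, `join`) are hypothetical, and `TypeError` stands in for Java's `ClassCastException`.

```python
from typing import Any, List, Tuple

# Simplified model, NOT Pig's implementation. A field is a pair:
# (type the schema claims, object actually stored at runtime).
Field = Tuple[str, Any]
PY_TYPES = {"int": int, "chararray": str}


def strsplit(s: str, sep: str, limit: int) -> List[Field]:
    # Like STRSPLIT, every piece is a string; no field types are declared,
    # so each piece is modeled as untyped ("bytearray").
    return [("bytearray", piece) for piece in s.split(sep, limit - 1)]


def declare(field: Field, declared: str) -> Field:
    # Models `as (num:int, ...)`: relabels the schema type, converts nothing.
    return (declared, field[1])


def cast(field: Field, target: str) -> Field:
    declared, value = field
    if declared == target:
        # The schema already claims `target`, so the cast is skipped.
        return field
    # A real conversion from the stored object (e.g. "1" -> 1).
    return (target, PY_TYPES[target](value))


def join_key(field: Field) -> Any:
    # The join handles the key as its declared type; a mismatching runtime
    # object fails (Pig reports ClassCastException; this model raises TypeError).
    declared, value = field
    expected = PY_TYPES.get(declared, object)
    if not isinstance(value, expected):
        raise TypeError(f"{type(value).__name__} cannot be cast to {expected.__name__}")
    return value


def join(left: List[List[Field]], right: List[List[Field]]) -> List[tuple]:
    # Nested-loop equi-join on the first field of each row.
    return [tuple(f[1] for f in l + r)
            for l in left for r in right
            if join_key(l[0]) == join_key(r[0])]


FILE1 = ["1_a", "2_b", "4_d"]
B = [[("int", n), ("chararray", c)] for n, c in [(1, "a"), (2, "b"), (3, "c")]]


def load_a1(with_types: bool) -> List[List[Field]]:
    rows = [strsplit(s, "_", 2) for s in FILE1]
    if with_types:  # scripts 1 and 2: flatten(...) as (num:int, ch:chararray)
        rows = [[declare(row[0], "int"), declare(row[1], "chararray")] for row in rows]
    return rows


def script1():
    return join(load_a1(True), B)


def script2():
    a2 = [[cast(row[0], "int"), cast(row[1], "chararray")] for row in load_a1(True)]
    return join(a2, B)


def script3():
    a2 = [[cast(row[0], "int"), row[1]] for row in load_a1(False)]
    return join(a2, B)


for name, script in [("script 1", script1), ("script 2", script2), ("script 3", script3)]:
    try:
        print(name, "->", script())
    except TypeError as err:
        print(name, "-> error:", err)
```

In this model scripts 1 and 2 both fail with "str cannot be cast to int", while script 3 returns `[(1, 'a', 1, 'a'), (2, 'b', 2, 'b')]`. The takeaway matches the advice to cast explicitly: the cast has to start from a field whose declared type is not already the target type, otherwise no conversion takes place.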
Huo Zhu 2012-09-05, 02:08
Gianmarco De Francisci Mo... 2012-09-05, 08:22
Huo Zhu 2012-09-06, 08:14