Pig >> mail # user >> union


Hi,

According to Pig's documentation on union, two relations which have the same
schema (the same length, and types that can be implicitly cast) can be
concatenated (see http://pig.apache.org/docs/r0.11.1/basic.html#union).

However, when I try with:
A = load '1.txt' using PigStorage(' ') as (x:int, y:chararray, z:chararray);
B = load '1_ext.txt' using PigStorage(' ') as (a:int, b:chararray, c:chararray);
C = union A, B;
describe C;
DUMP C;
store C into '/home/kereno/Documents/pig-0.11.1/workspace/res';

with:
~/Documents/pig-0.11.1/workspace 130$ more 1.txt 1_ext.txt
::::::::::::::
1.txt
::::::::::::::
1 a aleph
2 b bet
3 g gimel
::::::::::::::
1_ext.txt
::::::::::::::
0 a alpha
0 b beta
0 g gimel

I get as a result:
~/Documents/pig-0.11.1/workspace 0$ more res/part-m-0000*
::::::::::::::
res/part-m-00000
::::::::::::::
0 a alpha
0 b beta
0 g gimel
::::::::::::::
res/part-m-00001
::::::::::::::
1 a aleph
2 b bet
3 g gimel

Whereas I was expecting a single file containing something like:
0 a alpha
0 b beta
0 g gimel
1 a aleph
2 b bet
3 g gimel

[all together]
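
To make the expectation concrete, here is a minimal sketch. It assumes the two part files listed above; the scratch directory and the merged.txt name are hypothetical and only there so the snippet runs on its own. Concatenating the part files in name order gives the six rows I was expecting, but as a separate manual step rather than as the output of the store:

```shell
# Recreate the two part files shown above in a scratch directory
# (hypothetical location, used only to keep this sketch self-contained).
workdir=$(mktemp -d)
mkdir -p "$workdir/res"
printf '0 a alpha\n0 b beta\n0 g gimel\n' > "$workdir/res/part-m-00000"
printf '1 a aleph\n2 b bet\n3 g gimel\n' > "$workdir/res/part-m-00001"

# Concatenate the part files in name order into one file
# (merged.txt is a hypothetical name).
cat "$workdir"/res/part-m-* > "$workdir/merged.txt"

# Show the combined rows.
cat "$workdir/merged.txt"
```

So the data itself is all there; my question is only about why it is split across two files.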

I understand that two files would be generated for non-matching schemas, but
why is that also the case for a union with matching schemas?

Thanks,
Keren

--
Keren Ouaknine
Web: www.kereno.com