Re: PIG with -tagsource option behaves weird
Prabu Dhakshinamurthy 2013-02-04, 21:07
I found in another message that starting Pig with the flag
'-t ColumnMapKeyPrune', which turns off the ColumnMapKeyPrune optimizer rule,
works around this issue, i.e., start Pig using the command:
pig -x local -t ColumnMapKeyPrune sample.pig
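For anyone who wants to script the workaround, here is a minimal sketch. It assumes pig is on the PATH and that the script is called sample.pig as in my command above; the guard and the variable names are only for illustration. As far as I know, -optimizer_off is the long form of -t.

```shell
# Script name taken from the command above; change it to your own script.
SCRIPT="sample.pig"

# -t (long form: -optimizer_off) turns off the named optimizer rule,
# here ColumnMapKeyPrune, for this run only.
PIG_CMD="pig -x local -t ColumnMapKeyPrune $SCRIPT"

# Run Pig only if it is installed and the script exists;
# otherwise just print the command that would be run.
if command -v pig >/dev/null 2>&1 && [ -f "$SCRIPT" ]; then
  $PIG_CMD
else
  echo "$PIG_CMD"
fi
```

Note that with this rule turned off Pig no longer prunes unused columns, so a script may read more data than it strictly needs.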
On Sun, Feb 3, 2013 at 12:17 PM, Prabu Dhakshinamurthy
<[EMAIL PROTECTED]> wrote:
> Dump of A:
> Dump of B:
> ILLUSTRATE B:
> | B | ip:chararray | domain_first_char:chararray |
> | | 184.108.40.206 | g |
> As seen in Dump B, instead of printing the ip value as the first field (as
> in illustrate B), it prints the ts field.
> On Sun, Feb 3, 2013 at 11:56 AM, Prabu Dhakshinamurthy
> <[EMAIL PROTECTED]> wrote:
>> I am using the -tagsource option while loading the input data in order to
>> identify the input source. It seems that when I later project only selected
>> fields from the input tuple, certain fields are always projected even
>> though I try to exclude them.
>> Take a look at my script.
>> rawdata = load 'data/201212*' using PigStorage(' ', '-tagsource') as
>> (filename:chararray, ts: int, ip: chararray, domain: chararray, answer:
>> A = foreach rawdata generate ts, ip, domain, answer,
>>     CONCAT(CONCAT(filename, '_'), UPPER(SUBSTRING(domain, 0, 1))) as domain_index,
>>     filename as filename;
>> B = foreach A generate ip as ip,
>>     SUBSTRING(domain, 0, 1) as domain_first_char,
>>     filename;
>> dump A;
>> dump B;
>> ILLUSTRATE B;
>> While creating B, I am trying to include only selected fields from A.
>> However, when I dump B, the 'ts' field (the first field in A) keeps appearing
>> in B. In ILLUSTRATE B, by contrast, the output is as expected.
>> I appreciate any help. Thanks!
>> Prabu D
> Prabu Dhakshinamurthy
> Graduate student | CSE | UCSD