Pig, mail # user - Can anyone give me a hint about this column behavior?


jeremiah rounds 2012-08-13, 21:49
jeremiah rounds 2012-08-13, 22:44
Re: Can anyone give me a hint about this column behavior?
Bill Graham 2012-08-13, 23:30
This seems like a bug in PigStorage. Would you mind opening a JIRA with the
steps to reproduce that you've included here?

thanks,
Bill

On Mon, Aug 13, 2012 at 3:44 PM, jeremiah rounds
<[EMAIL PROTECTED]> wrote:

> Greetings pig users,
>
> This is regarding my previous post (in quotes below)
>
>
> I was able to get rid of this column error by starting Pig with:
> pig -x local -M -t ColumnMapKeyPrune
>
>
> I have no more insight than that; I only tried it because someone else
> reported that their column-oriented error went away with that command-line
> switch. I restarted Pig twice, with and without -t, to verify that
> the error went away and came back.
>
>
> With pig -x local -M -t ColumnMapKeyPrune I get:
> grunt> dump s1;
> (11,21,31)
> (12,22,32)
> (13,23,33)
> (14,24,34)
> (15,25,35)
>
>
> With pig -x local -M I get:
> grunt> dump s1;
> (ERROR_9999_.csv,21,31)
> (ERROR_9999_.csv,22,32)
> (ERROR_9999_.csv,23,33)
> (ERROR_9999_.csv,24,34)
> (ERROR_9999_.csv,25,35)
>
>
>
>
> ---------- Forwarded message ----------
> From: jeremiah rounds <[EMAIL PROTECTED]>
> Date: Mon, Aug 13, 2012 at 5:49 PM
> Subject: Can anyone give me a hint about this column behavior?
> To: [EMAIL PROTECTED]
>
>
> Greetings,
>
> I am new to Pig. I am trying to get to know it on a laptop with
> Hadoop 20.2 installed in local mode. I have prior experience with
> Hadoop, but my error is so weird that I figure I blew the Pig install or
> something.
>
> Here is my problem, distilled down:
>
> $ pig -x local -M
>
>
> grunt> set pig.splitCombination false;
> grunt> cat ERROR_9999_.csv
> 11,21,31
> 12,22,32
> 13,23,33
> 14,24,34
> 15,25,35
>
>
>
> grunt> raw = load 'ERROR_9999_.csv' USING PigStorage(',', '-tagsource') AS (file: chararray, col1: chararray, col2: chararray, col3: chararray);
> grunt> dump raw;
> (ERROR_9999_.csv,11,21,31)
> (ERROR_9999_.csv,12,22,32)
> (ERROR_9999_.csv,13,23,33)
> (ERROR_9999_.csv,14,24,34)
> (ERROR_9999_.csv,15,25,35)
>
> grunt> s1 = FOREACH raw GENERATE  col1, col2, col3;
> grunt> dump s1;
> (ERROR_9999_.csv,21,31)
> (ERROR_9999_.csv,22,32)
> (ERROR_9999_.csv,23,33)
> (ERROR_9999_.csv,24,34)
> (ERROR_9999_.csv,25,35)
>
>
> Now, obviously you wouldn't add the filename only to take it off again,
> but this is a distilled, repeatable case that captures my issue in
> a larger project. col1 has become the filename even though in raw it
> was a two-digit number stored as a chararray.
>
> The describes go like this:
> grunt> describe raw;
> raw: {file: chararray,col1: chararray,col2: chararray,col3: chararray}
> grunt> describe s1;
> s1: {col1: chararray,col2: chararray,col3: chararray}
>
> There is an explain at the end of the email if that is useful to
> anyone. I have figured out that the issue seems to be related to -tagsource
> and column pruning. Is that indicative of anything I might have done
> wrong in an install?
>
>
> Thanks,
> Jeremiah
>
> grunt> explain s1
> 2012-08-13 17:47:28,315 [main] INFO org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for raw: $0
> initialized
> #-----------------------------------------------
> # New Logical Plan:
> #-----------------------------------------------
> s1: (Name: LOStore Schema: col1#41:chararray,col2#42:chararray,col3#43:chararray)ColumnPrune:InputUids=[42, 43, 41]ColumnPrune:OutputUids=[42, 43, 41]
> |
> |---s1: (Name: LOForEach Schema: col1#41:chararray,col2#42:chararray,col3#43:chararray)
>     |   |
>     |   (Name: LOGenerate[false,false,false] Schema: col1#41:chararray,col2#42:chararray,col3#43:chararray)
>     |   |   |
>     |   |   (Name: Cast Type: chararray Uid: 41)
>     |   |   |
>     |   |   |---col1:(Name: Project Type: bytearray Uid: 41 Input: 0 Column: (*))
>     |   |   |
>     |   |   (Name: Cast Type: chararray Uid: 42)
>     |   |   |
>     |   |   |---col2:(Name: Project Type: bytearray Uid: 42 Input: 1 Column: (*))
>     |   |   |
>     |   |   (Name: Cast Type: chararray Uid: 43)
>     |   |   |
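
One way to see how both dumps in this thread could arise is the toy Python model below. It is not PigStorage's actual code. It assumes, as an unconfirmed guess, that once the optimizer prunes `$0` (the log line "Columns pruned for raw: $0"), the loader applies the requested schema indices directly to the file's own fields without shifting them by one for the prepended filename, and then prepends the filename anyway. The function names `load_with_tagsource` and `dump_s1` are hypothetical.

```python
from typing import List, Optional


def load_with_tagsource(filename: str, line: str,
                        required: Optional[List[int]] = None) -> tuple:
    """Toy loader (hypothetical): split a CSV line and prepend the filename.

    `required` holds the schema indices the optimizer asked for, where
    schema index 0 is the filename tag. ASSUMED BUG: these indices are
    applied to the file's fields unshifted (schema index i should map to
    file field i - 1), and the tag is prepended regardless.
    """
    fields = line.split(",")
    if required is not None:
        # Assumed off-by-one: index i is used where i - 1 was meant.
        fields = [fields[i] for i in required if i < len(fields)]
    return (filename, *fields)


def dump_s1(filename: str, lines: List[str], prune: bool) -> List[tuple]:
    """Model of `s1 = FOREACH raw GENERATE col1, col2, col3; dump s1;`."""
    rows = []
    for line in lines:
        if prune:
            # Pruning drops $0, so the loader is asked for schema columns 1..3
            # and the result is read positionally as (col1, col2, col3).
            t = load_with_tagsource(filename, line, required=[1, 2, 3])
            rows.append(t[0:3])
        else:
            # No pruning: full (file, col1, col2, col3); FOREACH keeps $1..$3.
            t = load_with_tagsource(filename, line)
            rows.append(t[1:4])
    return rows
```

With `prune=True`, the line `11,21,31` comes out as `('ERROR_9999_.csv', '21', '31')`, matching the failing dump; with `prune=False` it comes out as `('11', '21', '31')`, matching the run with `-t ColumnMapKeyPrune`. The real cause would need to be confirmed against the PigStorage source or the JIRA Bill asked for.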

*Note that I'm no longer using my Yahoo! email address. Please email me at
[EMAIL PROTECTED] going forward.*