Pig >> mail # user >> Can anyone give me a hint about this column behavior?


Re: Can anyone give me a hint about this column behavior?
This seems like a bug in PigStorage. Would you mind opening a JIRA with the
steps to reproduce that you've included here?

thanks,
Bill

On Mon, Aug 13, 2012 at 3:44 PM, jeremiah rounds
<[EMAIL PROTECTED]> wrote:

> Greetings pig users,
>
> This is regarding my previous post (in quotes below)
>
>
> I was able to make this column error go away by starting pig with:
> pig -x local -M -t ColumnMapKeyPrune
>
>
> I have no more insight than that. I only tried it because someone else
> reported their column oriented error went away with that command line
> switch.  I restarted pig two times with and without the -t to verify
> the error went away and came back.
>
>
> With pig -x local -M -t ColumnMapKeyPrune I get:
> grunt> dump s1;
> (11,21,31)
> (12,22,32)
> (13,23,33)
> (14,24,34)
> (15,25,35)
>
>
> With pig -x local -M I get:
> grunt> dump s1;
> (ERROR_9999_.csv,21,31)
> (ERROR_9999_.csv,22,32)
> (ERROR_9999_.csv,23,33)
> (ERROR_9999_.csv,24,34)
> (ERROR_9999_.csv,25,35)
>
>
>
>
> ---------- Forwarded message ----------
> From: jeremiah rounds <[EMAIL PROTECTED]>
> Date: Mon, Aug 13, 2012 at 5:49 PM
> Subject: Can anyone give me a hint about this column behavior?
> To: [EMAIL PROTECTED]
>
>
> Greetings,
>
> I am new to pig.  I am trying to get to know it on a laptop with
> hadoop 20.2 installed in local mode.  I have prior experience with
> hadoop, but I figure my error is so weird I blew the pig install or
> something.
>
> Here is what I have distilled my problem down to:
>
> $ pig -x local -M
>
>
> grunt> set pig.splitCombination false;
> grunt> cat ERROR_9999_.csv
> 11,21,31
> 12,22,32
> 13,23,33
> 14,24,34
> 15,25,35
>
>
>
> grunt> raw = load 'ERROR_9999_.csv' USING PigStorage(',',
> '-tagsource') AS (file: chararray, col1: chararray,col2: chararray,
> col3: chararray);
> grunt> dump raw;
> (ERROR_9999_.csv,11,21,31)
> (ERROR_9999_.csv,12,22,32)
> (ERROR_9999_.csv,13,23,33)
> (ERROR_9999_.csv,14,24,34)
> (ERROR_9999_.csv,15,25,35)
>
> grunt> s1 = FOREACH raw GENERATE  col1, col2, col3;
> grunt> dump s1;
> (ERROR_9999_.csv,21,31)
> (ERROR_9999_.csv,22,32)
> (ERROR_9999_.csv,23,33)
> (ERROR_9999_.csv,24,34)
> (ERROR_9999_.csv,25,35)
>
>
> Now obviously you wouldn't put on the filename only to take it off,
> but this is a distilled down repeatable case that captures my issue in
> a larger project.  col1 has become the filename even though it used to
> be a double digit number in a chararray for raw.
>
> The describes go like this:
> grunt> describe raw;
> raw: {file: chararray,col1: chararray,col2: chararray,col3: chararray}
> grunt> describe s1;
> s1: {col1: chararray,col2: chararray,col3: chararray}
>
> There is an explain at the end of the email if that is useful to
> anyone.  I have figured out that the issue seems related to -tagsource
> and pruning columns.  Is that indicative of anything I might have done
> wrong in an install?
>
>
> Thanks,
> Jeremiah
>
> grunt> explain s1
> 2012-08-13 17:47:28,315 [main] INFO org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for raw: $0
> initialized
> #-----------------------------------------------
> # New Logical Plan:
> #-----------------------------------------------
> s1: (Name: LOStore Schema: col1#41:chararray,col2#42:chararray,col3#43:chararray)ColumnPrune:InputUids=[42, 43, 41]ColumnPrune:OutputUids=[42, 43, 41]
> |
> |---s1: (Name: LOForEach Schema: col1#41:chararray,col2#42:chararray,col3#43:chararray)
>     |   |
>     |   (Name: LOGenerate[false,false,false] Schema: col1#41:chararray,col2#42:chararray,col3#43:chararray)
>     |   |   |
>     |   |   (Name: Cast Type: chararray Uid: 41)
>     |   |   |
>     |   |   |---col1:(Name: Project Type: bytearray Uid: 41 Input: 0 Column: (*))
>     |   |   |
>     |   |   (Name: Cast Type: chararray Uid: 42)
>     |   |   |
>     |   |   |---col2:(Name: Project Type: bytearray Uid: 42 Input: 1 Column: (*))
>     |   |   |
>     |   |   (Name: Cast Type: chararray Uid: 43)
>     |   |   |

*Note that I'm no longer using my Yahoo! email address. Please email me at
[EMAIL PROTECTED] going forward.*
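
For anyone reading this thread later, here is one way the reported output could come about. The explain log says "Columns pruned for raw: $0", so the loader is presumably asked for schema positions 1, 2 and 3 only. If that projection were applied to the parsed data fields before the -tagsource filename is prepended, every index would be off by one, which gives exactly (ERROR_9999_.csv,21,31). The Python below is only a toy model of that guess. The function names load_buggy and load_expected are made up for illustration, and none of this is taken from the PigStorage source.

```python
# Toy model only: NOT PigStorage's code, just a guess at an ordering
# problem that would reproduce the output reported in this thread.

def load_buggy(line, source, required, delimiter=","):
    """Apply the projection to the data fields, then prepend the tag."""
    fields = line.split(delimiter)
    # `required` holds schema positions (0 = file tag), but here it is
    # matched against the untagged fields, so each index is off by one.
    kept = [f for i, f in enumerate(fields) if i in required]
    return (source, *kept)


def load_expected(line, source, required, delimiter=","):
    """Prepend the tag first, then apply the projection to the full tuple."""
    fields = [source, *line.split(delimiter)]
    return tuple(f for i, f in enumerate(fields) if i in required)


if __name__ == "__main__":
    # Positions of col1, col2, col3 once $0 (file) has been pruned.
    required = {1, 2, 3}
    for line in ["11,21,31", "12,22,32"]:
        print(load_buggy(line, "ERROR_9999_.csv", required),
              load_expected(line, "ERROR_9999_.csv", required))
```

If this guess is right, it would also explain why disabling the rule with -t ColumnMapKeyPrune hides the problem: no projection reaches the loader, so there are no indexes to misapply.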