Pig, mail # user - oddness with large numeric map[] values


Re: oddness with large numeric map[] values
Guy Bayes 2009-07-19, 16:28
awesome Santhosh and thanks for the responsiveness!

2009/7/19 Santhosh Srinivasan <[EMAIL PROTECTED]>

> With PIG-880 (https://issues.apache.org/jira/browse/PIG-880), the value
> per key in the text data will be treated as a bytearray. Once PIG-880 is
> committed, implicit or explicit casts will be required to interpret the data.
>
> Thanks,
> Santhosh
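
The bytearray-plus-cast behaviour described above can be sketched roughly as follows. This is a Python illustration of the idea only, not Pig code: `parse_map` and `cast_long` are hypothetical names, and the parser assumes a flat `[key#value,...]` line with no nesting or escaping.

```python
def parse_map(line):
    """Split '[k1#v1,k2#v2]' into a dict whose values stay raw bytes."""
    body = line.strip()[1:-1]            # drop the surrounding [ ]
    pairs = (item.split("#", 1) for item in body.split(","))
    return {key: value.encode("utf-8") for key, value in pairs}


def cast_long(raw):
    """Explicit cast: an int within the signed 64-bit range, else None."""
    try:
        n = int(raw.decode("utf-8"))
    except ValueError:
        return None
    return n if -2**63 <= n <= 2**63 - 1 else None


row = parse_map("[apache#2000000000000000000000,foo#foo1]")
print(row["apache"].decode("utf-8"))   # the value is still available as text
print(cast_long(row["apache"]))        # only the numeric cast yields None
```

Under this assumption, a value that merely looks numeric is kept as-is, and a failure is confined to the cast the script asks for.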
>
> -----Original Message-----
> From: zjffdu [mailto:[EMAIL PROTECTED]]
> Sent: Sunday, July 19, 2009 4:40 PM
> To: [EMAIL PROTECTED]
> Subject: RE: oddness with large numeric map[] values
>
> +1, I am also curious to know about this.  I think it makes no sense if
> there's no such feature.
>
>
>
> -----Original Message-----
> From: Guy Bayes [mailto:[EMAIL PROTECTED]]
> Sent: July 18, 2009 23:50
> To: [EMAIL PROTECTED]
> Subject: Re: oddness with large numeric map[] values
>
> yup, the problem is that the actual data occasionally contains strings that
> look like numbers.
>
> is there any way to tell apache to treat it like a string always?
>
> On Sat, Jul 18, 2009 at 11:23 PM, Santhosh Srinivasan
> <[EMAIL PROTECTED]> wrote:
>
> > I just read your email all over again. The reason for the failure is the
> > following.
> >
> > 1. [apache#2000000000000000000000zzz,foo#foo1] - Here the value for the
> > key apache is treated as a string.
> >
> > 2. [apache#2000000000000000000000,foo#foo1] - Here the value for the key
> > apache is treated as an integer. Since 2000000000000000000000 is too big
> > to fit into an integer it failed and inserted a null. Try adding an L at
> > the end of 2000000000000000000000, i.e., 2000000000000000000000L
> >
> > Santhosh
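
The two cases above can be sketched as a small type-guessing routine. This is a minimal Python illustration of the behaviour described in this thread, not Pig's actual loader code; the function name `guess_map_value` and the exact matching rules are assumptions. It also shows that 2000000000000000000000 (2 x 10^21) lies outside the signed 64-bit range as well as the 32-bit one, so an `L` suffix would probably not be enough for this particular value.

```python
import re

INT_MIN, INT_MAX = -2**31, 2**31 - 1      # Java int range
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1    # Java long range


def guess_map_value(text):
    """Guess a type for one map value (hypothetical rules, not Pig's code).

    Returns (type_name, value); value is None when the text looks numeric
    but does not fit the type it was guessed to be.
    """
    if re.fullmatch(r"[+-]?\d+", text):
        n = int(text)
        return ("int", n if INT_MIN <= n <= INT_MAX else None)
    if re.fullmatch(r"[+-]?\d+[lL]", text):
        n = int(text[:-1])
        return ("long", n if LONG_MIN <= n <= LONG_MAX else None)
    return ("string", text)


print(guess_map_value("2000000000000000000000zzz"))
print(guess_map_value("2000000000000000000000"))
print(guess_map_value("2000000000000000000000L"))
```

In this sketch only the first value comes back intact; both all-digit forms are guessed to be numeric and then fail the range check.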
> >
> > -----Original Message-----
> > From: Guy Bayes [mailto:[EMAIL PROTECTED]]
> > Sent: Saturday, July 18, 2009 11:21 PM
> > To: [EMAIL PROTECTED]
> > Subject: Re: oddness with large numeric map[] values
> >
> > this is reproducible (at least for me) with any large number in the value
> > column of any map schema declaration. Tried it on pig v0.2 and 0.3
> >
> > On Sat, Jul 18, 2009 at 11:09 PM, Santhosh Srinivasan
> > <[EMAIL PROTECTED]> wrote:
> >
> > > In the second case, there is a warning message:
> > >
> > > - Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 1 time(s).
> > >
> > > The conversion of the text data into a map failed for some reason. If
> > > you examine the system logs on the job tracker UI, you should probably
> > > see why the conversion failed. You can also achieve the same result by
> > > turning warning aggregation off with the -w option.
> > >
> > > Santhosh
> > >
> > >
> > > -----Original Message-----
> > > From: Guy Bayes [mailto:[EMAIL PROTECTED]]
> > > Sent: Saturday, July 18, 2009 9:37 PM
> > > To: [EMAIL PROTECTED]
> > > Subject: oddness with large numeric map[] values
> > >
> > > Hello all, new to this list, new to pig, running into some odd behavior
> > > with map[] data types.
> > >
> > > Please forgive me if these are known issues or problems with my syntax,
> > >
> > > What am i doing wrong here? missing some cast somewhere?
> > >
> > > This works:
> > >
> > > grunt> cat data2
> > > [apache#2000000000000000000000zzz,foo#foo1]
> > > grunt> T1 = load 'data2' as (f1:map[]);
> > > grunt> dump T1;
> > > 2009-07-18 21:35:11,871 [Thread-11] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> > > 2009-07-18 21:35:21,889 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
> > > 2009-07-18 21:35:21,889 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
> > > ([foo#foo1,apache#2000000000000000000000zzz])
> > >
> > > This doesn't return anything.
> > >
> > > grunt> cat data
> > > [apache#2000000000000000000000,foo#foo1]
> > > grunt> T1 = load 'data' as (f1:map[]);