Re: STRSPLIT bug?
Try adding a FLATTEN before applying TOBAG:

foo = foreach searches GENERATE FLATTEN(STRSPLIT(hostinglist, ',')) as hostings, user;
bar = foreach foo GENERATE TOBAG(*);
dump bar;
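
If the goal is TOKENIZE-like output (one bag element per item), another option is to use TOKENIZE itself, since comma is among its documented word separators. The following is only a sketch under that assumption, reusing the relation and path from the original message; the aliases `foo`, `bar`, and `hosting` are illustrative. Note that TOKENIZE also splits on spaces, double quotes, parentheses, and stars, so it only fits data whose items contain none of those characters.

```pig
-- Same load statement as in the original message.
searches = load '/data/sample/searches' using PigStorage('\t')
    as (user: chararray, hostinglist: chararray);

-- TOKENIZE returns a bag holding one single-field tuple per token.
foo = foreach searches generate user, TOKENIZE(hostinglist) as hostings;

-- Optional: FLATTEN on a bag emits one output row per bag element.
bar = foreach foo generate user, FLATTEN(hostings) as hosting;
dump bar;
```

Keeping `foo` gives a bag per user row; `bar` is only needed when one row per (user, hosting) pair is wanted.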

Norbert

On Thu, Feb 23, 2012 at 11:32 AM, Flo Leibert <[EMAIL PROTECTED]> wrote:

> I was expecting STRSPLIT to behave like TOKENIZE, i.e. all items ending up
> in a bag.
> Is there a way to split these out further so that they're elements of a
> bag? The TOBAG function just places the entire tuple in a bag...
>
> Thanks!
>
> On Wed, Feb 22, 2012 at 7:59 PM, Norbert Burger <[EMAIL PROTECTED]> wrote:
>
> > Hi Flo - in your example data, it seems like STRSPLIT() is working as
> > expected -- the function returns a tuple, which is serialized in the
> > shell as "(t1,t2,t3,t4)".
> >
> > When you mention "hostinglist isn't split properly", which part are you
> > referring to?
> >
> > Norbert
> >
> > On Wed, Feb 22, 2012 at 9:13 PM, Flo Leibert <[EMAIL PROTECTED]> wrote:
> >
> > > Running Pig 0.9.1 in local mode, STRSPLIT doesn't seem to split on ','.
> > > I have the following data:
> > >
> > > user2 hosting9
> > > user1 hosting1,hosting2,hosting3,hosting4
> > > user1 hosting2,hosting4,hosting5
> > >
> > >
> > > searches = load '/data/sample/searches' using PigStorage('\t')
> > >     as (user: chararray, hostinglist: chararray);
> > > grunt> describe searches
> > > searches: {user: chararray,hostinglist: chararray}
> > > foo = foreach searches GENERATE STRSPLIT(hostinglist, ',') as hostings,
> > > user;
> > > dump foo
> > > ((hosting9),user2)
> > > ((hosting1,hosting2,hosting3,hosting4),user1)
> > > ((hosting2,hosting4,hosting5),user1)
> > >
> > >
> > > hostinglist isn't split properly. I tried using the Unicode character
> > > as well, but still no luck. Is this a known bug?
> > >
> > > Thanks,
> > > Flo
> > >
> >
>