NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Pig >> mail # user >> STRSPLIT bug?


Flo Leibert 2012-02-23, 02:13
Norbert Burger 2012-02-23, 03:59
Flo Leibert 2012-02-23, 16:32
Re: STRSPLIT bug?
Try adding a FLATTEN before applying TOBAG:

foo = foreach searches GENERATE FLATTEN(STRSPLIT(hostinglist, ',')) as hostings, user;
bar = foreach foo GENERATE TOBAG(*);
dump bar;

Norbert

On Thu, Feb 23, 2012 at 11:32 AM, Flo Leibert <[EMAIL PROTECTED]> wrote:

> I was expecting STRSPLIT to behave like TOKENIZE, i.e. with all items
> ending up in a bag.
> Is there a way to further split these out so that they're elements of a
> bag? The TOBAG function just places the entire tuple in a bag...
>
> Thanks!
>
> On Wed, Feb 22, 2012 at 7:59 PM, Norbert Burger <[EMAIL PROTECTED]> wrote:
>
> > Hi Flo - in your example data, it seems like STRSPLIT() is working as
> > expected -- the function returns a tuple, which is being serialized in
> > the shell as "(t1,t2,t3,t4)".
> >
> > When you mention "hostinglist isn't split properly", which part are you
> > referring to?
> >
> > Norbert
> >
> > On Wed, Feb 22, 2012 at 9:13 PM, Flo Leibert <[EMAIL PROTECTED]> wrote:
> >
> > > Running Pig 0.9.1 in local mode, STRSPLIT doesn't seem to split on ','.
> > > I have the following data:
> > >
> > > user2 hosting9
> > > user1 hosting1,hosting2,hosting3,hosting4
> > > user1 hosting2,hosting4,hosting5
> > >
> > >
> > > searches = load '/data/sample/searches' using PigStorage('\t') as
> > > (user: chararray, hostinglist: chararray);
> > > grunt> describe searches
> > > searches: {user: chararray,hostinglist: chararray}
> > > foo = foreach searches GENERATE STRSPLIT(hostinglist, ',') as hostings,
> > > user;
> > > dump foo
> > > ((hosting9),user2)
> > > ((hosting1,hosting2,hosting3,hosting4),user1)
> > > ((hosting2,hosting4,hosting5),user1)
> > >
> > >
> > > hostinglist isn't split properly. I tried using the Unicode character
> > > as well, but still no luck. Is this a known bug?
> > >
> > > Thanks,
> > > Flo
> > >
> >
>
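For comparison, here is a minimal sketch of the TOKENIZE route Flo mentions, assuming the same tab-separated input as in the original post. The relation names `bagged` and `pairs` are illustrative only. TOKENIZE returns a bag of single-field tuples, so a row of `bagged` should look something like `(user1,{(hosting1),(hosting2),(hosting3),(hosting4)})`, and FLATTEN on that bag then emits one (user, hosting) row per element. Note that single-argument TOKENIZE splits on a fixed set of separators (space, double quote, comma, parentheses, asterisk), so this only fits when the values contain none of the other separator characters.

```
-- Sketch only: assumes the tab-separated input from the original post.
searches = LOAD '/data/sample/searches' USING PigStorage('\t')
           AS (user:chararray, hostinglist:chararray);

-- TOKENIZE returns a bag of single-field tuples; comma is one of its
-- built-in separators, so no delimiter argument is passed here.
bagged = FOREACH searches GENERATE user, TOKENIZE(hostinglist) AS hostings;

-- FLATTEN on a bag emits one output row per bag element,
-- e.g. (user1,hosting1), (user1,hosting2), ...
pairs = FOREACH bagged GENERATE user, FLATTEN(hostings) AS hosting;

DUMP pairs;
```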
Jonathan Coveney 2012-02-23, 17:36
Norbert Burger 2012-02-23, 18:10
Dmitriy Ryaboy 2012-02-29, 08:16