Pig, mail # user - STRSPLIT bug?


Flo Leibert 2012-02-23, 02:13
Norbert Burger 2012-02-23, 03:59
Flo Leibert 2012-02-23, 16:32
Norbert Burger 2012-02-23, 17:27
Jonathan Coveney 2012-02-23, 17:36

Re: STRSPLIT bug?
Norbert Burger 2012-02-23, 18:10
Agreed -- the only reason I went with the built-in route is that I had this
suspicion Flo has a fixed UDF on the far side of that sample code for which
he's trying to match the interface.  The UDF implementation would be
straightforward.

Norbert

On Thu, Feb 23, 2012 at 12:36 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:

> I bet it would be more efficient to just make a udf that goes from tuple to
> bag. This is not an uncommon request, though, and probably something we
> should build into pig.
>
> 2012/2/23 Norbert Burger <[EMAIL PROTECTED]>
>
> > Try adding a FLATTEN before applying TOBAG:
> >
> > foo = foreach searches GENERATE FLATTEN(STRSPLIT(hostinglist, ',')) as
> > hostings, user;
> > bar = foreach foo GENERATE TOBAG(*);
> > dump bar;
> >
> > Norbert
> >
> > On Thu, Feb 23, 2012 at 11:32 AM, Flo Leibert <[EMAIL PROTECTED]> wrote:
> >
> > > I was expecting STRSPLIT to behave like TOKENIZE, i.e. all items
> > > ending up in a bag.
> > > Is there a way to further split these out such that they're elements of a
> > > bag? The TOBAG function just places the entire tuple in a bag...
> > >
> > > Thanks!
> > >
> > > On Wed, Feb 22, 2012 at 7:59 PM, Norbert Burger <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi Flo - in your example data, it seems like STRSPLIT() is working as
> > > > expected -- the function returns a tuple, which is being serialized in
> > > > the shell as "(t1,t2,t3,t4)".
> > > >
> > > > When you mention "hostinglist isn't split properly", which part are you
> > > > referring to?
> > > >
> > > > Norbert
> > > >
> > > > On Wed, Feb 22, 2012 at 9:13 PM, Flo Leibert <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Running pig 0.9.1 in local mode, STRSPLIT doesn't seem to split on ','. I
> > > > > have the following data:
> > > > >
> > > > > user2 hosting9
> > > > > user1 hosting1,hosting2,hosting3,hosting4
> > > > > user1 hosting2,hosting4,hosting5
> > > > >
> > > > >
> > > > > searches = load '/data/sample/searches' using PigStorage('\t') as
> > > > >     (user: chararray, hostinglist: chararray);
> > > > > grunt> describe searches
> > > > > searches: {user: chararray,hostinglist: chararray}
> > > > > foo = foreach searches GENERATE STRSPLIT(hostinglist, ',') as hostings, user;
> > > > > dump foo
> > > > > ((hosting9),user2)
> > > > > ((hosting1,hosting2,hosting3,hosting4),user1)
> > > > > ((hosting2,hosting4,hosting5),user1)
> > > > >
> > > > >
> > > > > hostinglist isn't split properly - I tried to use the unicode character as
> > > > > well but still no luck. Is this a known bug?
> > > > >
> > > > > Thanks,
> > > > > Flo
> > > > >
> > > >
> > >
> >
>
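
For reference, the tuple-to-bag UDF discussed in this thread might look
roughly like the sketch below, written as a Jython UDF (Pig has supported
these since 0.8). Nobody in the thread posted this code. The file name
tuple_to_bag.py, the function name, the myudfs alias and the schema string
are all assumptions made for illustration. It relies on Pig handing a tuple
to Jython as a Python tuple and accepting a list of tuples back as a bag.

```python
# Hypothetical file: tuple_to_bag.py (name chosen for this sketch).
#
# Assumed usage from grunt, reusing the relation from the thread:
#   register 'tuple_to_bag.py' using jython as myudfs;
#   foo = foreach searches generate
#         myudfs.tuple_to_bag(STRSPLIT(hostinglist, ',')) as hostings, user;

# Pig is expected to supply the outputSchema decorator when it registers a
# Jython script. The fallback below exists only so that this file can also be
# imported and tested as plain Python outside of Pig.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorator(func):
            func.output_schema = schema
            return func
        return decorator


@outputSchema("hostings:bag{t:tuple(hosting:chararray)}")
def tuple_to_bag(fields):
    """Wrap each field of the input tuple in its own one-field tuple.

    The returned list of tuples is what Pig treats as a bag.
    """
    if fields is None:
        # A null hostinglist yields a null tuple; pass the null through.
        return None
    return [(field,) for field in fields]
```

With something like this in place, each row would carry a bag of one-field
tuples in place of the single tuple that STRSPLIT returns, which is the shape
TOKENIZE produces.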
Dmitriy Ryaboy 2012-02-29, 08:16