Pig user mailing list: [HELP] How to split a string in pig script ?


Leon Town 2012-08-09, 05:37
Babu, Prashanth 2012-08-09, 06:10
Bill Graham 2012-08-09, 06:53
Leon Town 2012-08-10, 02:13
Leon Town 2012-08-22, 01:49
Bill Graham 2012-08-22, 04:32
Re: [HELP] How to split a string in pig script ?
You can also use STRSPLIT or REGEX_EXTRACT to do the split and to pick out an individual string by its index.
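A minimal sketch of both alternatives, assuming the `{name:chararray, ids:chararray}` schema from the original question. The input path `input.tsv`, the tab-separated layout, the split limit of 3, and the aliases `cols`, `firsts`, `id1`..`id3` and `first_id` are hypothetical choices for illustration. They do not come from the thread. Unlike TOKENIZE, STRSPLIT returns a tuple rather than a bag, so FLATTEN spreads the pieces across columns instead of rows. REGEX_EXTRACT returns a single capture group.

```pig
-- Hypothetical input: tab-separated lines such as
--   alice<TAB>id1,id2,id3
data = LOAD 'input.tsv' USING PigStorage('\t') AS (name:chararray, ids:chararray);

-- STRSPLIT(string, regex, limit) returns a TUPLE of at most 'limit' fields,
-- so FLATTEN turns it into extra columns on the same row.
cols = FOREACH data GENERATE name,
       FLATTEN(STRSPLIT(ids, ',', 3)) AS (id1, id2, id3);

-- REGEX_EXTRACT(string, regex, index) returns the given capture group;
-- group 1 here is everything before the first comma, i.e. the first id.
firsts = FOREACH data GENERATE name, REGEX_EXTRACT(ids, '^([^,]*)', 1) AS first_id;

DUMP cols;
DUMP firsts;
```

With a limit of 3, a record with more than three ids keeps the remainder (for example `id3,id4`) joined in the third column. STRSPLIT therefore fits short, fixed-length lists best. For one row per id, as the original question asks, a bag-producing function such as TOKENIZE is the more natural fit.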
-----Bill Graham <[EMAIL PROTECTED]> wrote: -----
To: Leon Town <[EMAIL PROTECTED]>
From: Bill Graham <[EMAIL PROTECTED]>
Date: 08/22/2012 10:03AM
Cc: [EMAIL PROTECTED]
Subject: Re: [HELP] How to split a string in pig script ?

You can pass an optional second argument to TOKENIZE which is the delimiter.
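As a sketch of the two-argument form, assume the ids were separated by semicolons instead of commas. The `;` delimiter, the input path and the aliases are hypothetical. The two-argument form may not exist in older Pig releases, so check the built-in function reference for the version you run.

```pig
-- Hypothetical input: tab-separated lines such as
--   alice<TAB>id1;id2;id3
data = LOAD 'input.tsv' USING PigStorage('\t') AS (name:chararray, ids:chararray);

-- The second argument sets the delimiter (';' here). TOKENIZE returns a bag
-- of single-field tuples, and FLATTEN emits one (name, id) row per element.
A = FOREACH data GENERATE name, FLATTEN(TOKENIZE(ids, ';')) AS id;

DUMP A;
```

Without the second argument, TOKENIZE splits on spaces, double quotes, commas, parentheses and asterisks. That is why the one-argument call in the earlier answer already handles comma-separated ids.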

On Tue, Aug 21, 2012 at 6:49 PM, Leon Town <[EMAIL PROTECTED]> wrote:

> Another question:
> How can I specify the delimiters to split a string?
>
> Thanks!
>
>
> 2012/8/10 Leon Town <[EMAIL PROTECTED]>
>
>> Oh Bill, you're a real genius.
>> Thanks!
>>
>>
>> 2012/8/9 Bill Graham <[EMAIL PROTECTED]>
>>
>>> Something like this should do it:
>>>
>>> A = FOREACH data GENERATE name, FLATTEN(TOKENIZE(ids));
>>>
>>>
>>> On Wed, Aug 8, 2012 at 11:10 PM, Babu, Prashanth <[EMAIL PROTECTED]> wrote:
>>>
>>> > Can you also post the sample input and the desired output you are
>>> > looking for?
>>> >
>>> > Thanks,
>>> > Prashanth.
>>> >
>>> > -----Original Message-----
>>> > From: Leon Town [mailto:[EMAIL PROTECTED]]
>>> > Sent: Thursday, August 09, 2012 11:08 AM
>>> > To: [EMAIL PROTECTED]
>>> > Subject: [HELP] How to split a string in pig script ?
>>> >
>>> > The input schema is:
>>> > *{name:chararray, ids:chararray}*,
>>> >
>>> > and the format of *ids* is like:
>>> > id1,id2,id3,...,idn
>>> >
>>> > Now, I want to split *ids* and change the input into the below format:
>>> > name   id1
>>> > name   id2
>>> > ...
>>> > name   idn
>>> >
>>> >
>>> > How can I do this in a Pig script, without writing UDFs?
>>> >
>>> > Thanks!
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> *Note that I'm no longer using my Yahoo! email address. Please email me
>>> at [EMAIL PROTECTED] going forward.*
>>>
>>
>>
>
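For the original question, here is an end-to-end sketch built around the one-liner from the thread. It writes the tab-separated `name  id` layout that was asked for. The input path `input.tsv`, the output path `output_dir` and the alias `id` are hypothetical.

```pig
-- Hypothetical input: each line is name<TAB>id1,id2,...,idn
data = LOAD 'input.tsv' USING PigStorage('\t') AS (name:chararray, ids:chararray);

-- One-argument TOKENIZE already treats ',' as a delimiter, and FLATTEN
-- emits one (name, id) row per element of the resulting bag.
A = FOREACH data GENERATE name, FLATTEN(TOKENIZE(ids)) AS id;

-- Write "name<TAB>id" lines, one per id.
STORE A INTO 'output_dir' USING PigStorage('\t');
```

The default delimiter set also includes spaces, so an id that contains a space would be split as well. In that case, passing `','` as the second argument restricts the split to commas.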