Pig, mail # user - Optionally Enclosed By in PIG


Re: Optionally Enclosed By in PIG
Thejas Nair 2011-10-18, 16:27
The default load function in Pig (PigStorage) does not support escaping
the delimiter. If there is a character that will never appear in your
data, you can use it as the delimiter (control characters, for example; I
believe they don't appear in UTF-8 strings).
Otherwise, you can extend the PigStorage class in Pig to create a new load
function that supports escaping (and contribute it to Piggybank if you like).
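
To make the second option concrete, here is a minimal sketch of the field-splitting logic such a load function would need. It is written in Python purely for illustration (a real Pig load function would be Java and would extend PigStorage); the function name split_optionally_enclosed and its parameters are hypothetical, and it assumes quotes are never nested or escaped inside a field.

```python
def split_optionally_enclosed(line, delim="|", quote='"'):
    """Split line on delim, ignoring delimiters inside quote-enclosed spans.

    Assumption: the quote character only ever appears as an enclosing
    marker, never as data inside a field.
    """
    fields = []
    current = []
    in_quotes = False
    for ch in line:
        if ch == quote:
            # Toggle the enclosed state; the quote itself is dropped.
            in_quotes = not in_quotes
        elif ch == delim and not in_quotes:
            # An unenclosed delimiter ends the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    # Whatever follows the last delimiter is the final field.
    fields.append("".join(current))
    return fields


record = 'xyz|1234|98798|"xyz|abc"|'
print(split_optionally_enclosed(record))
```

With the sample record from the original question, the enclosed "xyz|abc" comes back as a single field, and the trailing delimiter yields an empty last field.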

Thanks,
Thejas

On 10/18/11 12:15 AM, kiranprasad wrote:
> Can it be done using PIG Latin Script?
>
> Regards
> Kiran
>
> -----Original Message----- From: Gheorghe Muresan
> Sent: Tuesday, October 18, 2011 10:47 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Optionally Enclosed By in PIG
>
> If some columns may contain the separator, you can escape their
> content before writing them into the table, and unescape them after
> you split the row, before you use the content.
> You can use URL escape characters (e.g.
> http://www.werockyourweb.com/url-escape-characters) or something more
> reader-friendly (e.g. "|" -> "<pipe>").
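
The escape-then-unescape approach described above could look roughly like the following. This is only a sketch in Python using URL-style percent escaping from the standard library; the helper names encode_row and decode_row are hypothetical, and it assumes you control both the step that writes the data and the step that reads it.

```python
from urllib.parse import quote, unquote

DELIM = "|"


def encode_row(values):
    """Percent-escape each value, then join with the delimiter.

    safe="" forces every reserved character, including "|" and "%",
    to be escaped, so the delimiter cannot survive inside a value.
    """
    return DELIM.join(quote(v, safe="") for v in values)


def decode_row(row):
    """Split on the delimiter first, then unescape each field."""
    return [unquote(field) for field in row.split(DELIM)]


row = encode_row(["xyz", "1234", "98798", "xyz|abc"])
print(row)
print(decode_row(row))
```

Because "%" itself is escaped as well, a value that already contains something like "%7C" still round-trips unchanged.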
>
> Cheers,
> Gheorghe
>
> On Mon, Oct 17, 2011 at 9:37 PM, kiranprasad
> <[EMAIL PROTECTED]> wrote:
>> Hi
>>
>> How can I ignore the separator character in the middle of a column value?
>>
>> E.g. the separator char is "|".
>>
>> The record values are "|"-separated:
>>
>> xyz|1234|98798|"xyz|abc"|
>>
>>
>> Regards
>> Kiran.G
>
>