Pig >> mail # user >> Removing characters from a bag


Re: Removing characters from a bag
We use newline as the row separator, but we are getting some newlines inside
a column. So the data looks like this:

Hello I \n am \n here
Hello\n I am here

Those are 2 rows, but they get broken into 5 lines because of the \n
characters in the middle in addition to the real line endings. I tried to use
foreach generate REPLACE('\n',''); . Is that the right thing to do? Does it
replace all \n or only the first one?
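
For reference, a minimal sketch of how the built-in REPLACE is normally called, under some assumptions: B and p are the relation and field from the original message quoted below, p is a chararray, and the newline characters are still present in p after loading. REPLACE takes three arguments (the source field, a regular expression, and the replacement string) and, being backed by Java's String.replaceAll, replaces every match rather than only the first.

```pig
-- Sketch only: B and p come from the quoted message; the AS alias is illustrative.
-- Assumes p is a chararray that still contains embedded newlines after loading.
-- REPLACE(source, regex, replacement) replaces every match of the regex.
page = FOREACH B GENERATE REPLACE(p, '\n', ' ') AS p;
```

Note that if the loader itself splits records on newline (as a plain text load typically does), the row is already broken before REPLACE runs, so the embedded newlines would have to be handled at load time or upstream instead.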

On Tue, Jun 25, 2013 at 3:13 AM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:

> Hi Mohit,
>
> I don't clearly understand your use case. It depends on how you read the
> input and how you use the newlines: as the row separator, or just inside a
> row as a normal character.
> Can you post a simple example of the input and the output that you need?
>
> Thanks
>
>
> On Mon, Jun 24, 2013 at 10:18 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>
> > Is there a way to remove line feeds from a bag in foreach?
> >
> > Today we just do:
> >
> >
> > page = foreach B generate p;
> >
> >
> >
> > Is there a way to remove line feeds in the foreach above? I see you can do
> > DISTINCT and SUM, but can I also replace a newline with a space?
> >
>