Pig, mail # user - Removing characters from a bag
Re: Removing characters from a bag
Ruslan Al-Fakikh 2013-06-30, 01:01
I guess that if you use newlines as the row separator, then Pig will split
the input on ALL the newlines. I don't think it can distinguish the embedded
ones from the real row endings, so you end up with too many rows. I think
this type of input should be considered corrupted. If you need the newlines
inside the rows themselves, I suggest you use a different separator for the
rows, not the newline.
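
To illustrate the point, here is a minimal Python sketch of a cleanup step that could run before Pig loads the data. It assumes the producer can terminate rows with a different character. The separator `\x1e` and the function names are hypothetical choices, not anything from this thread or from Pig.

```python
RECORD_SEP = "\x1e"  # hypothetical row separator (ASCII record separator); an assumption


def naive_rows(raw):
    """Split on every newline, which is what newline-delimited loading amounts to."""
    return [row for row in raw.split("\n") if row]


def flatten_records(raw, record_sep=RECORD_SEP):
    """Split on the dedicated separator, then replace every embedded newline with a space."""
    records = [rec for rec in raw.split(record_sep) if rec]
    return [rec.replace("\n", " ") for rec in records]


# The two logical rows from the thread, terminated by newlines (the problem case).
newline_delimited = "Hello I \n am \n here\nHello\n I am here\n"

# The same two rows, terminated by the hypothetical separator instead.
sep_delimited = "Hello I \n am \n here" + RECORD_SEP + "Hello\n I am here" + RECORD_SEP

print(len(naive_rows(newline_delimited)))   # 5 rows instead of 2
print(flatten_records(sep_delimited))       # 2 rows, newlines turned into spaces
```

With newline-terminated rows the split cannot tell an embedded newline from a row ending, so the two rows come out as five. With a dedicated separator the rows stay intact and the embedded newlines can be replaced safely. Inside Pig itself, the built-in REPLACE takes the field as its first argument and, as far as I know, is regex based and replaces every match, but it only runs after the loader has already split the rows.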
Thanks
On Wed, Jun 26, 2013 at 8:27 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

> We use newline as the row separator, however we are getting some newlines
> in a column. So the data looks like this:
>
> Hello I \n am \n here
> Hello\n I am here
>
> Those are 2 lines, however they get broken down into 5 lines because of the
> \n in between and the real line ends. I tried to use foreach generate
> REPLACE('\n',''); . Is that the right thing to do? Does it replace all \n
> or only the first one?
>
> On Tue, Jun 25, 2013 at 3:13 AM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:
>
> > Hi Mohit,
> >
> > I don't fully understand your use case. It depends on how you read the
> > input and how you use the newlines: as the row separator, or just inside
> > a row as a normal character.
> > Can you post a simple example of the input and the output that you need?
> >
> > Thanks
> >
> >
> > On Mon, Jun 24, 2013 at 10:18 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> >
> > > Is there a way to remove line feeds from a bag in foreach?
> > >
> > > Today we just do:
> > >
> > >
> > > page = foreach B generate p;
> > >
> > >
> > >
> > > Is there a way to remove line feeds in the foreach above? I see you can
> > > do DISTINCT and SUM, but can I also replace a newline with a space?
> > >
> >
>