
Pig user mailing list: Removing characters from a bag


Mohit Anchlia 2013-06-24, 18:18
Ruslan Al-Fakikh 2013-06-25, 10:13
Mohit Anchlia 2013-06-26, 04:27
Re: Removing characters from a bag
I guess that if you use newlines as the row separator, then Pig will split
rows on ALL the newlines. I don't think it can distinguish them, so you end
up with too many rows. I think this type of input should be considered
corrupted. If you need newlines inside the rows themselves, I suggest you
use another separator for the rows, not newlines.
Thanks
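The following small Python sketch (not Pig code) illustrates the point above. It simulates a loader that splits records on a separator. With newlines as the separator, the two logical rows from the example below come out as five records. The ASCII record separator (0x1E) is a hypothetical choice for the alternative separator. Any byte that never appears inside a field would work, and whatever writes the data would have to emit it.

```python
# Sketch: simulate a line-oriented loader to show why embedded newlines
# produce extra rows. This is plain Python, not Pig.

def split_records(data, sep):
    """Split data into records, treating sep as a record terminator."""
    parts = data.split(sep)
    if parts and parts[-1] == "":
        parts.pop()  # the final terminator leaves an empty tail; drop it
    return parts

# The two logical rows from the example, each containing embedded newlines.
rows = ["Hello I \n am \n here", "Hello\n I am here"]

# Newline as the row separator: embedded newlines and real row ends look alike.
newline_records = split_records("\n".join(rows) + "\n", "\n")
print(len(newline_records))  # 5 records instead of 2

# Hypothetical alternative separator: ASCII record separator (0x1E).
RS = "\x1e"
rs_records = split_records(RS.join(rows) + RS, RS)
print(len(rs_records))  # 2 records

# Once the rows are intact, the embedded newlines can be replaced by spaces.
cleaned = [r.replace("\n", " ") for r in rs_records]
print(cleaned)
```

The fix has to happen where the data is written or loaded. Once the loader has produced five fragments, a later foreach has no way to tell which fragments belong together.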
On Wed, Jun 26, 2013 at 8:27 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

> We use newline as the row separator, but we are getting some newlines in a
> column. So the data looks like this:
>
> Hello I \n am \n here
> Hello\n I am here
>
> Those are 2 lines, but they get broken into 5 lines because of the \n
> characters in between, in addition to the real line ends. I tried using
> foreach generate with REPLACE('\n',''); is that the right thing to do? Does
> it replace all \n or only the first one?
>
> On Tue, Jun 25, 2013 at 3:13 AM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:
>
> > Hi Mohit,
> >
> > I don't quite understand your use case. It depends on how you read the
> > input and how you use the newlines: as the row separator, or just as a
> > normal character inside a row.
> > Can you give a simple example of the input and the output you need?
> >
> > Thanks
> >
> >
> > On Mon, Jun 24, 2013 at 10:18 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> >
> > > Is there a way to remove line feeds from a bag in foreach?
> > >
> > > Today we just do:
> > >
> > > page = foreach B generate p;
> > >
> > > Is there a way to remove line feeds in the above foreach? I see you can
> > > use DISTINCT and SUM, but can I also replace a newline with a space?
> > >
> >
>
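The question of whether REPLACE handles every newline is not answered in the thread. Pig's built-in is documented as REPLACE(string, 'regExp', 'newChar'), so the field comes first, followed by the regular expression and the replacement (unlike the two-argument REPLACE('\n','') tried above). As far as I can tell from the Pig source, it delegates to Java's String.replaceAll, which replaces every match rather than just the first. The Python sketch below mirrors those semantics with re.sub, whose default count=0 also replaces all matches. It is an illustration only, not Pig's implementation, and the helper names are hypothetical. This kind of replacement only helps when the newlines survive loading inside a single field. It cannot rejoin rows that the loader has already split.

```python
import re

# Sketch of regex-based "replace all" semantics, mirroring what Pig's REPLACE
# is understood to do via Java's String.replaceAll. Illustration only; the
# helper names below are hypothetical.

def replace_all(value, pattern, replacement):
    """Replace every match of pattern in value (re.sub's default count=0)."""
    return re.sub(pattern, replacement, value)

def replace_first(value, pattern, replacement):
    """Replace only the first match, for contrast."""
    return re.sub(pattern, replacement, value, count=1)

field = "Hello I \n am \n here"

print(repr(replace_all(field, r"\n", " ")))    # every newline replaced
print(repr(replace_first(field, r"\n", " ")))  # second newline left in place

# A slightly wider pattern also swallows the spaces around each newline.
print(repr(replace_all(field, r"\s*\n\s*", " ")))
```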