Re: Working with date converter
Sorry for the delay!

There must be a better option, but I wrote a simple loader that extends
PigStorage (I reused a lot of code from PigStorage, especially its
parse/split method).
You need to complete the 'process' method so that it takes the field or
fields you need, converts your date, and then sets the right field (0?).
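
For illustration only, here is a rough sketch of what such a 'process' step might do. It is not the loader itself: the class name DateFieldSketch, the helper isoToUnixOrNull, and the dateIndex parameter are all hypothetical, and it uses plain java.time instead of Pig's tuple classes so it compiles on its own. It also assumes the date carries an offset or 'Z' (for example 2011-11-08T07:31:00Z), which may not match the real log.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DateFieldSketch {

    // Hypothetical helper: parse an ISO-8601 timestamp with an offset and
    // return Unix seconds, or null when the text is not a valid date.
    public static Long isoToUnixOrNull(String iso) {
        if (iso == null) {
            return null;
        }
        try {
            return OffsetDateTime.parse(iso.trim()).toEpochSecond();
        } catch (DateTimeParseException e) {
            // Swallow the parse failure so one bad row does not stop the job.
            return null;
        }
    }

    // Hypothetical stand-in for 'process': copy the already split fields and
    // replace the date column with its Unix value (or null if it is invalid).
    public static List<Object> process(List<String> fields, int dateIndex) {
        List<Object> out = new ArrayList<>(fields);
        if (dateIndex >= 0 && dateIndex < out.size()) {
            out.set(dateIndex, isoToUnixOrNull(fields.get(dateIndex)));
        }
        return out;
    }

    public static void main(String[] args) {
        // A well-formed row, then a row whose date was overwritten by other text.
        System.out.println(process(Arrays.asList("u1", "click", "2011-11-08T07:31:00Z"), 2));
        System.out.println(process(Arrays.asList("u1", "click", "2011-11-08T07:3GET /x"), 2));
    }
}
```

With the invalid dates turned into nulls, the bad rows could then be dropped in the Pig script with a FILTER on "is not null" for that field, instead of a regex check.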

To compile it, you need pig-core.jar and hadoop-core.jar on your classpath,
something like:

javac -cp /usr/lib/pig/pig-core.jar:/usr/lib/hadoop/hadoop-core.jar \
    myPackage/MyLoader.java

If you have any questions, just let me know.

On Tue, Nov 8, 2011 at 7:31 AM, pablomar <[EMAIL PROTECTED]> wrote:

> Sorry, I read "custom log" and thought you had a custom loader.
> You can extend PigStorage and do the field replacement in its putNext
> method.
>
> I'll do an example later.
>
> On 11/8/11, Rauan Maemirov <[EMAIL PROTECTED]> wrote:
> > Yes, you understood my task correctly. What is putNext? I'm new to Pig
> > and haven't customized UDFs.
> >
> > 2011/11/8 pablomar <[EMAIL PROTECTED]>
> >
> >> Sorry, I didn't understand completely.
> >>
> >> Do you want to read a line and, if the date is invalid (calling
> >> IsoToUnix directly, without a regex check first), skip it? Is that
> >> it?
> >> If so, you can replace the field with your converted date (Unix
> >> format), and if the conversion fails, put a null or nothing.
> >>
> >> I mean, in your overridden putNext you have your individual columns,
> >> so you can try to convert the date there and write your Unix date to
> >> the output.
> >>
> >> Sorry if I misunderstood your problem again.
> >>
> >> On 11/8/11, Rauan Maemirov <[EMAIL PROTECTED]> wrote:
> >> > Sure, but right now I'm just omitting the rows _after_ regex matching.
> >> > What I want is to avoid the additional regex filtering and ignore
> >> > invalid rows right after an unsuccessful IsoToUnix().
> >> >
> >> > 2011/11/8 pablomar <[EMAIL PROTECTED]>
> >> >
> >> >> Can you write something else (a null, for example) in your putNext
> >> >> method for that field when the date is invalid?
> >> >>
> >> >> On 11/8/11, Rauan Maemirov <[EMAIL PROTECTED]> wrote:
> >> >> > Well, I solved this issue via regex matching, but I wonder if it's
> >> >> > too costly.
> >> >> > Is there any way to ignore exceptions and move on by just omitting
> >> >> > the wrong tuples?
> >> >> >
> >> >> > 2011/11/8 Rauan Maemirov <[EMAIL PROTECTED]>
> >> >> >
> >> >> >> Hi, all. I've got a custom log (CSV, comma-delimited) with ISO
> >> >> >> dates. Sometimes log writing lags, and I get exceptions because of
> >> >> >> a wrong ISO date format.
> >> >> >> Here's the exception: https://gist.github.com/1347406. (The date is
> >> >> >> the last "parameter" in the row, and it's incorrectly overwritten
> >> >> >> at the end by another string.)
> >> >> >>
> >> >> >> The question is: how can I filter out all wrong dates, or at least
> >> >> >> force Pig to ignore them instead of failing?
> >> >> >>
> >> >> >
> >> >>
> >> >
> >>
> >
>