Re: Working with date converter
Sorry, I read "custom log" and thought you had a custom loader.
You can extend PigStorage and do the field replacement in its putNext method.

I'll do an example later
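
In the meantime, here is a minimal sketch of the "convert, and put a null if it fails" idea in plain Java, outside Pig. The names SafeIsoToUnix, toUnixMillis and convertRow are hypothetical, the parsing uses java.time rather than whatever IsoToUnix uses internally, and returning milliseconds is an assumption. In a PigStorage subclass the same helper could be called from the overridden method on the field that holds the date.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;

public class SafeIsoToUnix {

    // Hypothetical helper: returns epoch milliseconds for an ISO-8601
    // timestamp with an offset (e.g. "2011-11-08T10:15:30Z"), or null
    // when the string cannot be parsed.
    static Long toUnixMillis(String iso) {
        if (iso == null) {
            return null;
        }
        try {
            return OffsetDateTime.parse(iso.trim()).toInstant().toEpochMilli();
        } catch (DateTimeParseException e) {
            // Bad date: signal it with null instead of failing the whole job.
            return null;
        }
    }

    // Assumes the date is the last comma-separated field of the row.
    // Returns the row with the date replaced by its unix value, or null
    // so the caller can skip the row.
    static String convertRow(String csvLine) {
        int lastComma = csvLine.lastIndexOf(',');
        if (lastComma < 0) {
            return null;
        }
        Long millis = toUnixMillis(csvLine.substring(lastComma + 1));
        if (millis == null) {
            return null;
        }
        return csvLine.substring(0, lastComma + 1) + millis;
    }

    public static void main(String[] args) {
        // Made-up sample rows; the second one has a date overwritten at the end.
        List<String> rows = Arrays.asList(
            "42,click,2011-11-08T10:15:30Z",
            "43,click,2011-11-08T10:1garbage");
        for (String row : rows) {
            String converted = convertRow(row);
            if (converted != null) {
                System.out.println(converted);
            }
        }
    }
}
```

Returning null rather than throwing keeps the decision with the caller: it can write the null as the field value, or drop the row, without a separate regex pass.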

On 11/8/11, Rauan Maemirov <[EMAIL PROTECTED]> wrote:
> Yes, you understood my task correctly. What is putNext? I'm new to Pig and
> haven't written custom UDFs.
>
> 2011/11/8 pablomar <[EMAIL PROTECTED]>
>
>> sorry, I didn't understand completely
>>
>> Do you want to read a line and, if the date is invalid (calling
>> IsoToUnix directly, without a regex check first), skip it? Is
>> that it?
>> If yes, you can replace the field with your converted date (unix
>> format), and if the conversion fails, put a null or nothing.
>>
>> I mean, in your overridden putNext you have your individual columns,
>> so you can try to convert the date there and put your unix date in
>> the output.
>>
>> Sorry if I misunderstood your problem again.
>>
>> On 11/8/11, Rauan Maemirov <[EMAIL PROTECTED]> wrote:
>> > Sure, but now I'm just omitting the rows _after_ regex matching.
>> > What I want to do is avoid the additional regex filtering and ignore
>> > invalid rows right after an unsuccessful IsoToUnix().
>> >
>> > 2011/11/8 pablomar <[EMAIL PROTECTED]>
>> >
>> >> can you write something else (a null, for example) in your putNext
>> >> method for that field when the date is invalid ?
>> >>
>> >> On 11/8/11, Rauan Maemirov <[EMAIL PROTECTED]> wrote:
>> >> > Well, I solved this issue via regex matching, but I wonder if it's
>> >> > too costly.
>> >> > Is there any way to ignore exceptions and move on, just
>> >> > omitting the bad tuples?
>> >> >
>> >> > 2011/11/8 Rauan Maemirov <[EMAIL PROTECTED]>
>> >> >
>> >> >> Hi, all. I've got a custom log (CSV, comma-delimited) with ISO
>> >> >> dates. Sometimes log writing lags, and I get exceptions from a
>> >> >> malformed ISO date.
>> >> >> Here's the exception: https://gist.github.com/1347406. (The date is the last
>> >> >> "parameter" in the row, and it's incorrectly overwritten at the end
>> >> >> by another string.)
>> >> >>
>> >> >> The question is: how can I filter out all the bad dates, or at least
>> >> >> force Pig to ignore them instead of failing?
>> >> >>
>> >> >
>> >>
>> >
>>
>