
Pig >> mail # user >> Working with date converter


Re: Working with date converter
Sorry, I didn't completely understand.

Do you want to read a line and, if the date is invalid (calling
IsoToUnix directly instead of running a regex first), skip it? Is
that it?
If so, you can replace the field with the converted date (Unix
format), and if the conversion fails, put a null or nothing instead.

I mean, in your overridden putNext you have your individual columns,
so you can try to convert the date there and write the Unix date to
the output.
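A minimal sketch of that suggestion in plain Java, so it runs without Pig on the classpath. It assumes (hypothetically) that the date is the last comma-separated field and is written as ISO 8601 with an offset, e.g. 2011-11-08T10:15:30Z. java.time stands in for the IsoToUnix UDF mentioned in the thread, and IsoDateColumn, isoToUnixOrNull and convertRow are illustrative names, not part of Pig. In the thread's setup, the same logic would sit inside the overridden putNext or a UDF.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;
import java.util.Arrays;

public class IsoDateColumn {

    /** Converts an ISO 8601 timestamp with an offset to Unix milliseconds, or returns null if it cannot be parsed. */
    static Long isoToUnixOrNull(String iso) {
        if (iso == null) {
            return null;
        }
        try {
            return OffsetDateTime.parse(iso.trim()).toInstant().toEpochMilli();
        } catch (DateTimeParseException e) {
            return null; // invalid date: signal it with null instead of failing the whole job
        }
    }

    /**
     * Splits a comma-delimited log line and replaces the last field with its Unix time
     * in milliseconds, or with null when the date is malformed.
     * Assumption (hypothetical layout): the date is always the last field.
     */
    static Object[] convertRow(String line) {
        String[] fields = line.split(",", -1);
        // Copy into a genuine Object[] so a Long (or null) can be stored in it.
        Object[] out = Arrays.copyOf(fields, fields.length, Object[].class);
        out[out.length - 1] = isoToUnixOrNull(fields[fields.length - 1]);
        return out;
    }

    public static void main(String[] args) {
        String[] lines = {
            "user1,click,2011-11-08T10:15:30Z",
            "user2,view,2011-11-08T10:1garbage" // date overwritten by another string
        };
        for (String line : lines) {
            System.out.println(Arrays.toString(convertRow(line)));
        }
    }
}
```

Returning null rather than throwing keeps the job running. The rows with a bad date stay in the output, so they still have to be dropped downstream, for example with a Pig FILTER on that field using IS NOT NULL.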

Sorry if I misunderstood your problem again.

On 11/8/11, Rauan Maemirov <[EMAIL PROTECTED]> wrote:
> Sure, but right now I'm just omitting the rows _after_ regex matching.
> What I want is to avoid the additional regex filtering and ignore
> invalid rows right after an unsuccessful IsoToUnix().
>
> 2011/11/8 pablomar <[EMAIL PROTECTED]>
>
>> Can you write something else (a null, for example) for that field in
>> your putNext method when the date is invalid?
>>
>> On 11/8/11, Rauan Maemirov <[EMAIL PROTECTED]> wrote:
>> > Well, I solved this issue via regex matching, but I wonder if it's too
>> > costly.
>> > Is there any way to ignore exceptions and move on by just omitting
>> > the wrong tuples?
>> >
>> > 2011/11/8 Rauan Maemirov <[EMAIL PROTECTED]>
>> >
>> >> Hi, all. I've got a custom log (CSV, comma-delimited) with ISO dates.
>> >> Sometimes log writing lags, and I get exceptions about a wrong ISO
>> >> date format.
>> >> Here's the exception: https://gist.github.com/1347406. (The date is the
>> >> last "parameter" in the row, and it's incorrectly overwritten at the
>> >> end by another string.)
>> >>
>> >> The question is: how can I filter out all the wrong dates, or at least
>> >> force Pig to ignore them instead of failing?
>> >>
>> >
>>
>
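
The original question (ignoring invalid rows right after a failed conversion, without a separate regex pass) comes down to catching the parse error per row and dropping that row. Below is a minimal single-pass sketch in plain Java, assuming the same hypothetical log layout (date as the last comma-separated field, ISO 8601 with an offset) and using java.time instead of the IsoToUnix UDF from the thread. SkipBadDates, Row and parseSkippingBadDates are illustrative names. Whether a custom Pig loader can skip lines this way inside its own record-reading loop is an assumption, not something confirmed in the thread.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipBadDates {

    /** One accepted log row: the fields before the date, plus the date as Unix milliseconds. */
    static final class Row {
        final String[] fields;
        final long unixMillis;

        Row(String[] fields, long unixMillis) {
            this.fields = fields;
            this.unixMillis = unixMillis;
        }
    }

    /**
     * Parses comma-delimited lines whose last field is assumed (hypothetical layout) to be
     * an ISO 8601 date with an offset. A line whose date does not parse is skipped instead
     * of aborting the run, so no separate regex pre-filter is needed.
     */
    static List<Row> parseSkippingBadDates(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        for (String line : lines) {
            int lastComma = line.lastIndexOf(',');
            if (lastComma < 0) {
                continue; // no date column at all: skip the line
            }
            String datePart = line.substring(lastComma + 1).trim();
            try {
                long millis = OffsetDateTime.parse(datePart).toInstant().toEpochMilli();
                String[] leading = line.substring(0, lastComma).split(",", -1);
                rows.add(new Row(leading, millis));
            } catch (DateTimeParseException e) {
                // Malformed date (e.g. overwritten by another string): drop this row and move on.
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "user1,click,2011-11-08T10:15:30Z",
                "user2,view,2011-11-08T10:1garbage", // corrupted date
                "user3,click,2011-11-08T11:00:00+06:00");
        List<Row> rows = parseSkippingBadDates(lines);
        System.out.println("kept " + rows.size() + " of " + lines.size() + " rows");
        for (Row r : rows) {
            System.out.println(Arrays.toString(r.fields) + " -> " + r.unixMillis);
        }
    }
}
```

Compared with a regex pre-filter, this parses each date exactly once and lets the parser itself decide what is valid. The cost is that the exception path is taken for every bad row, which is usually cheap when bad rows are rare.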