Pig >> mail # user >> Working with date converter


Rauan Maemirov 2011-11-08, 10:18
Rauan Maemirov 2011-11-08, 10:49
pablomar 2011-11-08, 11:38
Rauan Maemirov 2011-11-08, 11:43
pablomar 2011-11-08, 12:08
Rauan Maemirov 2011-11-08, 12:25

Re: Working with date converter
Sorry, I read "custom log" and thought you had a custom loader.
You can extend PigStorage and do the field replacement in its putNext method.

I'll write up an example later.

On 11/8/11, Rauan Maemirov <[EMAIL PROTECTED]> wrote:
> Yes, you understand my task right. What is putNext? I'm new to Pig and
> haven't written custom UDFs.
>
> 2011/11/8 pablomar <[EMAIL PROTECTED]>
>
>> sorry, I didn't understand completely
>>
>> do you want to read a line and, if the date is invalid (calling
>> IsoToUnix directly, without a regex check first), skip it? is
>> that it?
>> if yes, you can replace the field with your converted date (unix
>> format), and if it fails put a null or nothing
>>
>> I mean, in your overridden putNext, you have your individual columns;
>> you can try to convert the date there and put your unix date in the
>> output.
>>
>> sorry if I misunderstood your problem again
>>
>> On 11/8/11, Rauan Maemirov <[EMAIL PROTECTED]> wrote:
>> > Sure, but now I'm just omitting the rows _after_ regex matching.
>> > What I want to do is avoid the additional regex filtering and ignore
>> > invalid rows right after an unsuccessful IsoToUnix().
>> >
>> > 2011/11/8 pablomar <[EMAIL PROTECTED]>
>> >
>> >> can you write something else (a null, for example) in your putNext
>> >> method for that field when the date is invalid?
>> >>
>> >> On 11/8/11, Rauan Maemirov <[EMAIL PROTECTED]> wrote:
>> >> > Well, I solved this issue via regex matching, but I wonder if it's
>> >> > too costly.
>> >> > Is there any way to ignore exceptions and move on just by
>> >> > omitting the wrong tuples?
>> >> >
>> >> > 2011/11/8 Rauan Maemirov <[EMAIL PROTECTED]>
>> >> >
>> >> >> Hi, all. I've got a custom log (CSV, comma-delimited) with ISO
>> >> >> dates. Sometimes log writing lags and I get exceptions because of
>> >> >> a wrong ISO date format.
>> >> >> Here's the exception: https://gist.github.com/1347406. (The date is
>> >> >> the last "parameter" in the row, and it's incorrectly overwritten
>> >> >> at the end by another string.)
>> >> >>
>> >> >> The question is: how can I filter out all wrong dates, or at least
>> >> >> force Pig to ignore them instead of failing?
>> >> >>
>> >> >
>> >>
>> >
>>
>
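
The idea discussed in this thread (try the ISO conversion for each row and, when it fails, emit a null or drop the row instead of letting the job die) can be sketched in plain Java. This is only an illustration under stated assumptions: the class name SafeIsoDate and both method names are hypothetical, the date is assumed to be the last comma-separated field and to carry a UTC offset, and the result is in epoch milliseconds. The Pig wiring itself (a custom UDF or a PigStorage subclass) is deliberately left out.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Pig or Piggybank.
final class SafeIsoDate {

    // Returns epoch milliseconds, or null when the text is not a valid
    // ISO-8601 date-time with an offset (e.g. "2011-11-08T10:18:00Z").
    static Long isoToUnixMillisOrNull(String text) {
        if (text == null) {
            return null;
        }
        try {
            return OffsetDateTime.parse(text.trim()).toInstant().toEpochMilli();
        } catch (DateTimeParseException e) {
            return null; // malformed date: report it as null instead of failing
        }
    }

    // Converts the last comma-separated field of each line to epoch
    // milliseconds and drops lines whose date cannot be parsed.
    static List<String> convertValidRows(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            int cut = line.lastIndexOf(',');
            Long millis = isoToUnixMillisOrNull(line.substring(cut + 1));
            if (millis != null) {
                out.add(line.substring(0, cut + 1) + millis);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed sample rows: the second one has a date overwritten by
        // another string, as described in the original question.
        List<String> rows = Arrays.asList(
                "user1,login,2011-11-08T10:18:00Z",
                "user2,login,2011-11-08T10:1user3,login");
        for (String row : convertValidRows(rows)) {
            System.out.println(row);
        }
    }
}
```

Catching the parse exception per row removes the need for a separate regex pass: each date is parsed once, and a failed parse is the validity check. Whether that is cheaper than the regex filter in a real Pig job would need to be measured.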
pablomar 2011-11-10, 02:41