Pig >> mail # user >> Working with date converter


Rauan Maemirov 2011-11-08, 10:18
Rauan Maemirov 2011-11-08, 10:49
pablomar 2011-11-08, 11:38
Rauan Maemirov 2011-11-08, 11:43
pablomar 2011-11-08, 12:08
Rauan Maemirov 2011-11-08, 12:25
pablomar 2011-11-08, 12:31

Re: Working with date converter
Sorry for the delay!

There may be a better option, but I wrote a simple loader extending PigStorage
(I reused a lot of code from PigStorage, especially its parse/split method).
You need to complete the 'process' method to take the field(s) holding your
date, convert it, and then set the right field (0?).

To compile it, you have to put pig-core.jar and hadoop-core.jar on your
classpath, something like:

javac -cp /usr/lib/pig/pig-core.jar:/usr/lib/hadoop/hadoop-core.jar
myPackage/MyLoader.java

Any doubts, just let me know.
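For the date conversion itself, here is a minimal sketch of the kind of helper such a loader's process/putNext override could call. The class and method names are hypothetical (not part of Pig's API); the idea is to parse a strict ISO-8601 timestamp and return null on a malformed field or trailing junk, so the bad tuple can be dropped or nulled instead of throwing:

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper a PigStorage subclass could call from its
// overridden process()/putNext() to convert an ISO date field.
public class IsoDateField {
    // Note: SimpleDateFormat is not thread-safe; one instance per
    // thread (or loader instance) in real code.
    private static final SimpleDateFormat ISO =
        new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss");

    static {
        ISO.setTimeZone(TimeZone.getTimeZone("UTC"));
        ISO.setLenient(false); // reject impossible dates like 2011-13-99
    }

    // Returns unix time in seconds, or null if the field is malformed
    // (including trailing garbage appended to a valid prefix, as in the
    // broken log lines from the thread).
    public static Long isoToUnix(String field) {
        if (field == null) return null;
        String s = field.trim();
        ParsePosition pos = new ParsePosition(0);
        Date d = ISO.parse(s, pos);
        // parse(String, ParsePosition) ignores trailing text, so also
        // require that the whole field was consumed.
        if (d == null || pos.getIndex() != s.length()) return null;
        return d.getTime() / 1000L;
    }

    public static void main(String[] args) {
        System.out.println(isoToUnix("2011-11-08T10:18:00"));     // prints 1320747480
        System.out.println(isoToUnix("2011-11-08T10:18:00junk")); // prints null
        System.out.println(isoToUnix("not-a-date"));              // prints null
    }
}
```

The loader would then write the returned Long into the tuple's date slot, emitting null (or skipping the tuple) whenever the helper returns null.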

On Tue, Nov 8, 2011 at 7:31 AM, pablomar <[EMAIL PROTECTED]> wrote:

> Sorry, I read "custom log" and thought you had a custom loader.
> You can extend PigStorage and do the field replacement in its putNext
> method.
>
> I'll do an example later
>
> On 11/8/11, Rauan Maemirov <[EMAIL PROTECTED]> wrote:
> > Yes, you understood my task right. What is putNext? I'm new to Pig and
> > haven't written custom UDFs yet.
> >
> > 2011/11/8 pablomar <[EMAIL PROTECTED]>
> >
> >> Sorry, I didn't understand completely.
> >>
> >> Do you want to read a line and, if the date is invalid (applying
> >> IsoToUnix directly, without a regex beforehand), skip it? Is that it?
> >> If so, you can replace the field with your converted date (unix
> >> format), and if the conversion fails, put a null or nothing.
> >>
> >> I mean, in your overridden putNext you have the individual columns;
> >> you can try to convert the date there and put the unix date in the
> >> output.
> >>
> >> Sorry if I misunderstood your problem again.
> >>
> >> On 11/8/11, Rauan Maemirov <[EMAIL PROTECTED]> wrote:
> >> > Sure, but right now I'm only omitting the rows _after_ regex
> >> > matching. What I want is to avoid the additional regex filtering and
> >> > ignore invalid rows right after an unsuccessful IsoToUnix().
> >> >
> >> > 2011/11/8 pablomar <[EMAIL PROTECTED]>
> >> >
> >> >> Can you write something else (a null, for example) in your putNext
> >> >> method for that field when the date is invalid?
> >> >>
> >> >> On 11/8/11, Rauan Maemirov <[EMAIL PROTECTED]> wrote:
> >> >> > Well, I solved this issue via regex matching, but I wonder if
> >> >> > it's too costly.
> >> >> > Is there any way to ignore the exceptions and move on, just
> >> >> > omitting the wrong tuples?
> >> >> >
> >> >> > 2011/11/8 Rauan Maemirov <[EMAIL PROTECTED]>
> >> >> >
> >> >> >> Hi, all. I've got a custom log (CSV, comma-delimited) with ISO
> >> >> >> dates; sometimes log writing lags and I get exceptions from
> >> >> >> badly formatted ISO dates.
> >> >> >> Here's the exception: https://gist.github.com/1347406. (The date
> >> >> >> is the last "parameter" in the row, and it's incorrectly
> >> >> >> overwritten at the end by another string.)
> >> >> >>
> >> >> >> The question is: how can I filter out all the wrong dates, or at
> >> >> >> least force Pig to ignore them instead of failing?
> >> >> >>
> >> >> >
> >> >>
> >> >
> >>
> >
>