Rauan Maemirov 2011-11-08, 10:18
Rauan Maemirov 2011-11-08, 10:49
pablomar 2011-11-08, 11:38
Rauan Maemirov 2011-11-08, 11:43
pablomar 2011-11-08, 12:08
Rauan Maemirov 2011-11-08, 12:25
pablomar 2011-11-08, 12:31
Re: Working with date converter
pablomar 2011-11-10, 02:41
Sorry for the delay!

There must be a better option, but I wrote a simple loader that extends PigStorage (I reused a lot of code from PigStorage, especially its parse/split method). You need to complete the method 'process' so that it takes the field(s) you need to convert your date and then sets the right field (0?).

To compile it, put pig-core.jar and hadoop-core.jar on your classpath, something like:

javac -cp /usr/lib/pig/pig-core.jar:/usr/lib/hadoop/hadoop-core.jar myPackage/MyLoader.java

If you have any doubts, just let me know.

On Tue, Nov 8, 2011 at 7:31 AM, pablomar <[EMAIL PROTECTED]> wrote:
> sorry, I read "custom log" and thought you had a custom loader.
> You can extend PigStorage and do the field replacement in its putNext
> method.
>
> I'll do an example later.
>
> On 11/8/11, Rauan Maemirov <[EMAIL PROTECTED]> wrote:
> > Yes, you understood my task correctly. What is putNext? I'm new to Pig
> > and haven't written custom UDFs.
> >
> > 2011/11/8 pablomar <[EMAIL PROTECTED]>
> >
> >> sorry, I didn't understand completely.
> >>
> >> Do you want to read a line and, if the date is invalid (running
> >> IsoToUnix directly, without a regex first), skip it? Is that it?
> >> If yes, you can replace the field with your converted date (Unix
> >> format), and if the conversion fails, put a null or nothing.
> >>
> >> I mean, in your overridden putNext you have your individual columns,
> >> so you can try to convert the date there and put your Unix date in
> >> the output.
> >>
> >> Sorry if I misunderstood your problem again.
> >>
> >> On 11/8/11, Rauan Maemirov <[EMAIL PROTECTED]> wrote:
> >> > Sure, but right now I'm only omitting the rows _after_ regex matching.
> >> > What I want is to avoid the additional regex filtering and ignore
> >> > invalid rows right after an unsuccessful IsoToUnix().
> >> >
> >> > 2011/11/8 pablomar <[EMAIL PROTECTED]>
> >> >
> >> >> Can you write something else (a null, for example) for that field in
> >> >> your putNext method when the date is invalid?
> >> >>
> >> >> On 11/8/11, Rauan Maemirov <[EMAIL PROTECTED]> wrote:
> >> >> > Well, I solved this issue with regex matching, but I wonder if it's
> >> >> > too costly.
> >> >> > Is there any way to ignore the exceptions and move on, just omitting
> >> >> > the wrong tuples?
> >> >> >
> >> >> > 2011/11/8 Rauan Maemirov <[EMAIL PROTECTED]>
> >> >> >
> >> >> >> Hi, all. I've got a custom log (CSV, comma-delimited) with ISO dates.
> >> >> >> Sometimes log writing lags and I get exceptions because of a wrong
> >> >> >> ISO date format.
> >> >> >> Here's the exception: https://gist.github.com/1347406. (The date is the
> >> >> >> last "parameter" in the row, and at the end it's incorrectly
> >> >> >> overwritten by another string.)
> >> >> >>
> >> >> >> The question is: how can I filter out all wrong dates, or at least
> >> >> >> force Pig to ignore them instead of failing?
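The loader attached to this thread is not included in the archive, so the following is not that code. It is a minimal, stdlib-only Java sketch of the per-line logic a 'process' step like the one described could perform, under these assumptions: each row is comma-delimited with the ISO-8601 date in the last field (as described in the original question), timestamps without an offset are treated as UTC, and the converted value is epoch milliseconds (check what your IsoToUnix UDF actually returns). The class DateFieldConverter and its methods are hypothetical names. An invalid date makes process return null, so the caller can skip the row instead of failing, which avoids a separate regex pass.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: sketches the date replacement a PigStorage-based loader could do per row.
public class DateFieldConverter {

    /** Parses an ISO-8601 timestamp into epoch milliseconds; returns null if it does not parse. */
    public static Long isoToEpochMillis(String iso) {
        if (iso == null) {
            return null;
        }
        String s = iso.trim();
        try {
            // Timestamps with an explicit offset, e.g. 2011-11-08T10:18:00Z or ...+06:00
            return OffsetDateTime.parse(s).toInstant().toEpochMilli();
        } catch (DateTimeParseException e) {
            // No valid offset: fall through and try a local date-time instead
        }
        try {
            // Assumption of this sketch: timestamps without an offset are UTC
            return LocalDateTime.parse(s).toInstant(ZoneOffset.UTC).toEpochMilli();
        } catch (DateTimeParseException e) {
            return null; // malformed, e.g. the date was overwritten by another string
        }
    }

    /**
     * Splits one comma-delimited line and replaces its last field (the date) with
     * epoch milliseconds. Returns null when the date is invalid so the row can be skipped.
     */
    public static List<Object> process(String line) {
        String[] parts = line.split(",", -1); // -1 keeps trailing empty fields
        Long millis = isoToEpochMillis(parts[parts.length - 1]);
        if (millis == null) {
            return null;
        }
        List<Object> fields = new ArrayList<>(Arrays.asList((Object[]) parts));
        fields.set(fields.size() - 1, millis);
        return fields;
    }

    public static void main(String[] args) {
        String[] lines = {
            "u1,click,2011-11-08T10:18:00Z",
            "u2,view,2011-11-08T10:4XYZ" // hypothetical corrupted row
        };
        for (String line : lines) {
            List<Object> row = process(line);
            System.out.println(row == null ? "skipped: " + line : row.toString());
        }
    }
}
```

If this logic moves into a real Pig loader, note that the per-record method of Pig's LoadFunc is getNext(), while putNext(Tuple) belongs to StoreFunc. When skipping a bad row inside getNext(), keep reading the next record rather than returning null, because a null from getNext() signals the end of the input.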