-
Working with date converter - Rauan Maemirov, 2011-11-08, 10:18
Hi, all. I have a custom log (CSV, comma-delimited) with ISO dates. Sometimes the log writing lags, and I get exceptions caused by a malformed ISO date. Here's the exception: https://gist.github.com/1347406. (The date is the last "parameter" in the row, and its end gets incorrectly overwritten by another string.)
The question is: how can I filter out all the bad dates, or at least force Pig to ignore them instead of failing?
-
Re: Working with date converter - Rauan Maemirov, 2011-11-08, 10:49
Well, I solved this issue via regex matching, but I wonder whether it's too costly.
Is there any way to ignore the exceptions and move on by simply omitting the bad tuples?
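
For readers following along, here is a minimal sketch of what such a regex pre-check might look like. It is plain Java rather than the Pig script actually used in this thread (which was not posted); the class name, the method name and the exact date shape accepted by the pattern are all assumptions made for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig or piggybank.
final class IsoDateCheck {
    // Assumed shape: yyyy-MM-ddTHH:mm:ss, optional fractional seconds,
    // then either "Z" or a +hh:mm / -hh:mm offset.
    private static final Pattern ISO = Pattern.compile(
        "\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(\\.\\d+)?(Z|[+-]\\d{2}:\\d{2})");

    // True only when the whole string matches the assumed ISO shape.
    static boolean looksLikeIsoDate(String s) {
        return s != null && ISO.matcher(s).matches();
    }
}
```

The pattern is compiled once and reused, so the per-row cost is a single match against a short string. A check like this only validates the shape of the value, not whether the date itself is real (month 13 would pass).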
-
Re: Working with date converter - pablomar, 2011-11-08, 11:38
Can you write something else (a null, for example) in your putNext method for that field when the date is invalid?
-
Re: Working with date converter - Rauan Maemirov, 2011-11-08, 11:43
Sure, but right now I'm only omitting the rows _after_ regex matching.
What I want is to avoid the additional regex filtering and ignore invalid rows right after an unsuccessful IsoToUnix().
-
Re: Working with date converter - pablomar, 2011-11-08, 12:08
Sorry, I didn't completely understand.
You want to read a line and, if the date is invalid (running IsoToUnix directly, with no regex beforehand), skip it? Is that it?
If so, you can replace the field with your converted date (Unix format), and if the conversion fails, put a null or nothing.
I mean, in your overridden putNext you have your individual columns, so you can try to convert the date there and put the Unix date in the output.
Sorry if I misunderstood your problem again.
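
The "convert, and put a null if it fails" idea can be sketched in isolation as follows. This uses the JDK's java.time parser as a stand-in for the IsoToUnix UDF, which is an assumption of this sketch; the class and method names are hypothetical, and returning milliseconds since the epoch is also assumed.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;

// Hypothetical helper illustrating "null instead of an exception".
final class IsoDates {
    // Returns epoch milliseconds, or null when the input is missing or malformed.
    static Long toUnixMillisOrNull(String iso) {
        if (iso == null) {
            return null;
        }
        try {
            // Expects an ISO-8601 date-time with an offset, e.g. 2011-11-08T10:18:00Z
            return OffsetDateTime.parse(iso.trim()).toInstant().toEpochMilli();
        } catch (DateTimeParseException e) {
            // Damaged date: signal it with null so the caller can drop the row.
            return null;
        }
    }
}
```

With a null in the date field, the bad rows can then be dropped with an ordinary null check instead of a second regex pass.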
-
Re: Working with date converter - Rauan Maemirov, 2011-11-08, 12:25
Yes, you understood my task correctly. What is putNext? I'm new to Pig and haven't customized UDFs.
-
Re: Working with date converter - pablomar, 2011-11-08, 12:31
Sorry, I read "custom log" and thought you had a custom loader.
You can extend PigStorage and do the field replacement in its putNext method.
I'll write an example later.
-
Re: Working with date converter - pablomar, 2011-11-10, 02:41
Sorry for the delay!
There must be a better option, but I wrote a simple loader extending PigStorage (I reused a lot of code from PigStorage, especially its parse/split method).
You need to complete the 'process' method: take the field or fields you need, convert your date, and then set the right field (0?).
To compile it, you have to put pig-core.jar and hadoop-core.jar on your classpath, something like:
javac -cp /usr/lib/pig/pig-core.jar:/usr/lib/hadoop/hadoop-core.jar myPackage/MyLoader.java
Any doubts, just let me know.
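
The loader attached to this message is not reproduced in the archive, so the following is only a rough, Pig-free sketch of what a 'process' step might do with one line of input. Everything here is an assumption for illustration: the class name, the comma split, the date being the last field, and the use of java.time in place of IsoToUnix. In a real loader this logic would sit on the read path (in Pig's LoadFunc API the per-record read method is getNext(); putNext() belongs to the store side).

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;
import java.util.Arrays;

// Hypothetical stand-in for the 'process' step of a custom loader.
final class LineProcessor {
    // Splits a comma-delimited line and replaces the last field with its
    // epoch-millisecond value, or with null when the date cannot be parsed.
    static Object[] process(String line) {
        String[] parts = line.split(",", -1); // -1 keeps trailing empty fields
        Object[] fields = Arrays.copyOf(parts, parts.length, Object[].class);
        int last = fields.length - 1;
        try {
            fields[last] = OffsetDateTime.parse(parts[last].trim())
                                         .toInstant()
                                         .toEpochMilli();
        } catch (DateTimeParseException e) {
            fields[last] = null; // damaged date: leave a null for later filtering
        }
        return fields;
    }
}
```

For example, process("u1,click,1970-01-01T00:00:02Z") yields the fields "u1", "click" and the number 2000, while a row whose date was overwritten ends with a null that a later filter can discard.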