Pig developer meeting in February
Olga Natkovich 2011-01-24, 18:14
Hi Guys,
I think it is time for us to have another meeting. Yahoo would be happy to host if this works for everybody. How about Wednesday, 2/9 4-6 pm. Please, let us know if you are planning to attend and if the date/time works for you.
Things that come to mind to discuss and as always feel free to suggest others:
- Error handling proposal - this might be easier to finalize face-to-face
- Pig 0.9 plan
- Pig Roadmap beyond 0.9
  o What do we want to do in Pig.next?
  o Are we ready for Pig 1.0
Olga
Re: Pig developer meeting in February
Julien Le Dem 2011-01-26, 22:41
If making Pig thread safe (i.e., two threads each running a different Pig script) is important, then we need to change some of the APIs from static singleton access to a dependency injection pattern. In that case, this should probably be done before 1.0.
For example, UDFContext should be passed to the UDF after construction (similar to the ServletContext in Servlet, or the way Hadoop passes the context to tasks).
Also, a clearly separated API that does not depend on the Pig implementation would help. For example, UDFContext is in org.apache.pig.impl.util when it would be better in org.apache.pig.api (or at least an interface defining it).
Julien
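A minimal sketch of the injection style proposed above, under stated assumptions: `UdfContextApi`, `ScriptContext`, `GreetUdf`, and `setContext` are hypothetical names invented for illustration and are not part of Pig. The point is only that each script gets its own context object, handed to the UDF after construction, so nothing depends on process-wide static state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical API-level interface; this is NOT Pig's actual UDFContext.
interface UdfContextApi {
    String getProperty(String key);
}

// One context instance per running script instead of a static singleton.
final class ScriptContext implements UdfContextApi {
    private final Map<String, String> props = new HashMap<>();

    ScriptContext set(String key, String value) {
        props.put(key, value);
        return this;
    }

    @Override
    public String getProperty(String key) {
        return props.get(key);
    }
}

// Hypothetical UDF: the framework injects the context after construction,
// so the UDF never reaches for static state.
final class GreetUdf {
    private UdfContextApi context;

    void setContext(UdfContextApi context) {
        this.context = context;
    }

    String exec(String input) {
        return context.getProperty("greeting") + ", " + input;
    }
}

final class InjectionDemo {
    public static void main(String[] args) {
        // Two "scripts", each with its own context; neither sees the other's.
        GreetUdf first = new GreetUdf();
        first.setContext(new ScriptContext().set("greeting", "hello"));

        GreetUdf second = new GreetUdf();
        second.setContext(new ScriptContext().set("greeting", "bonjour"));

        System.out.println(first.exec("pig"));
        System.out.println(second.exec("pig"));
    }
}
```

Because the UDF only sees the interface, the implementation class could live in an implementation package while the interface sits in a public API package, which is the separation suggested above.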
Re: Pig developer meeting in February
Dmitriy Ryaboy 2011-01-26, 23:55
I may be wrong, but I think predicate pushdown is designed for, but not actually implemented in, the current LoadPushdown interface (you can only push projections). If I am wrong, that's great, but if not, that would be an important feature to add, as people are trying to connect Pig to "smart" storage systems like RDBMSes, HBase, and Cassandra more and more. I think we only kind of simulate this with partition key info, which is not always sufficient.
D
Re: Pig developer meeting in February
Daniel Dai 2011-01-27, 01:59
Are you talking about LoadMetadata.setPartitionFilter? PartitionFilterOptimizer will do that.
Daniel
Re: Pig developer meeting in February
Dmitriy Ryaboy 2011-01-27, 02:04
Right, we do partition filtering, but not true predicate pushdown.
RE: Pig developer meeting in February
Olga Natkovich 2011-01-27, 23:17
While there is a lively discussion on this thread, I have not actually gotten any responses about having the meeting, with the exception of 1 person :).
Please, let me know by the end of the week if you are planning to attend. If we don't get at least a few more responses I suggest we postpone the meeting.
Thanks,
Olga
Re: Pig developer meeting in February
Dmitriy Ryaboy 2011-01-28, 00:09
Ok yeah I'll come :).
Re: Pig developer meeting in February
Julien Le Dem 2011-01-28, 01:21
Me too.
Julien
Re: Pig developer meeting in February
Ashutosh Chauhan 2011-01-28, 10:35
> Are you saying that as long as one claims every column as a partition, all filters will be pushed > down?
Exactly. The javadocs are heavily worded toward partition pruning, since that was the primary use case for predicate pushdown at that time, but you will get all the filter expressions if you claim that all the columns are partition columns. Partition columns have no special semantics in Pig apart from this.
> Will the filters also be applied to the data the loader returns, even if the loader accepts the > expression?
I think the filter will be deleted from the logical plan if it is pushed up. So, it won't be applied in the pipeline later on. Daniel, can you confirm whether that's the case with the new logical plan?
Ashutosh
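A toy sketch of the behavior described above, under stated assumptions: `Condition`, `FilterAwareLoader`, `SmartStoreLoader`, and `ToyPartitionFilterRule` are hypothetical stand-ins, and the real LoadMetadata interface and PartitionFilterOptimizer in Pig have different types and signatures. It also assumes, as guessed above but not confirmed in this thread, that pushed conditions are removed from the plan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for one "column op value" filter condition.
final class Condition {
    final String column;
    final String op;
    final Object value;

    Condition(String column, String op, Object value) {
        this.column = column;
        this.op = op;
        this.value = value;
    }
}

// Simplified stand-in for the loader side of the contract (not Pig's API).
interface FilterAwareLoader {
    String[] getPartitionKeys();
    void setPartitionFilter(List<Condition> filter);
}

// A loader that claims every column as a partition key, so that it is
// offered every filter condition.
final class SmartStoreLoader implements FilterAwareLoader {
    private final String[] columns;
    final List<Condition> pushed = new ArrayList<>();

    SmartStoreLoader(String... columns) {
        this.columns = columns;
    }

    @Override
    public String[] getPartitionKeys() {
        return columns.clone();
    }

    @Override
    public void setPartitionFilter(List<Condition> filter) {
        pushed.addAll(filter);
    }
}

// Toy optimizer rule: conditions on claimed columns go to the loader and
// leave the plan; everything else stays in the plan.
final class ToyPartitionFilterRule {
    static List<Condition> apply(List<Condition> planFilter, FilterAwareLoader loader) {
        Set<String> keys = new HashSet<>(Arrays.asList(loader.getPartitionKeys()));
        List<Condition> pushed = new ArrayList<>();
        List<Condition> remaining = new ArrayList<>();
        for (Condition c : planFilter) {
            if (keys.contains(c.column)) {
                pushed.add(c);
            } else {
                remaining.add(c);
            }
        }
        if (!pushed.isEmpty()) {
            loader.setPartitionFilter(pushed);
        }
        return remaining; // what the pipeline would still have to evaluate
    }
}
```

With a loader constructed as `new SmartStoreLoader("user", "ts")`, a filter on `user` and `ts` would be handed over in full and nothing would remain in the plan. That is also why the second question matters: a loader that accepts a condition it cannot fully honor would silently return unfiltered rows.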
RE: Pig developer meeting in February
Olga Natkovich 2011-01-28, 20:58
I believe we have critical mass so the meeting is on!
If you have not responded yet but are planning to attend, please let me know.
Thanks,
Olga
RE: Pig developer meeting in February
Santhosh Srinivasan 2011-01-28, 23:35
I am planning to attend.
Re: Pig developer meeting in February
Romain Rigaux 2011-02-04, 18:44
Me too, I am interested in coming,
Romain
-
Re: Pig developer meeting in February
Ashutosh Chauhan 2011-01-27, 18:02
What do you mean by true predicate pushdown? We hand over the full filter expression to the loader in that method. I would guess that is sufficient information to push more processing down to the storage layer, e.g. to do range queries in HBase. Pig doesn't have any more information about filters than that to push, unless you want the full logical plan.
Ashutosh

On Wed, Jan 26, 2011 at 18:04, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> Right, we do partition filtering, but not true predicate pushdown.
>
> On Wed, Jan 26, 2011 at 5:59 PM, Daniel Dai <[EMAIL PROTECTED]> wrote:
>
>> Are you talking about LoadMetadata.setPartitionFilter?
>> PartitionFilterOptimizer will do that.
>>
>> Daniel
>>
>> Dmitriy Ryaboy wrote:
>>
>>> I may be wrong but I think predicate pushdown is designed for, but not
>>> actually implemented in the current LoadPushdown interface (you can only
>>> push projections). If I am wrong, that's great.. but if not, that would be
>>> an important feature to add, as people are trying to connect Pig to
>>> "smart" storage systems like rdbmses, HBase, and Cassandra more and more.
>>> I think we only kind of simulate this with partition keys info, which is
>>> not always sufficient
>>>
>>> D
>>>
>>> On Wed, Jan 26, 2011 at 2:41 PM, Julien Le Dem <[EMAIL PROTECTED]> wrote:
>>>
>>>> If making Pig Thread safe (i.e.: two threads running a different pig
>>>> script) is important then we need to change some of the APIs from static
>>>> singleton access to a dependency injection pattern.
>>>> In that case, this should probably be done before 1.0
>>>> For example: UDFContext should be passed to the UDF after construction
>>>> (similar to the SevrletContext in Servlet or the way Hadoop passes the
>>>> context to tasks)
>>>> Also a clearly separated API that does not depend on the Pig
>>>> implementation would help.
>>>> For example UDFContext is in org.apache.pig.impl.util when it would be
>>>> better in org.apache.pig.api (Or at least an interface defining it)
>>>>
>>>> Julien
>>>>
>>>> On 1/24/11 10:14 AM, "Olga Natkovich" <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Hi Guys,
>>>>
>>>> I think it is time for us to have another meeting. Yahoo would be happy
>>>> to host if this works for everybody. How about Wednesday, 2/9 4-6 pm.
>>>> Please, let us know if you are planning to attend and if the date/time
>>>> works for you.
>>>>
>>>> Things that come to mind to discuss and as always feel free to suggest
>>>> others:
>>>>
>>>> - Error handling proposal - this might be easier to finalize
>>>>   face-to-face
>>>> - Pig 0.9 plan
>>>> - Pig Roadmap beyond 0.9
>>>>   o What do we want to do in Pig.next?
>>>>   o Are we ready for Pig 1.0
>>>>
>>>> Olga
-
Re: Pig developer meeting in February
Dmitriy Ryaboy 2011-01-28, 00:15
Ashutosh, where do we do that? I thought we did, too, but didn't find it last time I looked. LoadPushDown has this:
/**
* Set of possible operations that Pig can push down to a loader.
*/
enum OperatorSet {PROJECTION};

There is also this in LoadMetadata, but it is pretty explicit in the comments about this being partition-specific. Are you saying that as long as one claims every column as a partition key, all filters will be pushed down? Will the filters also be applied to the data the loader returns, even if the loader accepts the expression? That would be useful for loaders that have the ability to apply probabilistic filters, for example.

/**
* Find what columns are partition keys for this input.
* @param location Location as returned by
* {@link LoadFunc#relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)}
* @param job The {@link Job} object - this should be used only to obtain
* cluster properties through {@link Job#getConfiguration()} and not to set/query
* any runtime job information.
* @return array of field names of the partition keys. Implementations
* should return null to indicate that there are no partition keys
* @throws IOException if an exception occurs while retrieving partition keys
*/
String[] getPartitionKeys(String location, Job job)
        throws IOException;

/**
* Set the filter for partitioning. It is assumed that this filter
* will only contain references to fields given as partition keys in
* getPartitionKeys. So if the implementation returns null in
* {@link #getPartitionKeys(String, Job)}, then this method is not
* called by Pig runtime. This method is also not called by the Pig runtime
* if there are no partition filter conditions.
* @param partitionFilter that describes filter for partitioning
* @throws IOException if the filter is not compatible with the storage
* mechanism or contains non-partition fields.
*/
void setPartitionFilter(Expression partitionFilter) throws IOException;
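
The contract described in the two Javadoc comments above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Condition`, `PartitionAwareLoader`, `Split`, `DemoLoader` and `pushPartitionFilter` are hypothetical stand-ins invented for this sketch, not Pig classes, and the filter is simplified to a flat list of conditions rather than Pig's `Expression` tree. It shows only what the comments state: the loader is not given a filter when it reports no partition keys or when no condition touches a partition key, and the filter it does receive references partition keys only. Whether Pig re-applies the pushed conditions to the rows the loader returns is the open question in this message, and the sketch does not answer it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch only: every type here is a hypothetical stand-in, not part of Pig.
class PartitionFilterSketch {

    /** One "column op value" condition; a simplification of a filter expression. */
    static final class Condition {
        final String column;
        final String op;
        final String value;

        Condition(String column, String op, String value) {
            this.column = column;
            this.op = op;
            this.value = value;
        }

        @Override
        public String toString() {
            return column + " " + op + " " + value;
        }
    }

    /** Simplified mirror of the two LoadMetadata methods quoted above. */
    interface PartitionAwareLoader {
        /** Returns null when the input has no partition keys. */
        String[] getPartitionKeys();

        /** Receives only conditions that reference partition keys. */
        void setPartitionFilter(List<Condition> partitionFilter);
    }

    /** What was handed to the loader, and what is left to evaluate after loading. */
    static final class Split {
        final List<Condition> pushed = new ArrayList<>();
        final List<Condition> residual = new ArrayList<>();
    }

    static Split pushPartitionFilter(PartitionAwareLoader loader, List<Condition> filter) {
        Split split = new Split();
        String[] keys = loader.getPartitionKeys();
        if (keys == null) {
            // No partition keys: setPartitionFilter is never called.
            split.residual.addAll(filter);
            return split;
        }
        Set<String> keySet = new HashSet<>(Arrays.asList(keys));
        for (Condition c : filter) {
            if (keySet.contains(c.column)) {
                split.pushed.add(c);
            } else {
                split.residual.add(c);
            }
        }
        if (!split.pushed.isEmpty()) {
            // Also not called when there are no partition filter conditions.
            loader.setPartitionFilter(new ArrayList<>(split.pushed));
        }
        return split;
    }

    /** Toy loader that records the filter it was given. */
    static final class DemoLoader implements PartitionAwareLoader {
        private final String[] keys;
        List<Condition> received; // stays null if setPartitionFilter is never called

        DemoLoader(String[] keys) {
            this.keys = keys;
        }

        @Override
        public String[] getPartitionKeys() {
            return keys;
        }

        @Override
        public void setPartitionFilter(List<Condition> partitionFilter) {
            this.received = partitionFilter;
        }
    }

    public static void main(String[] args) {
        DemoLoader loader = new DemoLoader(new String[] {"date"});
        List<Condition> filter = Arrays.asList(
                new Condition("date", ">=", "2011-01-01"),
                new Condition("user", "==", "olga"));
        Split split = pushPartitionFilter(loader, filter);
        System.out.println("pushed:   " + split.pushed);
        System.out.println("residual: " + split.residual);
    }
}
```

In this toy model, declaring every column as a partition key would route every condition to the loader, which is the scenario asked about above; the sketch makes no claim that the real optimizer behaves the same way.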
On Thu, Jan 27, 2011 at 10:02 AM, Ashutosh Chauhan <[EMAIL PROTECTED]> wrote:

> What do you mean by true predicate pushdown? We hand over the full
> filter expression in that method to loader. That I guess is
> sufficient info to push more processing at storage layer e.g. to do
> range queries in Hbase. Pig doesn't have any more information about
> filters then that to push, unless you want full logical plan.
>
> Ashutosh
> On Wed, Jan 26, 2011 at 18:04, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> > Right, we do partition filtering, but not true predicate pushdown.
> >
> > On Wed, Jan 26, 2011 at 5:59 PM, Daniel Dai <[EMAIL PROTECTED]> wrote:
> >
> >> Are you talking about LoadMetadata.setPartitionFilter?
> >> PartitionFilterOptimizer will do that.
> >>
> >> Daniel
> >>
> >> Dmitriy Ryaboy wrote:
> >>
> >>> I may be wrong but I think predicate pushdown is designed for, but not
> >>> actually implemented in the current LoadPushdown interface (you can only
> >>> push projections). If I am wrong, that's great.. but if not, that would be
> >>> an important feature to add, as people are trying to connect Pig to
> >>> "smart" storage systems like rdbmses, HBase, and Cassandra more and more.
> >>> I think we only kind of simulate this with partition keys info, which is
> >>> not always sufficient
> >>>
> >>> D
> >>>
> >>> On Wed, Jan 26, 2011 at 2:41 PM, Julien Le Dem <[EMAIL PROTECTED]> wrote:
> >>>
> >>>> If making Pig Thread safe (i.e.: two threads running a different pig
> >>>> script) is important then we need to change some of the APIs from static
> >>>> singleton access to a dependency injection pattern.
> >>>> In that case, this should probably be done before 1.0
> >>>> For example: UDFContext should be passed to the UDF after construction
> >>>> (similar to the SevrletContext in Servlet or the way Hadoop passes the