Drill, mail # user - meeting notes 10/22/13


Re: meeting notes 10/22/13
Timothy Chen 2013-10-22, 17:04
Thanks Jason! Nice job putting in detailed notes!

Tim
On Tue, Oct 22, 2013 at 10:03 AM, Michael Hausenblas <[EMAIL PROTECTED]> wrote:

>
> > Here are the notes from today's hangout. Michael, can you copy them into
> > the Google doc?
>
>
> Thanks & done.
>
> Cheers,
>                 Michael
>
> --
> Michael Hausenblas
> Ireland, Europe
> http://mhausenblas.info/
>
> On 22 Oct 2013, at 17:49, Jason Altekruse <[EMAIL PROTECTED]> wrote:
>
> > Hello All,
> >
> > Here are the notes from today's hangout. Michael, can you copy them into
> > the Google doc?
> >
> > participants: Jacques, Michael Hausenblas, Lisen Mu, Yash Sharma, Jinfeng,
> > Jason Altekruse, Harri, Steven Phillips, Timothy Chen, Julian Hyde
> >
> > New employee at MapR: Jinfeng
> >    - couple more in the next month
> >
> > Jacques:
> >    - merged limit
> >    - clarify VVs
> >        - never access internal state of VV when it is invalid
> >    - release notes
> >
> > Steven:
> >    - ordered partitioner
> >        - abstract out distributed cache interface
> >    - continue to work on spooling to disk
> > Jason:
> >    - semi-blocking
> >        - look at sort and ordered hash partitioner
> >
> > Yash
> >    - name of functions
> >        - separate class for operators and functions for more clarity
> >            - different operators have their own class files
> >
> > Lisen
> >    - fork of Drill
> >        - data pushed from leaves rather than pulled from root
> >        - we have been thinking about this same problem
> >            - don't want to wait for IO all the time
> >            - pre-fetch rather than push
> >            - in a join you might get pushed a huge amount of data when you
> >              aren't ready for it
> >            - stream processing
> >                - alternative concept around foreman
> >                - not quite right for streams
> >                - resource allocation
> >                    - not as much for resource requirements
> >        - HyperLogLog
> >            - space saving
> >            - acceptable - not precise
> >        - data assembly - business logic
> >            - approximations will be important to Drill
> >            - no serious thinking about sampling
> >            - certain types of scanners should support sampling
> >                - hard with some without reading all data anyway
> >                - HBase might be easier to do a scan
> >            - doing it with their own business logic and statistics
> >                - hard to generalize
> >
> > Hari
> >    - not much for updates
> >    - pick up with Amazon EC2 docs
> >        - had problem where we need 8 gigs
> >        - cannot get it running on free micro instance
> >        - got it working removing the direct memory flag in POM
> >        - Tim - out of memory exception right away
> >            - was this with or without changing the option for direct memory?
> >
> > Tim
> >    - WIP patch in
> >    - amp labs big data benchmark
> >        - having numbers for performance evaluation
> >        - set up on their repo for drill datasets
> >        - installing HDFS to all of the nodes
> >        - doesn't look too complicated
> >    - cannot submit SQL in distributed mode because of bad optimizer
> >    - recent review board patches
> >        - describe code more completely
> >        - hard to review without docs
> >        - Julian - single PowerPoint slide per operator
> >        - Google doc? like the logical plan doc
> >
> >
> > Ben
> >    - code gen portion of merging receiver
> >    - no blockers
> >        - getting to code review soon
> >
> > Julian
> >    - joined Hortonworks
> >    - working on Optiq
> >    - helping Hive, but also working on Drill
> >    - making Optiq everything it can be
> >    - splitting JDBC into thin client
> >        - thinking about it, no implementation yet
> >        - right now pushing sorts down to Mongo
> >    - Jacques - session next week on JDBC?
> >    - roadmap on Optiq
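
For readers unfamiliar with the HyperLogLog item in Lisen's update ("space saving", "acceptable - not precise"), here is a minimal Python sketch of the idea. It is not taken from Drill or from Lisen's fork; the class name, the SHA-1 based hash, and the register count (p=12, i.e. 4096 registers) are all assumptions picked for illustration. It estimates the number of distinct values using a fixed-size register array instead of storing every value, at the cost of a small relative error.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog distinct-count estimator (hypothetical helper)."""

    def __init__(self, p=12):
        # p index bits -> m = 2**p registers; memory stays fixed regardless of input size.
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant from the HyperLogLog paper, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Assumption: a 64-bit hash taken from the first 8 bytes of SHA-1.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big")
        idx = x >> (64 - self.p)                  # top p bits choose the register
        rest = x & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        # Harmonic mean of 2**register across all registers, scaled by alpha * m**2.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        # Small-range correction: fall back to linear counting while registers are empty.
        if estimate <= 2.5 * self.m:
            zeros = self.registers.count(0)
            if zeros:
                return self.m * math.log(self.m / zeros)
        return estimate


if __name__ == "__main__":
    hll = HyperLogLog()
    for i in range(100000):
        hll.add(i)
    print("true distinct: 100000, estimated:", round(hll.count()))
```

With 4096 registers the expected relative error is roughly 1.04 / sqrt(4096), about 1.6%, which is the "not precise but acceptable" trade-off mentioned in the notes. Re-adding a value that was already seen never changes the registers, so duplicates do not inflate the estimate.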