Drill >> mail # user >> meeting notes 10/22/13


Re: meeting notes 10/22/13

> Here are the notes from today's hangout. Michael, can you copy them into the Google Doc?
Thanks & done.

Cheers,
Michael

--
Michael Hausenblas
Ireland, Europe
http://mhausenblas.info/

On 22 Oct 2013, at 17:49, Jason Altekruse <[EMAIL PROTECTED]> wrote:

> Hello All,
>
> Here are the notes from today's hangout. Michael, can you copy them into the
> Google Doc?
>
> Participants: Jacques, Michael Hausenblas, Lisen Mu, Yash Sharma, Jinfeng,
> Jason Altekruse, Harri, Steven Phillips, Timothy Chen, Julian Hyde
>
> New employee at MapR: Jinfeng
>    - couple more in the next month
>
> Jacques:
>    - merged limit
>    - clarify VVs
>        - never access internal state of VV when it is invalid
>    - release notes
>
> Steven:
>    - ordered partitioner
>        - abstract out distributed cache interface
>    - continue to work on spooling to disk
>
> Jason:
>    - semi-blocking
>        - look at sort and ordered hash partitioner
>
> Yash
>    - name of functions
>        - separate class for operators and functions for more clarity
>            - different operators have their own class files
>
> Lisen
>    - fork of Drill
>        - data pushed from leaves rather than pulled from root
>        - we have been thinking about this same problem
>            - don't want to wait for IO all the time
>            - pre-fetch rather than push
>            - in a join you might get pushed a huge amount of data when you aren't ready for it
>            - stream processing
>                - alternative concept around foreman
>                - not quite right for streams
>                - resource allocation
>                    - not as much for resource requirements
>        - HyperLogLog
>            - space saving
>            - acceptable - not precise
>        - data assembly - business logic
>            - approximations will be important to drill
>            - no serious thinking about sampling
>            - certain types of scanners should support sampling
>                - hard with some without reading all data anyway
>                - HBase might be easier to do a scan
>            - doing it with their own business logic and statistics
>                - hard to generalize
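
For readers unfamiliar with the HyperLogLog item above ("space saving", "acceptable - not precise"), here is a minimal sketch of the idea in Python. It is not Drill code and not Lisen's implementation; the class name, the SHA-1 hash, and the default of 2^12 registers are all assumptions chosen for illustration. It trades exactness for a small, fixed amount of memory when counting distinct values.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative approximate distinct counter (hypothetical, not Drill's)."""

    def __init__(self, p=12):
        self.p = p                      # number of index bits (assumed default)
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        # Standard bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Take 64 bits of a SHA-1 digest as the hash value.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        idx = h >> width                # top p bits select a register
        rest = h & ((1 << width) - 1)   # remaining bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return estimate


if __name__ == "__main__":
    hll = HyperLogLog()
    for i in range(100000):
        hll.add(f"user-{i % 50000}")    # 50,000 distinct values
    print(round(hll.count()))
```

With 4096 registers the typical relative error is on the order of a couple of percent, which matches the "acceptable - not precise" trade-off in the notes.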
>
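
On the "pre-fetch rather than push" point in Lisen's section: one way to picture it is a reader that fetches a bounded number of batches ahead of the consumer, so the consumer rarely waits on IO but is also never handed more data than it asked for. The sketch below is only an illustration under that assumption; `prefetching_reader` and `depth` are hypothetical names, and this is not how Drill or Lisen's fork implements it.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the source


def prefetching_reader(source, depth=4):
    """Yield items from `source`, reading at most `depth` items ahead.

    A background thread fills a bounded buffer. When the buffer is full the
    thread blocks, so a slow consumer (for example one side of a join) is
    not flooded with data it is not ready for.
    Note: errors raised by `source` are not propagated in this sketch.
    """
    buf = queue.Queue(maxsize=depth)

    def fill():
        for batch in source:
            buf.put(batch)      # blocks while the buffer is full
        buf.put(_DONE)

    threading.Thread(target=fill, daemon=True).start()

    while True:
        batch = buf.get()       # blocks only if nothing was pre-fetched yet
        if batch is _DONE:
            return
        yield batch


if __name__ == "__main__":
    for batch in prefetching_reader(iter(range(10)), depth=2):
        print(batch)
```

The consumer still pulls, which keeps back-pressure simple; the bounded buffer is what hides the IO latency.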
> Hari
>    - not much for updates
>    - pick up with Amazon EC2 docs
>        - had problem where we need 8 GB
>        - cannot get it running on free micro instance
>        - got it working by removing the direct memory flag in the POM
>        - Tim - out of memory exception right away
>            - was this with or without changing the option for direct memory?
>
> Tim
>    - wir patch in
>    - AMPLab Big Data Benchmark
>        - having numbers for performance evaluation
>        - set up on their repo for Drill datasets
>        - installing HDFS to all of the nodes
>        - doesn't look too complicated
>    - cannot submit SQL in distributed mode because of bad optimizer
>    - recent review board patches
>        - describe code more completely
>        - hard to review without docs
>        - Julian - single PowerPoint slide per operator
>        - Google Doc? like the logical plan doc
>
>
> Ben
>    - code gen portion of merging receiver
>    - no blockers
>        - getting to code review soon
>
> Julian
>    - joined Hortonworks
>    - working on Optiq
>    - helping Hive, but also working on Drill
>    - making Optiq everything it can be
>    - splitting JDBC into thin client
>        - thinking about it, no implementation yet
>        - right now pushing sorts down to Mongo
>    - Jacques - session next week on JDBC?
>    - roadmap on Optiq
>        - commit logs tell some of the story
>        - roadmap would be helpful
>        - will put out call for Optiq users like Drill
>        - put together feature list for next release(s)
>        - next 6 months: wants to stay agile, but also be more predictable
>        - Jinfeng will be working with optimizer and Optiq