Drill >> mail # dev >> Presto -> SQL engines for HDFS ?

Presto -> SQL engines for HDFS ?
Hi there,

the appearance of Presto prompts the thought that a list and overview of engines of this sort would be jolly useful for general understanding and for managing FUD in the enterprise.

I think these engines are designed to respond quickly to queries that can be written to generate relatively little interconnect traffic on a cluster.

I know of:

Drill (doh)
Impala
Presto
Gryphon (uses HBase?)
HAWQ
BlinkDB (slightly different, uses bootstrapping for aggregates)

I believe that Stinger should be thought of as something else - a set of optimisations to an engine built for large-scale queries.

Is this view close to correct?

Can anyone elucidate the statements about Impala doing record materialisation whereas other engines do vectorization? Is vectorization a form of query rewriting for parallelism?

How do the folk in the Drill project see the plethora of efforts? Does anyone have a view as to why there are so many engines appearing?

Best

Simon

----            
Dr. Simon Thompson
Chief Researcher, Customer Experience.
BT Research.
BT plc. PP11J. MLBG BT Adastral Park, Martlesham Heath.
IP5 3RE
