Edward Capriolo 2012-05-25, 02:08
Actually, what's more interesting is Shark. I attended one of the meetups
here in the Bay Area a couple of months back where Chris presented Shark,
and had a follow-up conversation with Matei after that. They have an
interesting goal of making (unaltered) Hive queries run on Spark, running on
a cluster managed by Mesos. Not just simple Hive queries but all the
extension points of the system, including UDFs, UDAFs, SerDes, input/output
formats, etc. My understanding was that they let the Hive parser parse the
query, but instead of generating MapReduce operators they compile it into
Spark primitives and then run it on Spark. I still haven't understood how it
is possible to then seamlessly absorb UDFs/SerDes etc., since those tie
closely to the execution layer of MR. But I haven't spent enough time on it.
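To make the idea concrete, here is a toy sketch in plain Python. It is not
Shark, Spark, or Hive code, and every name in it (normalize_udf, PLAN, the
two run_* functions) is made up for illustration. It only assumes that a UDF
is a row-level function, and shows one parsed "plan" being executed either
as explicit map/shuffle/reduce phases or as chained in-memory primitives.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical row-level UDF: it only sees a value, not the engine running it.
def normalize_udf(word):
    return word.strip().lower()

# Hypothetical logical plan, standing in for what a query parser might emit for:
#   SELECT normalize(word), COUNT(*) FROM t GROUP BY normalize(word)
PLAN = {"project": normalize_udf, "aggregate": "count"}

def run_mapreduce_style(rows, plan):
    """Execute the plan as explicit map, shuffle, and reduce phases."""
    mapped = [(plan["project"](r), 1) for r in rows]   # map phase
    shuffled = defaultdict(list)
    for key, value in mapped:                          # shuffle phase
        shuffled[key].append(value)
    return {k: sum(vs) for k, vs in shuffled.items()}  # reduce phase

def run_primitive_style(rows, plan):
    """Execute the same plan as chained in-memory collection primitives."""
    keyed = map(lambda r: (plan["project"](r), 1), rows)

    def merge(acc, kv):
        # Fold each (key, 1) pair into a running count per key.
        acc[kv[0]] = acc.get(kv[0], 0) + kv[1]
        return acc

    return reduce(merge, keyed, {})

rows = ["Hive", "spark ", "hive", "Spark", "mesos"]
# Both back ends call the same UDF and agree on the result.
assert run_mapreduce_style(rows, PLAN) == run_primitive_style(rows, PLAN)
```

If the real extension points are similarly row-level, that might be how they
get reused, but that is a guess on my part, not something confirmed.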
It's an interesting goal to strive for, and the promise is to make your
queries go faster since Spark caches data in memory, instead of the
brute-force scanning done by MR and thus Hive. In their canonical example, a
Hive query that used to take 24 hours on a Hadoop cluster finishes in 45
minutes on Spark.
On Thu, May 24, 2012 at 7:08 PM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
> Over the past few months I have seen Spark thrown around a couple of times.
> You know what strikes me as odd? Not once has Spark EVER been
> mentioned on this mailing list! (To my knowledge.) This is something
> similar to HadoopDB.
> I mean, it is open source and all, so no one is obligated to tell us
> they are doing a fork or anything like that, but you would think that
> since Hive is open to new contributions someone would say, "Hey Hive
> guys, what do you think? Check this out. Isn't it cool? Want to make it
> part of Hive?"
> So are we not making the project cool enough? Do we need a new logo? :)
> Or move the project to GitHub or something? :)