Bigtop >> mail # dev >> Re: How do you run Spark jobs?


Re: How do you run Spark jobs?
On Fri, Aug 2, 2013 at 9:50 PM, Patrick Wendell <[EMAIL PROTECTED]> wrote:
> Hey All,
>
> I'm working on SPARK-800 [1]. The goal is to document a best practice or
> recommended way of bundling and running Spark jobs. We have a quickstart
> guide for writing a standalone job, but it doesn't cover how to deal with
> packaging up your dependencies and setting the correct environment
> variables required to submit a full job to a cluster. This can be a
> confusing process for beginners - it would be good to extend the guide to
> cover this.
>
> First though I wanted to sample this list and see how people tend to run
> Spark jobs inside their orgs. Knowing any of the following would be
> helpful:
>
> - Do you create an uber jar with all of your job's (and Spark's) recursive
> dependencies?
> - Do you try to use sbt run or maven exec with some way to pass the correct
> environment variables?
> - Do people use a modified version of Spark's own `run` script?
> - Do you have some other way of submitting jobs?
>
> Any notes would be helpful in compiling this!

Now that Spark has been integrated into Bigtop:
    https://issues.apache.org/jira/browse/BIGTOP-715
it may make sense to tackle some of those issues from a
distribution perspective. Bigtop has the luxury of defining an
entire distribution (you always know what versions of Hadoop
and its ecosystem projects you're dealing with). It also
provides helper functionality for a lot of common things (like
finding JAVA_HOME, plugging into the underlying
OS capabilities, etc.).

I guess all I'm saying is that you guys should consider Bigtop
as an integration platform for making Spark easier to use.

Feel free to fork off this thread to dev@bigtop (CCed) if you
think this is an idea worth exploring.

Thanks,
Roman.