Pig >> mail # user >> calling pig from a web app


Re: calling pig from a web app
Soren,

Adding to the 'oozie' alternative ...

With Oozie you can do something like:

$ oozie pig -file SCRIPT

The command-line options are aligned with Pig's (you can pass options
straight through). You'll get a job ID back (as if Oozie were a Pig server),
and later you can monitor the progress of the job via the command line, the
API, or the web console.

And you don't need to write an Oozie workflow.xml.

And with Oozie 2.3, about to be released, it becomes even simpler, as you
don't have to worry about the Pig JARs (Oozie now supports a sharelib).
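
For a web app, the command above could be wrapped roughly as follows. This is
only a sketch: the function names are made up for illustration, and the
-oozie, -config and -X flags are the ones documented for the client in later
Apache Oozie releases, so check `oozie help` against your version. The URL
and file names in the usage note are placeholders.

```python
import subprocess


def build_oozie_pig_command(script_path, oozie_url, properties_file, pig_args=()):
    """Build the argv for `oozie pig` (hypothetical helper).

    Anything listed after -X is handed through to Pig unchanged.
    """
    cmd = [
        "oozie", "pig",
        "-oozie", oozie_url,          # Oozie server URL (assumed flag)
        "-config", properties_file,   # job properties file (assumed flag)
        "-file", script_path,         # the Pig script, as in the command above
    ]
    if pig_args:
        cmd += ["-X", *pig_args]
    return cmd


def submit_pig_script(script_path, oozie_url, properties_file, pig_args=()):
    """Run the Oozie client and return whatever it printed.

    The client is expected to print the job ID; the exact output format is
    not assumed here, so the raw text is returned for the caller to parse.
    """
    cmd = build_oozie_pig_command(script_path, oozie_url, properties_file, pig_args)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

The web app would call something like
submit_pig_script("report.pig", "http://oozie-host:11000/oozie", "job.properties"),
store the returned text, and poll the job later.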

Hope this helps.

Thanks.

Alejandro

On Wed, Jan 12, 2011 at 6:51 AM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

> Thanks Dmitriy, exactly the information I was looking for.
>
> On Tue, Jan 11, 2011 at 1:40 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]>
> wrote:
>
> > Soren,
> > The "real" answer is probably to use Oozie under the covers in order to
> > handle all kinds of edge conditions w.r.t cluster availability, job
> > configuration, etc.
> >
> > If you don't want to deal with Oozie or Azkaban, you could do something
> > like the following:
> >
> > 1) web app that works with a simple "pig job" model. The pig job model
> > specifies the script, parameters, status (submitted / running / done /
> > killed / died), and a few timestamps as needed.
> >
> > 2) a daemon process that monitors the table for new jobs and starts them
> > on the cluster, updating the table appropriately. You can add all the
> > resource constraints, access restrictions, etc. here.
> >
> > The more you develop the daemon and the web app (how about monitoring the
> > Pig job through the new PigStats?...), the more you will realize you are
> > rebuilding Oozie and start thinking about how to integrate it. But if you
> > need something to work by the end of the week, a quickly rolled daemon +
> > rails app is probably faster to set up in the short term.
> >
> > D
> >
> > On Tue, Jan 11, 2011 at 1:34 PM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> >
> > > Thanks Jeff. I am aware of the Java API; I was hoping to hear from
> > > people who might already be doing this and learn from their
> > > experiences before I go down any one particular path.
> > >
> > > On Mon, Jan 10, 2011 at 8:39 PM, Jeff Zhang <[EMAIL PROTECTED]> wrote:
> > >
> > > > You can use the Java API of Pig. Regarding priority, you can let the
> > > > user choose the priority on the web page. And you can use a scheduler
> > > > other than Hadoop's default FIFO.
> > > >
> > > > On Tue, Jan 11, 2011 at 8:37 AM, Charles Gonçalves <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > I second the interest in this topic.
> > > > > I'll soon need to create a web interface for my marketing
> > > > > colleagues ...
> > > > >
> > > > > On Mon, Jan 10, 2011 at 10:06 PM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > I'd be interested to hear people's experience / best practices
> > > > > > for running pig scripts on demand from a web app. What do you use
> > > > > > as the calling mechanism? How do you handle priority / scheduling
> > > > > > for ad-hoc or user-generated tasks?
> > > > > >
> > > > > > Best,
> > > > > > Soren
> > > > > >
> > > > > > --
> > > > > > http://about.me/soren/bio
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > *Charles Ferreira Gonçalves *
> > > > > http://homepages.dcc.ufmg.br/~charles/
> > > > > UFMG - ICEx - Dcc
> > > > > Cel.: 55 31 87741485
> > > > > Tel.:  55 31 34741485
> > > > > Lab.: 55 31 34095840
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards
> > > >
> > > > Jeff Zhang
> > > >
> > >
> > >
> > >
> > > --
> > > http://about.me/soren/bio
> > >
> >
>
>
>
> --
> http://about.me/soren/bio
>
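
The "pig job" table plus daemon that Dmitriy describes in the quoted message
could be sketched roughly as follows. This is a minimal illustration, not
anything from the thread: the table layout, the function names, the use of
SQLite, and the launcher callback (which stands in for whatever actually
starts Pig on the cluster) are all assumptions.

```python
import sqlite3
import time

# Hypothetical job table: script, parameters, status and a few timestamps.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pig_jobs (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    script       TEXT NOT NULL,
    params       TEXT NOT NULL DEFAULT '',
    status       TEXT NOT NULL DEFAULT 'submitted',
    submitted_at REAL NOT NULL,
    started_at   REAL,
    finished_at  REAL
)
"""


def submit_job(conn, script, params=""):
    """Called by the web app: record a new job and return its row id."""
    cur = conn.execute(
        "INSERT INTO pig_jobs (script, params, submitted_at) VALUES (?, ?, ?)",
        (script, params, time.time()),
    )
    conn.commit()
    return cur.lastrowid


def run_pending_jobs(conn, launcher):
    """One pass of the daemon: start every 'submitted' job, oldest first.

    launcher(script, params) is assumed to block until the job ends and
    return True on success. Returns the number of jobs picked up.
    """
    rows = conn.execute(
        "SELECT id, script, params FROM pig_jobs "
        "WHERE status = 'submitted' ORDER BY submitted_at, id"
    ).fetchall()
    for job_id, script, params in rows:
        conn.execute(
            "UPDATE pig_jobs SET status = 'running', started_at = ? WHERE id = ?",
            (time.time(), job_id),
        )
        conn.commit()
        try:
            status = "done" if launcher(script, params) else "died"
        except Exception:
            status = "died"
        conn.execute(
            "UPDATE pig_jobs SET status = ?, finished_at = ? WHERE id = ?",
            (status, time.time(), job_id),
        )
        conn.commit()
    return len(rows)
```

A real daemon would call run_pending_jobs in a loop with a sleep between
passes, and would need the resource limits, access checks and kill handling
mentioned in the quoted message, which is where the overlap with Oozie begins.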