Pig >> mail # user >> calling pig from a web app


soren@...) 2011-01-11, 00:06
Charles Gonçalves 2011-01-11, 00:37
Jeff Zhang 2011-01-11, 04:39
soren@...) 2011-01-11, 21:34
Dmitriy Ryaboy 2011-01-11, 21:40
Re: calling pig from a web app
Also, Pig is not thread-safe so far, so you cannot have multiple threads firing different Pig "queries" in parallel within one process.
Oozie works around this by running Pig from a Map task on a slave node. That way each Pig script runs in its own process.
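
A rough sketch of the same idea without Oozie: start the pig command-line client as a separate child process per script, so no two scripts share a JVM. This is a local subprocess launcher, not Oozie's Map-task mechanism. The helper names (pig_command, run_isolated, run_all), the script name, and the parameter names are made up for illustration; it assumes a "pig" executable on the PATH that accepts -param and -f.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def pig_command(script_path, params=None, pig_bin="pig"):
    """Build the argv for one Pig run (hypothetical helper).

    Each name=value pair is passed with -param, the script with -f.
    """
    cmd = [pig_bin]
    for name, value in sorted((params or {}).items()):
        cmd += ["-param", f"{name}={value}"]
    cmd += ["-f", script_path]
    return cmd


def run_isolated(cmd):
    """Run one command as its own child process and return its exit code."""
    return subprocess.run(cmd, capture_output=True, text=True).returncode


def run_all(commands, max_parallel=4):
    """Run several commands in parallel.

    The threads here only wait on child processes; no Pig code runs
    inside this interpreter, so Pig's thread-safety is not an issue.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_isolated, commands))
```

For example, run_all([pig_command("daily.pig", {"date": "2011-01-11"})]) would return a list with one exit code. The cap on max_parallel is only a crude stand-in for real scheduling.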
Julien

On 1/11/11 1:40 PM, "Dmitriy Ryaboy" <[EMAIL PROTECTED]> wrote:

Soren,
The "real" answer is probably to use Oozie under the covers in order to
handle all kinds of edge conditions w.r.t. cluster availability, job
configuration, etc.

If you don't want to deal with Oozie or Azkaban, you could do something like
the following:

1) A web app that works with a simple "pig job" model, stored in a database
table. The pig job model specifies the script, parameters, status
(submitted / running / done / killed / died), and a few timestamps as needed.

2) A daemon process that monitors the table for new jobs and starts them on
the cluster, updating the table appropriately. You can add all the resource
constraints, access restrictions, etc. here.
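
The two pieces above could be sketched roughly as follows, assuming a SQLite table as the shared store. The table name pig_jobs, the column names, and the functions submit, run_next and daemon_loop are all invented for illustration. The runner argument is any callable that actually launches Pig and returns True on success. The "killed" status is left out, since it needs a way to signal a running job.

```python
import sqlite3
import time

# Hypothetical schema for the "pig job" model: script, parameters,
# status, and a few timestamps.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pig_jobs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    script TEXT NOT NULL,
    params TEXT NOT NULL DEFAULT '',
    status TEXT NOT NULL DEFAULT 'submitted',
    submitted_at REAL NOT NULL,
    started_at REAL,
    finished_at REAL
)
"""


def submit(conn, script, params=""):
    """Called by the web app: record a new job and return its id."""
    cur = conn.execute(
        "INSERT INTO pig_jobs (script, params, submitted_at) VALUES (?, ?, ?)",
        (script, params, time.time()),
    )
    conn.commit()
    return cur.lastrowid


def run_next(conn, runner):
    """Daemon step: run the oldest submitted job; return its id, or None if idle."""
    row = conn.execute(
        "SELECT id, script, params FROM pig_jobs "
        "WHERE status = 'submitted' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    job_id, script, params = row
    conn.execute(
        "UPDATE pig_jobs SET status = 'running', started_at = ? WHERE id = ?",
        (time.time(), job_id),
    )
    conn.commit()
    try:
        ok = runner(script, params)  # runner returns True on success
    except Exception:
        ok = False
    conn.execute(
        "UPDATE pig_jobs SET status = ?, finished_at = ? WHERE id = ?",
        ("done" if ok else "died", time.time(), job_id),
    )
    conn.commit()
    return job_id


def daemon_loop(conn, runner, poll_seconds=5.0):
    """Poll forever; sleep only when there is nothing to run."""
    while True:
        if run_next(conn, runner) is None:
            time.sleep(poll_seconds)
```

Resource constraints and access checks would go inside run_next, before the job is marked running. This version runs one job at a time, which also sidesteps the thread-safety problem.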

The more you develop the daemon and the web app (how about monitoring the
Pig job through the new PigStats?...), the more you will realize you are
rebuilding Oozie, and you will start thinking about how to integrate it. But
if you need something working by the end of the week, a quickly rolled
daemon + Rails app is probably faster to set up in the short term.

D

On Tue, Jan 11, 2011 at 1:34 PM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

> Thanks, Jeff. I am aware of the Java API; I was hoping to hear from people
> who might already be doing this and learn from their experiences before
> I go down any one particular path.
>
> On Mon, Jan 10, 2011 at 8:39 PM, Jeff Zhang <[EMAIL PROTECTED]> wrote:
>
> > You can use the Java API of Pig. Regarding priority, you can let the user
> > choose the priority on the web page. And you can use another scheduler
> > rather than Hadoop's default FIFO.
> >
> > On Tue, Jan 11, 2011 at 8:37 AM, Charles Gonçalves <[EMAIL PROTECTED]> wrote:
> >
> > > I second the interest in this topic.
> > > I'll soon need to create a web interface for my marketing colleagues ...
> > >
> > > On Mon, Jan 10, 2011 at 10:06 PM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > >
> > > > I'd be interested to hear people's experience / best practices for
> > > > running Pig scripts on demand from a web app. What do you use as the
> > > > calling mechanism? How do you handle priority / scheduling for ad-hoc
> > > > or user-generated tasks?
> > > >
> > > > Best,
> > > > Soren
> > > >
> > > > --
> > > > http://about.me/soren/bio
> > > >
> > >
> > >
> > >
> > > --
> > > *Charles Ferreira Gonçalves *
> > > http://homepages.dcc.ufmg.br/~charles/
> > > UFMG - ICEx - Dcc
> > > Cel.: 55 31 87741485
> > > Tel.:  55 31 34741485
> > > Lab.: 55 31 34095840
> > >
> >
> >
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
> >
>
>
>
> --
> http://about.me/soren/bio
>

soren@...) 2011-01-11, 22:51
Alejandro Abdelnur 2011-01-13, 04:45