Pig >> mail # user >> Is there anything in pig that supports external client to stream out a content of alias? a bit like Hive Thrift server...


+
Jae Lee 2010-12-07, 14:40
Re: Is there anything in pig that supports external client to stream out a content of alias? a bit like Hive Thrift server...
I am not sure I understood your requirements clearly, but if you
are not looking for a pure Pig Latin solution and can work through
Pig's Java API, then you may want to look at PigServer.
http://pig.apache.org/docs/r0.7.0/api/org/apache/pig/PigServer.html
Something along the following lines:

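// pc is an org.apache.pig.impl.PigContext that has been set up elsewhere.
PigServer pig = new PigServer(pc, true);
pig.registerQuery("A = load 'mydata';");
pig.registerQuery("B = filter A by $0 > 10;");
// openIterator executes the plan for B and returns its tuples to the client.
Iterator<Tuple> itr = pig.openIterator("B");
while (itr.hasNext()) {
  // Tuple.get returns an Object, so convert it before comparing.
  Integer first = DataType.toInteger(itr.next().get(0));
  if (first != null && first == 25) {
    // trigger further processing.
  }
}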

It's obviously not directly usable as written, but it conveys the general idea. Hope it helps.
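
To make the "trigger further processing" part a bit more concrete, here is a rough sketch of the client-side loop on its own. It deliberately does not depend on Pig, so it can be run as is: AliasConsumer and consume are hypothetical names made up for this illustration, and the list of integers is stand-in data. With Pig you would presumably pass in the iterator returned for the alias and a check on the tuple's first field instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical helper (not part of Pig): consumes rows on the client side. */
public class AliasConsumer {

    /**
     * Pulls rows from the iterator one at a time, hands each row that matches
     * the trigger to the handler, and returns how many rows matched.
     */
    public static <T> int consume(Iterator<T> rows, Predicate<T> trigger, Consumer<T> handler) {
        int matched = 0;
        while (rows.hasNext()) {
            T row = rows.next();
            if (trigger.test(row)) {
                // The handler runs in the client JVM, outside the cluster.
                handler.accept(row);
                matched++;
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Stand-in data; an assumption for the demo, not real query output.
        Iterator<Integer> rows = Arrays.asList(12, 25, 40, 25).iterator();
        List<Integer> hits = new ArrayList<>();
        int matched = consume(rows, v -> v == 25, hits::add);
        System.out.println(matched + " matching rows: " + hits);
    }
}
```

The point of keeping the loop separate is that the handler runs in the client process, so a network-bound consumer never has to run inside a Hadoop task.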

Ashutosh
On Tue, Dec 7, 2010 at 06:40, Jae Lee <[EMAIL PROTECTED]> wrote:
> Hi,
>
> In our application Hive is used as a database, i.e. a result set from a select query is consumed outside of the Hadoop cluster.
>
> The consumption process is not Hadoop friendly, in that it is network bound, not CPU/disk bound.
>
> I'm in the process of converting the Hive query into a Pig query to see if it reads better.
>
> What I'm stuck on is picking out the content of a specific alias dump from all the other output being logged, so that I can trigger further processing.
>
> STREAM <alias> THROUGH <cmd> seems to be one way to trigger a process, but it doesn't seem suitable for the kind of process we are looking at, because the <cmd> runs inside the Hadoop cluster.
>
> Any thoughts?
>
> J
+
Jae Lee 2010-12-07, 18:26
+
Jeff Zhang 2010-12-08, 01:19
+
Jae Lee 2010-12-08, 10:20
+
Ashutosh Chauhan 2010-12-08, 16:41
+
Jae Lee 2010-12-08, 18:33