Re: Is there anything in Pig that supports an external client streaming out the contents of an alias? A bit like the Hive Thrift server...
You didn't mention why PigServer.openIterator() won't work for you.
One of its use cases is exactly what you are describing, and it would
avoid the need to write a Ruby wrapper.

Ashutosh
On Tue, Dec 7, 2010 at 10:26, Jae Lee <[EMAIL PROTECTED]> wrote:
> Yeah, I came across openIterator(alias) on PigServer.
>
> Basically that's what I'd like to get (a dump of the alias and nothing else) when I execute the Pig script.
>
> I'm currently writing a Ruby wrapper that will STORE the alias into a temporary location in HDFS and then fetch the file from Hadoop.
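> In Java terms, that amounts to something like this (the ExecType, paths,
> and part-file name are placeholders I'd have to fill in):
>
> import java.io.BufferedReader;
> import java.io.InputStreamReader;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.pig.ExecType;
> import org.apache.pig.PigServer;
>
> PigServer pig = new PigServer(ExecType.MAPREDUCE);
> pig.registerQuery("B = load 'mydata';");
> // STORE the alias into a temporary HDFS location...
> pig.store("B", "/tmp/pig-out");
> // ...then stream the part files back out of the cluster
> FileSystem fs = FileSystem.get(new Configuration());
> BufferedReader in = new BufferedReader(
>     new InputStreamReader(fs.open(new Path("/tmp/pig-out/part-00000"))));
> String line;
> while ((line = in.readLine()) != null) {
>   // consume each tuple line outside the cluster
> }
> in.close();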
> Any better ideas?
>
> J
> On 7 Dec 2010, at 18:16, Ashutosh Chauhan wrote:
>
>> I am not sure I understood your requirements clearly, but if you
>> are not looking for a pure PigLatin solution and can work through
>> Pig's Java API, then you may want to look at PigServer:
>> http://pig.apache.org/docs/r0.7.0/api/org/apache/pig/PigServer.html
>> Something along the following lines:
>>
>> import java.util.Iterator;
>> import org.apache.pig.PigServer;
>> import org.apache.pig.data.Tuple;
>>
>> PigServer pig = new PigServer(pc, true); // pc is an existing PigContext
>> pig.registerQuery("A = load 'mydata' as (x:int);");
>> pig.registerQuery("B = filter A by $0 > 10;");
>> // openIterator() runs the script and streams B's tuples back to the client
>> Iterator<Tuple> itr = pig.openIterator("B");
>> while (itr.hasNext()) {
>>   if (((Integer) itr.next().get(0)) == 25) {
>>     // trigger further processing
>>   }
>> }
>>
>> It's obviously not directly usable as-is, but it conveys the general idea. Hope it helps.
>>
>> Ashutosh
>> On Tue, Dec 7, 2010 at 06:40, Jae Lee <[EMAIL PROTECTED]> wrote:
>>> Hi,
>>>
>>> In our application, Hive is used as a database, i.e. the result set from a select query is consumed outside of the Hadoop cluster.
>>>
>>> The consumption process is not Hadoop-friendly, in that it is network-bound rather than CPU/disk-bound.
>>>
>>> I'm in the process of converting the Hive query into a Pig query to see if it reads better.
>>>
>>> What I'm stuck on is isolating the dump of a specific alias from all the other output being logged, so that further processing can be triggered.
>>>
>>> STREAM <alias> THROUGH <cmd> seems to be one way to trigger a process; it just doesn't seem suitable for the kind of process we are looking at, because the <cmd> is run inside the Hadoop cluster.
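>>> For example, something like this (the `grep 25` command and the
>>> 'mydata' path are made-up placeholders):
>>>
>>> import org.apache.pig.ExecType;
>>> import org.apache.pig.PigServer;
>>>
>>> PigServer pig = new PigServer(ExecType.MAPREDUCE);
>>> pig.registerQuery("A = load 'mydata';");
>>> // the backticked command is shipped to the task nodes and runs inside
>>> // the cluster, not on the client machine
>>> pig.registerQuery("B = stream A through `grep 25`;");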
>>>
>>> Any thoughts?
>>>
>>> J
>>
>
>