Pig user mailing list
compiled pig move output to input
I'm trying to run an embedded Pig script (embedded in Python) where I
need to take the output of the script and feed it back into the
script as the input. I'm sure there is an easy way to do this, but all
the examples seem overly simplistic and only use single-column data.

My input file, networkMap.csv, looks like this:

NodeH,4,-0.4
NodeH,5,0.2
NodeO,6,0.1
Link,W_1_4,0.2,1,4
Link,W_1_5,-0.3,1,5
Link,W_2_4,0.4,2,4
Link,W_2_5,0.1,2,5
Link,W_3_4,-0.5,3,4
Link,W_3_5,-0.2,3,5
Link,W_4_6,-0.3,4,6
Link,W_5_6,-0.2,5,6
LR,LR,0.9
Target,Target,1
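The rows have either three or five fields, while the AS clause in the script below declares five. A rough plain-Python emulation of what that means (the helper names are hypothetical and this is not Pig's actual loader): short rows get nulls for the missing fields and val is cast to double, which would account for the trailing empty fields and the 1.0 that show up in the stored output.

```python
def parse_pigstorage_line(line, ncols=5):
    """Rough emulation (not Pig's loader) of mapping one comma-delimited
    line onto the schema (type, name, val, iName, jName)."""
    fields = line.rstrip("\n").split(",")
    # Pad short rows with None (standing in for Pig's null), drop extras.
    fields = (fields + [None] * ncols)[:ncols]
    # The AS clause declares val as double.
    if fields[2] is not None:
        fields[2] = float(fields[2])
    return tuple(fields)


def to_stored_line(record):
    """Render a record like the stored output: nulls become empty fields."""
    return ",".join("" if v is None else str(v) for v in record)


print(to_stored_line(parse_pigstorage_line("Target,Target,1")))  # Target,Target,1.0,,
```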

Let's take a super-simple example of what I want to do, stripping out
all of the application logic to focus on just the input/output
problem:

#!/usr/bin/python

from org.apache.pig.scripting import *

# Compile the script once; $input and $outFile are bound on every pass.
P = Pig.compile("""
A = LOAD '$input' USING PigStorage(',') AS (type:chararray,
    name:chararray, val:double, iName:chararray, jName:chararray);

STORE A INTO '$outFile' USING PigStorage(',');
""")

params = {'input': 'networkMap.csv'}
for i in range(2):
    outDir = "out_" + str(i + 1)
    params["outFile"] = outDir
    bound = P.bind(params)
    stats = bound.runSingle()
    if not stats.isSuccessful():
        raise Exception('failed')
    # Attempt to feed this pass's output back in as the next pass's input
    # (this is the part that doesn't work).
    params["input"] = stats.result("Output1")
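A minimal sketch of chaining the runs, assuming LOAD can simply be pointed at the directory the previous STORE wrote into (PigStorage reads the part files inside a directory). `chained_params` is a hypothetical helper, not part of the Pig scripting API; it only works out which input and output each pass should use.

```python
def chained_params(initial_input, runs, prefix="out_"):
    """Return one params dict per run; every run after the first loads
    the directory the previous run stored into (assumed to be readable
    by LOAD as-is)."""
    plans = []
    current_input = initial_input
    for i in range(runs):
        out_dir = prefix + str(i + 1)
        plans.append({"input": current_input, "outFile": out_dir})
        # The next run reads what this run wrote.
        current_input = out_dir
    return plans


for p in chained_params("networkMap.csv", 2):
    print(p["input"] + " -> " + p["outFile"])
```

Inside the Jython script, the loop body would then be `stats = P.bind(params).runSingle()` (plus the same success check) for each `params` in `chained_params('networkMap.csv', 2)`.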

I was hoping I could just set input = output, but that doesn't
work. I've also tried:

# Concatenate every output tuple into one string and use it as the input.
inputData = ""
it = stats.result("A").iterator()
while it.hasNext():
    t = it.next()
    inputData = inputData + "(" + t.toDelimitedString(",") + ")"
params["input"] = inputData

This did push the output back into the input, but then LOAD
couldn't read it, since it looked like one big record:

A = LOAD '(NodeI,1,1.0,,)(NodeI,2,0.0,,)(NodeI,3,1.0,,)(NodeH,4,-0.4,,)(NodeH,5,0.2,,)(NodeO,6,0.1,,)(Link,W_1_4,0.2,1,4)(Link,W_1_5,-0.3,1,5)(Link,W_2_4,0.4,2,4)(Link,W_2_5,0.1,2,5)(Link,W_3_4,-0.5,3,4)(Link,W_3_5,-0.2,3,5)(Link,W_4_6,-0.3,4,6)(Link,W_5_6,-0.2,5,6)(LR,LR,0.9,,)(Target,Target,1.0,,)'
using PigStorage(',') AS (type:chararray, name:chararray,
val:double,iName:chararray,jName:chararray);
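The concatenated string above ends up as the LOAD location, so Pig treats it as a path rather than as data. A sketch of one alternative, assuming Pig runs in local mode (`pig -x local`) or the file is copied to HDFS afterwards: write one record per line to a file and pass that file's path as `input`. `write_records` is a hypothetical helper.

```python
def write_records(rows, path):
    """Write rows to `path`, one comma-delimited record per line, so that
    PigStorage(',') sees separate tuples instead of one giant record.
    None is written as an empty field."""
    with open(path, "w") as f:
        for row in rows:
            f.write(",".join("" if v is None else str(v) for v in row))
            f.write("\n")
    return path
```

In the Jython loop, each line could instead come straight from `t.toDelimitedString(",")` on the tuples from `stats.result("A").iterator()`, with the returned path assigned to `params["input"]`.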

I'm sure I am missing some simple way of doing this.

Thanks for any help you can give me.
-JJ
Cheolsoo Park 2012-11-03, 22:16