Pig, mail # user - compiled pig move output to input


Re: compiled pig move output to input
Cheolsoo Park 2012-11-03, 22:16
Hi Jesse,

>> params["input"] = stats.result("Output1")

This tells Pig to literally replace every occurrence of "$input" in the
embedded script with the string representation of
stats.result("Output1"), which is not what you want.
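
To make the substitution concrete, here is a minimal sketch in plain Python. It uses string.Template only as a stand-in for the textual replacement described above; it is not Pig's implementation. FakeResult is a hypothetical placeholder for whatever object stats.result("Output1") returns, and the AS clause is left out of the stand-in script.

```python
from string import Template

# Stand-in for the script text passed to Pig.compile(); schema omitted.
SCRIPT = "A = LOAD '$input' USING PigStorage(',');"


class FakeResult(object):
    """Hypothetical stand-in for the object returned by stats.result(...)."""

    def __str__(self):
        return "<result object for alias Output1>"


def substitute(script, params):
    # Mimic textual parameter substitution: every value ends up as a string.
    as_strings = dict((name, str(value)) for name, value in params.items())
    return Template(script).substitute(as_strings)


# A path string gives a LOAD statement that points at a file.
print(substitute(SCRIPT, {"input": "networkMap.csv"}))

# An object gives a LOAD statement that points at its string form, not at data.
print(substitute(SCRIPT, {"input": FakeResult()}))
```

The second call shows the problem: the LOAD path becomes the object's printed form, which is not a location Pig can read from.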

I don't fully understand the context of your problem, but if you just want
to feed the output files from the 1st iteration into the 2nd iteration in
your example, please try this:

from org.apache.pig.scripting import *

P = Pig.compile("""
A = LOAD '$input' USING PigStorage(',') AS (...);
STORE A INTO '$outFile' USING PigStorage(',');
""")
params = {'input': 'networkMap.csv'}
for i in range(2):
    # Each iteration writes to its own output directory: out_1, out_2.
    outDir = "out_" + str(i + 1)
    params["outFile"] = outDir
    bound = P.bind(params)
    stats = bound.runSingle()
    if not stats.isSuccessful():
        raise RuntimeError('failed')
    # Feed this iteration's output files into the next iteration.
    params["input"] = outDir + "/part-m-*"

As can be seen, I am setting "$input" for the 2nd iteration to the output
file names of the 1st iteration. PigStorage supports path globbing, so
every file that matches the pattern "out_1/part-m-*" will be loaded in the
2nd iteration.
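
The chaining can be traced without running Pig. The sketch below is plain Python and only mirrors the bookkeeping of the loop: plan_iterations is a hypothetical helper, the file listing is made up for illustration, and fnmatch is used as an approximation of path globbing rather than the matcher Hadoop itself uses. The part-m-* names assume a map-only job, as in the example above.

```python
import fnmatch


def plan_iterations(first_input, count):
    """Return the (input, outFile) pair that each iteration would bind."""
    plan = []
    current_input = first_input
    for i in range(count):
        out_dir = "out_%d" % (i + 1)
        plan.append((current_input, out_dir))
        # The next iteration reads the part files written by this one.
        current_input = out_dir + "/part-m-*"
    return plan


for source, target in plan_iterations("networkMap.csv", 2):
    print("%s -> %s" % (source, target))

# Hypothetical contents of out_1 after the 1st iteration.
files = ["out_1/_SUCCESS", "out_1/part-m-00000", "out_1/part-m-00001"]

# Only the part files match the pattern bound to $input in the 2nd iteration.
matched = fnmatch.filter(files, "out_1/part-m-*")
print(matched)
```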

Alternatively, you could have more than one LOAD/STORE statement in your
embedded script. For example,

A = LOAD '$input_1' using PigStorage(',') AS (...);
STORE A INTO '$outFile_1' USING PigStorage (',');

B = LOAD '$outFile_1' using PigStorage(',') AS (...);
STORE B INTO '$outFile_2' USING PigStorage (',');

Then bind the parameters input_1, outFile_1, and outFile_2 accordingly.
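
As a rough illustration of that binding, the dictionary below supplies one value per parameter, and a small check lists any placeholder that was left unbound. This is plain Python, not part of the Pig API: missing_params is a hypothetical helper, the output paths are example values, and the AS clauses are omitted from the stand-in script.

```python
import re

# Stand-in for the two-stage script; schemas omitted.
SCRIPT = """
A = LOAD '$input_1' USING PigStorage(',');
STORE A INTO '$outFile_1' USING PigStorage(',');
B = LOAD '$outFile_1' USING PigStorage(',');
STORE B INTO '$outFile_2' USING PigStorage(',');
"""

# One value per parameter; the 2nd LOAD reuses the 1st STORE location.
params = {
    "input_1": "networkMap.csv",
    "outFile_1": "out_1",
    "outFile_2": "out_2",
}


def missing_params(script, bound):
    """Return the $names used in the script that have no bound value."""
    names = set(re.findall(r"\$([A-Za-z_][A-Za-z0-9_]*)", script))
    return sorted(names - set(bound))


print(missing_params(SCRIPT, params))
```

The same dictionary is what would be passed to P.bind(params) in the embedded script.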

Hope that this is helpful.

Thanks,
Cheolsoo

On Sat, Nov 3, 2012 at 9:00 AM, Jesse Jackson <[EMAIL PROTECTED]> wrote:

> I'm trying to run an embedded Pig script (embedded in Python) where I
> need to take the output/result of the script and feed it back into
> the script as the input. I'm sure there is an easy way to do this but all
> the examples seem overly simplistic and use one-column examples.
>
> My input looks like this: networkMap.csv:
>
> NodeH,4,-0.4
> NodeH,5,0.2
> NodeO,6,0.1
> Link,W_1_4,0.2,1,4
> Link,W_1_5,-0.3,1,5
> Link,W_2_4,0.4,2,4
> Link,W_2_5,0.1,2,5
> Link,W_3_4,-0.5,3,4
> Link,W_3_5,-0.2,3,5
> Link,W_4_6,-0.3,4,6
> Link,W_5_6,-0.2,5,6
> LR,LR,0.9
> Target,Target,1
>
> Let's take a super simple example of what I want to do, stripping out
> all of the application logic to focus on just the input/output
> problem:
>
> #!/usr/bin/python
>
> from org.apache.pig.scripting import *
>
> P = Pig.compile("""
> A = LOAD '$input' using PigStorage(',') AS (type:chararray,
> name:chararray, val:double,iName:chararray,jName:chararray);
>
> STORE A INTO '$outFile' USING PigStorage (',');
> """)
> params = { 'input': 'networkMap.csv'}
> for i in range(2):
>     outDir = "out_" + str(i + 1)
>     inputString = ""
>     params["outFile"] = "out_" + str(i + 1)
>     bound = P.bind(params)
>     stats = bound.runSingle()
>     if not stats.isSuccessful():
>         raise 'failed'
>     params["input"] = stats.result("Output1")
>
> I was hoping that I could just say input = output but that doesn't
> work. I've also tried:
>
> input = "";
> iter = stats.result("A").iterator()
> while iter.hasNext():
>     tuple = iter.next()
>     input = input + "(" +tuple.toDelimitedString(",") + ")"
> params["input"] = input
>
> This did push the output back into the input, but then the LOAD
> function couldn't read it, since it looked like one big record:
>
> A = LOAD
> '(NodeI,1,1.0,,)(NodeI,2,0.0,,)(NodeI,3,1.0,,)(NodeH,4,-0.4,,)(NodeH,5,0.2,,)(NodeO,6,0.1,,)(Link,W_1_4,0.2,1,4)(Link,W_1_5,-0.3,1,5)(Link,W_2_4,0.4,2,4)(Link,W_2_5,0.1,2,5)(Link,W_3_4,-0.5,3,4)(Link,W_3_5,-0.2,3,5)(Link,W_4_6,-0.3,4,6)(Link,W_5_6,-0.2,5,6)(LR,LR,0.9,,)(Target,Target,1.0,,)'
> using PigStorage(',') AS (type:chararray, name:chararray,
> val:double,iName:chararray,jName:chararray);
>
> I'm sure I am missing some simple way of doing this.
>
> Thanks for any help you can give me.
> -JJ
>