Re: Input and output path
MiaoMiao, Mohit,

If we are talking about embedding Pig in Python, I'd like to add
that we can also embed Pig in Java using PigServer:
http://wiki.apache.org/pig/EmbeddedPig

MiaoMiao, what's the purpose of embedding here, given that we already
have the parameter substitution feature? I guess Pig embedding is mostly
suitable when we want to add IF/ELSE or LOOP functionality.

Thanks
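
To illustrate the LOOP case mentioned above, here is a minimal sketch in plain Python that builds one set of input/output bindings per hour. The helper name hourly_bindings is hypothetical and not part of Pig; the trailing comment assumes a script compiled as P, as in the quoted message below, and would only run under Pig's Jython runtime.

```python
from datetime import datetime, timedelta


def hourly_bindings(start, hours):
    """Build one {'input': ..., 'output': ...} dict per hour.

    `start` is a 'YYYYMMDDHH' string; `hours` is how many consecutive
    hours to cover. (Hypothetical helper, not part of Pig.)
    """
    first = datetime.strptime(start, '%Y%m%d%H')
    bindings = []
    for i in range(hours):
        # Day, month and year rollover is handled by datetime arithmetic.
        prefix = (first + timedelta(hours=i)).strftime('%Y/%m/%d/%H')
        bindings.append({'input': prefix + '/input',
                         'output': prefix + '/output'})
    return bindings


# Under Pig's Jython runtime, each dict could then be bound in turn:
#   for params in hourly_bindings('2012091122', 3):
#       if not P.bind(params).runSingle().isSuccessful():
#           raise RuntimeError('Pig job failed for ' + params['input'])
```

A single fixed hour needs no loop, which is where plain parameter substitution is probably enough.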

On Thu, Sep 13, 2012 at 6:31 AM, MiaoMiao <[EMAIL PROTECTED]> wrote:
> I wrote a Python script to do this:
>
> import sys
> from org.apache.pig.scripting import Pig
> yyyymmddhh = sys.argv[1]
> inputPath = getInputPath(yyyymmddhh) #yyyymmddhh to "YYYY/MM/DD/HH/input"
> outputPath = getOutputPath(yyyymmddhh) #yyyymmddhh to "YYYY/MM/DD/HH/output"
> pigScript = '''
> some = load '$input' using PigStorage(',')
>     as(
>         id:INT,
>         value:INT
>     );
> final = ..... ;
> STORE final INTO '$output' using PigStorage(',');
> '''
> P = Pig.compile(pigScript)
> result = P.bind({'input':inputPath, 'output':outputPath}).runSingle()
> if result.isSuccessful():
>     print 'Pig job succeeded'
> else:
>     raise RuntimeError('Pig job failed')
>
> Then you can run it with pig
> pig -x local pig.py 2012091108
>
> On Tue, Sep 11, 2012 at 7:11 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>> Our input path is something like YYYY/MM/DD/HH/input and we'd like to write
>> to YYYY/MM/DD/HH/output. Is it possible to get the input path as a string
>> and convert it to YYYY/MM/DD/HH/output so that I can use it in the
>> "store into" clause?
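
The quoted script calls getInputPath and getOutputPath without showing them. Here is a minimal sketch of what they might look like, assuming the argument is always a ten-digit YYYYMMDDHH string. The toOutputPath helper is a hypothetical addition that does the string conversion asked about in the original question, assuming the path ends in an "input" component.

```python
def getInputPath(yyyymmddhh):
    """Turn 'YYYYMMDDHH' into 'YYYY/MM/DD/HH/input'."""
    if len(yyyymmddhh) != 10 or not yyyymmddhh.isdigit():
        raise ValueError('expected YYYYMMDDHH, got %r' % (yyyymmddhh,))
    parts = [yyyymmddhh[0:4], yyyymmddhh[4:6],
             yyyymmddhh[6:8], yyyymmddhh[8:10]]
    return '/'.join(parts) + '/input'


def toOutputPath(inputPath):
    """Replace a trailing 'input' path component with 'output'.

    Hypothetical helper: assumes the last component is exactly 'input'.
    """
    head, sep, tail = inputPath.rpartition('/')
    if tail != 'input':
        raise ValueError('path does not end in "input": %r' % (inputPath,))
    return head + sep + 'output'


def getOutputPath(yyyymmddhh):
    """Turn 'YYYYMMDDHH' into 'YYYY/MM/DD/HH/output'."""
    return toOutputPath(getInputPath(yyyymmddhh))
```

Raising ValueError on malformed input keeps a bad date argument from silently producing a wrong path before the Pig job starts.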