Pig >> mail # user >> Input and output path


Re: Input and output path
MiaoMiao, Mohit,

If we are talking about embedding Pig in Python, I'd like to add
that we can also embed Pig in Java using PigServer:
http://wiki.apache.org/pig/EmbeddedPig

MiaoMiao, what's the purpose of embedding here, given that we already
have the parameter substitution feature? I guess Pig embedding is mostly
suitable when we want to add IF/ELSE or LOOP functionality.
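
For comparison, here is a minimal sketch of the parameter substitution route, assuming the Pig script refers to `$input` and `$output` as in the quoted script below. The helper names `split_hour` and `build_pig_command` and the script name `job.pig` are hypothetical and used only for illustration; `-param name=value` is the Pig command-line option for parameter substitution. The sketch only builds the command line and does not run Pig.

```python
def split_hour(yyyymmddhh):
    """Turn a 10-digit hour stamp such as '2012091108' into '2012/09/11/08'."""
    if len(yyyymmddhh) != 10 or not yyyymmddhh.isdigit():
        raise ValueError("expected 10 digits (yyyymmddhh), got %r" % yyyymmddhh)
    parts = [yyyymmddhh[0:4], yyyymmddhh[4:6], yyyymmddhh[6:8], yyyymmddhh[8:10]]
    return "/".join(parts)


def build_pig_command(script, yyyymmddhh):
    """Build an argument list that passes both paths to Pig via -param.

    'script' is a hypothetical Pig script that reads '$input' and
    writes to '$output'.
    """
    base = split_hour(yyyymmddhh)
    return [
        "pig",
        "-param", "input=%s/input" % base,
        "-param", "output=%s/output" % base,
        script,
    ]


if __name__ == "__main__":
    # Print the command instead of executing it.
    print(" ".join(build_pig_command("job.pig", "2012091108")))
```

The resulting list could be handed to `subprocess.call` from a wrapper script, which keeps the Pig script itself free of any path logic.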

Thanks

On Thu, Sep 13, 2012 at 6:31 AM, MiaoMiao <[EMAIL PROTECTED]> wrote:
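> I wrote a Python script to do this:
>
> import sys
> from org.apache.pig.scripting import Pig
>
> yyyymmddhh = sys.argv[1]
> # getInputPath/getOutputPath are helpers whose definitions are not shown here
> inputPath = getInputPath(yyyymmddhh)    # yyyymmddhh to "YYYY/MM/DD/HH/input"
> outputPath = getOutputPath(yyyymmddhh)  # yyyymmddhh to "YYYY/MM/DD/HH/output"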
> pigScript = '''
> some = load '$input' using PigStorage(',')
>     as(
>         id:INT,
>         value:INT
>     );
> final = ..... ;
> STORE final INTO '$output' using PigStorage(',');
> '''
> P = Pig.compile(pigScript)
> result = P.bind({'input':inputPath, 'output':outputPath}).runSingle()
> if result.isSuccessful():
>     print 'Pig job succeeded'
> else:
>     # raising a bare string is not valid; raise an exception object instead
>     raise RuntimeError('Pig job failed')
>
> Then you can run it with Pig:
> pig -x local pig.py 2012091108
>
> On Tue, Sep 11, 2012 at 7:11 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>> Our input path is something like YYYY/MM/DD/HH/input and we'd like to write
>> to YYYY/MM/DD/HH/output. Is it possible to get the input path as a String
>> and convert it to YYYY/MM/DD/HH/output so that I can use it in the
>> "store into" clause?
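
The string conversion asked about in the original question can be sketched in plain Python, assuming the input path always ends in a final `input` component. The function name `to_output_path` is hypothetical; the result would still need to be passed to the Pig script, for example through the `bind` call or parameter substitution discussed above.

```python
def to_output_path(input_path):
    """Map 'YYYY/MM/DD/HH/input' to 'YYYY/MM/DD/HH/output'.

    Only the last path component is replaced, so an 'input' that
    appears earlier in the path is left untouched.
    """
    prefix, sep, last = input_path.rstrip("/").rpartition("/")
    if last != "input":
        raise ValueError("path does not end in 'input': %r" % input_path)
    return prefix + sep + "output"


if __name__ == "__main__":
    print(to_output_path("2012/09/11/08/input"))
```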