Re: Input and output path
You can do something similar to this FAQ entry:
https://cwiki.apache.org/PIG/faq.html#FAQ-Q%253AIloaddatafromadirectorywhichcontainsdifferentfile.HowdoIfindoutwherethedatacomesfrom%253F

Get the input path from Pig and then substitute the values for date, hour, etc.
You also have to override the getSchema method so that Pig can see these
fields.

Just beware of https://issues.apache.org/jira/browse/PIG-2462
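
To illustrate the substitution step only, here is a minimal Python sketch (not code from this thread, and not the loader itself) that pulls the date and hour fields out of an input path and derives the matching output path. The YYYY/MM/DD/HH/input layout comes from the original question; the names split_input_path and to_output_path are hypothetical.

```python
import re

# Layout taken from the question: .../YYYY/MM/DD/HH/input
# The helper names below are made up for illustration.
_INPUT_PATH_RE = re.compile(
    r'(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/(?P<hour>\d{2})/input/?$')


def split_input_path(path):
    """Return the year, month, day and hour fields found in an input path."""
    match = _INPUT_PATH_RE.search(path)
    if match is None:
        raise ValueError('unexpected input path: %r' % (path,))
    return match.groupdict()


def to_output_path(path):
    """Replace the trailing 'input' component with 'output', keeping any prefix."""
    match = _INPUT_PATH_RE.search(path)
    if match is None:
        raise ValueError('unexpected input path: %r' % (path,))
    fields = match.groupdict()
    return path[:match.start()] + '%(year)s/%(month)s/%(day)s/%(hour)s/output' % fields
```

For example, to_output_path('/logs/2012/09/11/08/input') gives '/logs/2012/09/11/08/output'; the /logs prefix is an assumed example value.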

Thanks,
Aniket

On Thu, Sep 13, 2012 at 2:04 PM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:

> MiaoMiao, Mohit,
>
> If we are talking about embedding Pig into Python, I'd like to add
> that we can also embed Pig into Java using PigServer
> http://wiki.apache.org/pig/EmbeddedPig
>
> MiaoMiao, what's the purpose of embedding here (if we already have the
> parameter substitution feature)? I guess Pig embedding is mostly
> suitable when we want to add IF/ELSE or LOOP functionality.
>
> Thanks
>
> On Thu, Sep 13, 2012 at 6:31 AM, MiaoMiao <[EMAIL PROTECTED]> wrote:
> > I wrote a Python script to do this:
> >
> > import sys
> > from org.apache.pig.scripting import Pig
> >
> > # getInputPath and getOutputPath are helper functions not shown here
> > yyyymmddhh = sys.argv[1]
> > inputPath = getInputPath(yyyymmddhh)    # yyyymmddhh to "YYYY/MM/DD/HH/input"
> > outputPath = getOutputPath(yyyymmddhh)  # yyyymmddhh to "YYYY/MM/DD/HH/output"
> > pigScript = '''
> > some = load '$input' using PigStorage(',')
> >     as(
> >         id:INT,
> >         value:INT
> >     );
> > final = ..... ;
> > STORE final INTO '$output' using PigStorage(',');
> > '''
> > # compile once, then bind the two paths to $input and $output
> > P = Pig.compile(pigScript)
> > result = P.bind({'input':inputPath, 'output':outputPath}).runSingle()
> > if result.isSuccessful():
> >     print 'Pig job succeeded'
> > else:
> >     # raise a real exception type rather than a bare string
> >     raise RuntimeError('Pig job failed')
> >
> > Then you can run it with pig
> > pig -x local pig.py 2012091108
> >
> > On Tue, Sep 11, 2012 at 7:11 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> >> Our input path is something like YYYY/MM/DD/HH/input and we would like to
> >> write to YYYY/MM/DD/HH/output. Is it possible to get the input path as a
> >> String and convert it to YYYY/MM/DD/HH/output that I can use in the
> >> "store into" clause?
>
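
The quoted script calls getInputPath and getOutputPath without showing them. One possible way to write them is sketched below; the function names and the "YYYY/MM/DD/HH/input" and "YYYY/MM/DD/HH/output" targets come from the script's comments, while the bodies and the _hour_dir helper are assumptions, not MiaoMiao's actual code.

```python
def _hour_dir(yyyymmddhh):
    """Turn '2012091108' into '2012/09/11/08' (hypothetical helper)."""
    # Only the shape is checked here (ten digits), not calendar validity.
    if len(yyyymmddhh) != 10 or not yyyymmddhh.isdigit():
        raise ValueError('expected a yyyymmddhh string, got %r' % (yyyymmddhh,))
    parts = [yyyymmddhh[0:4], yyyymmddhh[4:6], yyyymmddhh[6:8], yyyymmddhh[8:10]]
    return '/'.join(parts)


def getInputPath(yyyymmddhh):
    # yyyymmddhh to "YYYY/MM/DD/HH/input"
    return _hour_dir(yyyymmddhh) + '/input'


def getOutputPath(yyyymmddhh):
    # yyyymmddhh to "YYYY/MM/DD/HH/output"
    return _hour_dir(yyyymmddhh) + '/output'
```

With these definitions, the argument 2012091108 from the example command line maps to 2012/09/11/08/input and 2012/09/11/08/output.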

--
"...:::Aniket:::... Quetzalco@tl"