|
deepak.n85@...
2011-01-10, 11:09
Jonathan Coveney
2011-01-10, 15:01
Alan Gates
2011-01-10, 16:18
deepak.n85@...
2011-01-10, 16:25
Dmitriy Ryaboy
2011-01-10, 16:44
Dmitriy Ryaboy
2011-01-10, 16:57
Olga Natkovich
2011-01-10, 17:54
deepak.n85@...
2011-01-11, 09:03
Richard Ding
2011-01-11, 18:17
deepak.n85@...
2011-01-12, 10:24
김영우
2011-01-12, 10:57
deepak.n85@...
2011-01-12, 11:55
김영우
2011-01-13, 03:11
deepak.n85@...
2011-01-13, 06:24
|
-
Iterative MapReduce with PIG
deepak.n85@... 2011-01-10, 11:09
Hi,

I need to implement an application that is iterative in nature. At the end of each iteration, I need to take the result and provide it as an input for the next iteration.

Embedding PIG statements in a Java Program looks like one way to do it.

But I prefer using Python for programming. How can I do this?

Thanks,
Deepak

Please do not print this email unless it is absolutely necessary.

The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments.

WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email.

www.wipro.com
-
Re: Iterative MapReduce with PIG
Jonathan Coveney 2011-01-10, 15:01
What kind of iteration? Basically, if you embedded this in Java, would you be making repeated pig calls, or would you be constructing a pig script, or what?

Basically, a better explanation of what you want to do might help :)
-
Re: Iterative MapReduce with PIG
Alan Gates 2011-01-10, 16:18
This is one of our major initiatives for 0.9. See http://wiki.apache.org/pig/TuringCompletePig and https://issues.apache.org/jira/browse/PIG-1479. But until that's ready you'll have to use Java or piglet as recommended by Dmitriy.

Alan.
-
RE: Iterative MapReduce with PIG
deepak.n85@... 2011-01-10, 16:25
Thanks All,

To start with, I wish to implement a simple k-means algorithm using Apache PIG. There doesn't seem to be an easy way to do this using Python Streaming alone.

Dmitriy: Jython is stuck at 2.5, and I kind of like Python 3 :) Also, Jython will force me to write code in a certain way, so that it interacts with Hadoop, Java style. I will try out Piglet.

Alan: That sounds very interesting. So the two features I am looking forward to are embedding PIG statements in scripting languages, and support for iterative MapReduce.

Regards,
Deepak
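
For context on why k-means calls for a driver loop, here is a minimal single-machine sketch in plain Python. It is not Pig code and not something posted in this thread; the function names (`assign`, `update`, `kmeans`) are made up for illustration. Each pass reads the centroids produced by the previous pass, which is the "output of one iteration is the input of the next" pattern asked about above.

```python
def assign(points, centroids):
    """Group each point under the index of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        # Squared Euclidean distance to every centroid.
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        clusters[dists.index(min(dists))].append(p)
    return clusters


def update(clusters, old_centroids):
    """Recompute each centroid as the mean of its cluster.

    An empty cluster keeps its previous centroid.
    """
    new_centroids = []
    for members, old in zip(clusters, old_centroids):
        if members:
            n = float(len(members))
            new_centroids.append(tuple(sum(dim) / n for dim in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids


def kmeans(points, centroids, max_iter=10):
    """Repeat assign/update until the centroids stop moving or max_iter is hit."""
    for _ in range(max_iter):
        new_centroids = update(assign(points, centroids), centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids  # this iteration's output feeds the next one
    return centroids
```

In a MapReduce setting the assignment step would roughly correspond to the map side and the centroid update to the reduce side, with the loop living in the driver program.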
-
RE: Iterative MapReduce with PIG
Dmitriy Ryaboy 2011-01-10, 16:44
Use Jython?

If you like Ruby, you can try piglet, the Pig Ruby DSL. It's on GitHub.
-
Re: Iterative MapReduce with PIG
Dmitriy Ryaboy 2011-01-10, 16:57
Sorry about that. My phone seems to be in a send loop.
-
RE: Iterative MapReduce with PIG
Olga Natkovich 2011-01-10, 17:54
The initial implementation was checked into the trunk last Friday. If you feel adventurous, you can give it a try :).

Olga
-
RE: Iterative MapReduce with PIG
deepak.n85@... 2011-01-11, 09:03
Hi,

I downloaded Pig-0.9.0-SNAPSHOT.tar.gz and set it up.

I am trying to run this:

--------
    #!/usr/bin/python
    # Name - embed_pig.py

    p = Pig.compile("""
    records = LOAD 'path/to/data' AS (input_line:chararray);
    DESCRIBE records;
    """)

    for i in range(1,2):
        r = p.bind()
        results = r.run()
        if results.getStatus("records") == "FAILED":
            raise "Pig job failed"
-------------

Command to run:
    $pig -x local embed_pig.py

Error I got:
    Error 1000 - Parsing Error.

The purpose of this script is to run the Pig script twice, iteratively.

What is the correct way to run code like this? Are there any other special environment variables that I need to set?

Thanks,
Deepak
-
Re: Iterative MapReduce with PIG
Richard Ding 2011-01-11, 18:17
The following script works with the latest 0.9 snapshot:

    #!/usr/bin/python
    # Name - embed_pig.py

    # need to explicitly import the Pig class
    from org.apache.pig.scripting import Pig

    p = Pig.compile("""
    records = LOAD 'path/to/data' AS (input_line:chararray);
    DESCRIBE records;
    """)

    for i in range(0,2):
        r = p.bind()
        results = r.runSingle()

If you just want to use the command DESCRIBE, this script works better:

    #!/usr/bin/python
    # Name - embed_pig.py

    # need to explicitly import the Pig class
    from org.apache.pig.scripting import Pig

    p = Pig.compile("""
    records = LOAD 'path/to/data' AS (input_line:chararray);
    """)

    for i in range(0,2):
        r = p.bind()
        r.describe('records')
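
The scripts above run the same statements on every pass. To feed one iteration's result into the next, the compiled script would need input and output paths that change per pass. The sketch below only builds those parameter sets in plain Python; the helper names (`iteration_params`, `plan`), the parameter names `in` and `out`, and the `iter_N` directory layout are all hypothetical. It assumes that `bind` can be given a mapping of parameter names to values for `$name` placeholders in the compiled script, which should be checked against the snapshot in use.

```python
def iteration_params(base_path, iteration):
    """Return parameters so that pass i reads what pass i-1 wrote."""
    return {
        'in': '%s/iter_%d' % (base_path, iteration),
        'out': '%s/iter_%d' % (base_path, iteration + 1),
    }


def plan(base_path, iterations):
    """Build the parameter dictionaries for a fixed number of passes."""
    return [iteration_params(base_path, i) for i in range(iterations)]


# Assumed usage inside an embedded Pig (Jython) script, not runnable here:
#   p = Pig.compile("... LOAD '$in' ...; STORE ... INTO '$out';")
#   for params in plan('kmeans', 3):
#       p.bind(params).runSingle()
```

Keeping the path bookkeeping in a small pure function makes it easy to check that each pass's output really is the next pass's input before any job is submitted.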
-
RE: Iterative MapReduce with PIG
deepak.n85@... 2011-01-12, 10:24
Hi,

I am not able to import Pig.

The following throws an import error:

    >>> from org.apache.pig.scripting import Pig
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: No module named apache

Any ideas? I checked my classpath, and things look alright.
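
One way to narrow down an import failure like this is to check two things separately: whether the interpreter is Jython at all, and whether the Pig scripting classes are visible to it. The sketch below is a diagnostic under those assumptions, not a fix confirmed in this thread; the function names are made up for illustration. It relies on Jython reporting a `sys.platform` value that starts with `java`.

```python
import sys


def running_on_jython():
    """Jython sets sys.platform to a string starting with 'java'."""
    return sys.platform.startswith('java')


def pig_scripting_available():
    """Try the same import the embedded script needs."""
    try:
        from org.apache.pig.scripting import Pig  # noqa: F401
    except ImportError:
        return False
    return True


def diagnose():
    """Return a short description of the most likely problem."""
    if not running_on_jython():
        return 'not running on Jython; Java packages cannot be imported'
    if not pig_scripting_available():
        return 'running on Jython, but the Pig classes are not on the classpath'
    return 'ok'


if __name__ == '__main__':
    print(diagnose())
```

If the second message is printed, the interpreter is Jython but pig.jar is not on its classpath, which points to how the script is being launched.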
-
Re: Iterative MapReduce with PIG
김영우 2011-01-12, 10:57
Hi Deepak,

Did you download the Pig distribution from Apache Hudson? It seems that the snapshot build does not include jython.jar.

Drop jython.jar into the $PIG_HOME/lib directory and then try again. You can also specify the classpath on the java command line. E.g.,

    java -cp pig.jar:/path/jython.jar --embedded jython bedded_pig.py

For me, it works fine when I specify 'Local' mode, but in MapReduce mode it does not. I don't know exactly why, but I guess it's because I'm using CDH beta3.

- Youngwoo
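
Before rerunning the script, it can help to confirm that a Jython jar really is where Pig is expected to look. The sketch below only lists matching files; the helper name `find_jython_jars` and the `jython*.jar` file-name pattern are assumptions for illustration, and whether Pig picks up jars from `$PIG_HOME/lib` depends on the launcher script in the build being used.

```python
import glob
import os


def find_jython_jars(pig_home):
    """Return the sorted paths of jython*.jar files under <pig_home>/lib."""
    pattern = os.path.join(pig_home, 'lib', 'jython*.jar')
    return sorted(glob.glob(pattern))


if __name__ == '__main__':
    # PIG_HOME is read from the environment; fall back to the current directory.
    home = os.environ.get('PIG_HOME', '.')
    jars = find_jython_jars(home)
    print(jars if jars else 'no jython jar found under %s/lib' % home)
```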
-
RE: Iterative MapReduce with PIG
deepak.n85@... 2011-01-12, 11:55
Hi Youngwoo,

Yes, I downloaded the Pig snapshot from Hudson. Is there some other Pig-0.9.0 that comes bundled with jython.jar? Please point me to it.

With the snapshot version, I tried your advice. Putting jython.jar in $PIG_HOME/lib did not help. I'm getting the same error.

The java command doesn't seem to recognize the --embedded option.
-
Re: Iterative MapReduce with PIG
김영우 2011-01-13, 03:11
Hi Deepak,

I just built the Pig snapshot on my PC and then deployed the distribution to the server. I also dropped the required jars into the $PIG_HOME/lib directory. After that, it seems to work fine.

Hope this helps.

- Youngwoo

*My env for Hadoop:*

    $ env | grep HADOOP
    HADOOP_HOME=/usr/lib/hadoop-0.20

*My pig script for testing:*

    $ cat test_embedded.py
    #!/usr/bin/python

    # need to explicitly import the Pig class
    from org.apache.pig.scripting import Pig

    output = 'outfile'

    p = Pig.compile("""
    records = LOAD '/user/hanadmin/DUAL.TXT' USING PigStorage() AS (input_line:chararray);
    r1 = FOREACH records GENERATE LOWER(records.input_line);
    STORE r1 INTO '$out';
    """)

    for i in range(0, 2):
        print 'Iteration: ' + str(i)
        q = p.bind({'out' : output + str(i)})
        r = q.runSingle()

*Run the script:*

$ bin/pig test_embedded.py
2011-01-13 11:32:02,502 [main] INFO org.apache.pig.Main - Logging error messages to: /hanmail/pig-0.9.0-SNAPSHOT/pig_1294885922500.log
2011-01-13 11:32:02,516 [main] INFO org.apache.pig.Main - Run embedded script: jython
2011-01-13 11:32:02,745 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://hadoopdev:8020
2011-01-13 11:32:03,056 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: hadoopdev:8021
Iteration: 0
2011-01-13 11:32:04,586 [main] INFO org.apache.pig.scripting.BoundScript - Query to run:
records = LOAD '/user/hanadmin/DUAL.TXT' USING PigStorage() AS (input_line:chararray);
r1 = FOREACH records GENERATE LOWER(records.input_line);
STORE r1 INTO 'outfile0';
2011-01-13 11:32:04,872 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2011-01-13 11:32:04,873 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
2011-01-13 11:32:05,096 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: records: Store(hdfs://hadoopdev/tmp/temp644267750/tmp-1639994869:org.apache.pig.impl.io.InterStorage) - scope-10 Operator Key: scope-10)
2011-01-13 11:32:05,097 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: r1: Store(hdfs://hadoopdev/user/hanadmin/outfile0:org.apache.pig.builtin.PigStorage) - scope-16 Operator Key: scope-16)
2011-01-13 11:32:05,113 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2011-01-13 11:32:05,153 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2011-01-13 11:32:05,177 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2011-01-13 11:32:05,177 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - number of input files: 1
2011-01-13 11:32:05,177 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - number of input files: 0
2011-01-13 11:32:05,203 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2011-01-13 11:32:05,204 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-only splittees.
2011-01-13 11:32:05,204 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 out of total 3 MR operators.
2011-01-13 11:32:05,204 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 2
2011-01-13 11:32:05,284 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2011-01-13 11:32:05,306 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-01-13 11:32:09,859 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up multi store job
2011-01-13 11:32:09,916 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2011-01-13 11:32:10,421 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2011-01-13 11:32:10,602 [Thread-4] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2011-01-13 11:32:10,605 [Thread-4] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2011-01-13 11:32:11,533 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201101121634_0007
2011-01-13 11:32:11,533 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://hadoopdev:50030/jobdetails.jsp?jobid=job_201101121634_0007
2011-01-13 11:32:25,704 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 25% complete
2011-01-13 11:32:28,739 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2011-01-13 11:32:31,337 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2011-01-13 11:32:31,339 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-01-13 11:32:36,097 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2011-01-13 11:32:36,110 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapRedu
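The test script above writes a separate output per iteration but always reads the same input. For the original question (feeding each iteration's result into the next one), a minimal sketch might look like the following. It is not from the thread: the parameter names `$src` and `$dst` and the helpers `chain_params` and `run_chain` are hypothetical, and the field is referenced as `input_line` rather than `records.input_line` on the assumption that the chained input may hold more than one row. It relies only on `Pig.compile`, `bind`, `runSingle`, and `isSuccessful` from the embedding API.

```python
# Sketch only: chain each iteration's output into the next iteration's input.
try:
    # Available only when launched via `bin/pig script.py` (Jython).
    from org.apache.pig.scripting import Pig
except ImportError:
    Pig = None  # lets the path-planning helper run under plain Python

# $src and $dst are hypothetical parameter names chosen for this sketch.
SCRIPT = """
records = LOAD '$src' USING PigStorage() AS (input_line:chararray);
r1 = FOREACH records GENERATE LOWER(input_line);
STORE r1 INTO '$dst';
"""


def chain_params(first_input, out_prefix, iterations):
    """Return one {'src', 'dst'} binding per iteration, chaining outputs."""
    params = []
    current = first_input
    for i in range(iterations):
        dst = out_prefix + str(i)
        params.append({'src': current, 'dst': dst})
        current = dst  # the next iteration reads what this one wrote
    return params


def run_chain(compiled, params):
    """Bind and run the compiled script once per binding, stopping on failure."""
    for binding in params:
        stats = compiled.bind(binding).runSingle()
        if not stats.isSuccessful():
            raise RuntimeError('iteration failed for ' + binding['dst'])


if Pig is not None:
    run_chain(Pig.compile(SCRIPT),
              chain_params('/user/hanadmin/DUAL.TXT', 'outfile', 2))
```

Keeping the path planning in a plain function means the chaining logic can be checked without a cluster; only `run_chain` touches Pig.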
-
RE: Iterative MapReduce with PIGdeepak.n85@... 2011-01-13, 06:24
Hi Youngwoo,
I appreciate your help. It worked! I re-installed everything freshly, and it started working. Dunno what was going wrong earlier. This feature is amazing.

Regards,
Deepak
|