-
Pig job result output and schema
Jeff Yuan 2013-03-05, 19:18
I have a couple of questions regarding job result and schema. The context is that I'm trying to create a custom entry point for Pig that takes a script, executes it, and always stores the last declared alias/variable in a file. Would appreciate any insights to the 2 questions I have below or any advice in general.
1. I'm looking to automatically dump or store the last variable/alias that the user has set. I know PigServer.getAliasKeySet or getAliases will return a Set or Map of the aliases, but they are unordered. Is there a way to get an ordered list of aliases?
2. I'm interested in getting the result schema and the raw result set. Is the best way to do this just PigServer.dumpSchema(alias) to get the result schema, and PigServer.openIterator(alias) to get the resulting Tuples?
Thanks, Jeff
-
Re: Pig job result output and schema
Johnny Zhang 2013-03-05, 19:30
Hi, Jeff:
Reply inline.

On Tue, Mar 5, 2013 at 11:18 AM, Jeff Yuan <[EMAIL PROTECTED]> wrote:
> I have a couple of questions regarding job result and schema. The
> context is that I'm trying to create a custom entry point for Pig that
> takes a script, executes it, and always stores the last declared
> alias/variable in a file. Would appreciate any insights to the 2
> questions I have below or any advice in general.
>
> 1. I'm looking to automatically dump or store the last variable/alias
> that the user has set. I know PigServer.getAliasKeySet or getAliases
> will return a Set or Map of the alias. But they are unordered, is
> there a way to get an ordered list of aliases?

Have you tried PigServer.getPigContext().getLastAlias()?

> 2. I'm interested in getting the result schema and the raw result set.
> Is the best way to do this just PigServer.dumpSchema(alias) to get the
> result schema, and PigServer.openIterator(alias) to get the resulting
> Tuples?

Yes, as far as I know, this is a good way to do it. After you get the iterator, you can use the loop below to go through each tuple:

    while (iter.hasNext()) {
        Tuple t = iter.next();
    }

> Thanks,
> Jeff
Johnny
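If an ordered list of all aliases is wanted rather than only the last one, one rough option is to scan the script text for assignments in the order they appear. The sketch below is only an illustration under assumptions: the class and method names (AliasOrder, aliasesInOrder, lastAlias) are hypothetical and not part of Pig, statements are assumed to be separated by ';', and no ';' is assumed to occur inside string literals or comments. It does not use Pig's parser, so it can miss or misread unusual scripts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig: lists aliases in declaration order
// by scanning the raw script text.
final class AliasOrder {
    // Matches "alias =" at the start of a statement, but not "==".
    private static final Pattern ASSIGNMENT =
        Pattern.compile("^\\s*([A-Za-z][A-Za-z0-9_]*)\\s*=(?!=)");

    // Returns every assigned alias in the order it appears in the script.
    // Assumes ';' only ever terminates a statement.
    static List<String> aliasesInOrder(String script) {
        List<String> aliases = new ArrayList<>();
        for (String stmt : script.split(";")) {
            Matcher m = ASSIGNMENT.matcher(stmt);
            if (m.find()) {
                aliases.add(m.group(1));
            }
        }
        return aliases;
    }

    // Returns the last assigned alias, or null if the script assigns none.
    static String lastAlias(String script) {
        List<String> aliases = aliasesInOrder(script);
        return aliases.isEmpty() ? null : aliases.get(aliases.size() - 1);
    }
}
```

A re-assigned alias shows up once per assignment, so the list reflects declaration order rather than a set of unique names.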
-
Re: Pig job result output and schema
Jeff Yuan 2013-03-05, 20:01
Thanks for your suggestions, they work very well. One follow up question:
Is there a way to dynamically strip STORE and DUMP commands from a loaded script? Everything works well if I pass in a script without any dump or store keywords. But when there is a dump, I get an error such as "Syntax error, unexpected symbol at or near 'dump'".
I'm calling:
    pig.setBatchOn();
    pig.registerQuery(req.query);
    pig.dumpSchema(pig.getPigContext().getLastAlias());
    Iterator<Tuple> iter = pig.openIterator(pig.getPigContext().getLastAlias());
    ...
Thanks, Jeff
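One way to strip DUMP and STORE statements before handing the script to registerQuery is plain text filtering. The sketch below is an assumption-laden illustration, not something Pig provides: the class and method names (StripOutputStatements, strip) are hypothetical, statements are assumed to be separated by ';', and no ';' is assumed to appear inside string literals or comments. A script that breaks those assumptions would need a real parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig: removes DUMP and STORE statements
// from a script by simple text filtering.
final class StripOutputStatements {
    // A statement counts as output if it starts with DUMP or STORE followed
    // by whitespace (case-insensitive), so an alias such as "stored" is kept.
    private static final Pattern OUTPUT_STMT = Pattern.compile(
        "^(dump|store)\\s.*", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the script with DUMP/STORE statements dropped, one statement
    // per line. Assumes ';' only ever terminates a statement.
    static String strip(String script) {
        List<String> kept = new ArrayList<>();
        for (String stmt : script.split(";")) {
            String trimmed = stmt.trim();
            if (trimmed.isEmpty() || OUTPUT_STMT.matcher(trimmed).matches()) {
                continue;
            }
            kept.add(trimmed + ";");
        }
        return String.join("\n", kept);
    }

    public static void main(String[] args) {
        String script = "a = LOAD 'in' AS (x:int);\n"
            + "b = FILTER a BY x > 0;\n"
            + "DUMP b;\n"
            + "STORE b INTO 'out';";
        // Prints only the LOAD and FILTER statements.
        System.out.println(strip(script));
    }
}
```

The filtered text could then be passed to pig.registerQuery in place of the raw script, with the last alias dumped or stored separately by the entry point.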
-
Re: Pig job result output and schema
Jonathan Coveney 2013-03-05, 22:03
If you use the alias "@", it should properly dump (etc.) the last alias. If it does not, file a JIRA.
-
Re: Pig job result output and schema
Johnny Zhang 2013-03-05, 22:08
Hi, Jeff:
It seems this happens to me too: I cannot use the dump command in pig.registerQuery(req.query).
Johnny