-
Load Pig metadata from file?
Saurabh S 2012-05-15, 22:34
Here is a sample LOAD statement from Programming Pig book:
daily = load 'NYSE_daily' as (exchange:chararray, symbol:chararray, date:chararray, open:float, high:float, low:float, close:float, volume:int, adj_close:float);
In my case, there are around 250 columns to load. So, I created a file, say, metadata.dat with its contents as follows:
(exchange:chararray, symbol:chararray,
date:chararray, open:float, high:float, low:float, close:float,
volume:int, adj_close:float)
My load statement now looks like
daily = load 'NYSE_daily' as $md;
and the execution looks like:
pig -f script.pig -param md=$(cat metadata.dat)
However, I get the following error in this method:
ERROR 1000: Error during parsing. Lexical error at line 9, column 0. Encountered: <EOF> after : ""
Copying the contents of the file into the appropriate place in the script works fine. But the Pig script is then cluttered with the metadata, and I would like to separate it from the script. Any ideas?
HCatLoader() does not seem to be available on my system.
-
Re: Load Pig metadata from file?
Aniket Mokashi 2012-05-15, 22:58
I think you need to play with some quotes; it's more likely a bash problem.
One way to debug is to run bash -x pig -f script.pig -param md=$(cat metadata.dat) and check what hadoop jar gets in the end.
Try md="$(cat metadata.dat)" or md="'$(cat metadata.dat)'" (single quotes inside double quotes, and so on).
Thanks, Aniket
-
Re: Load Pig metadata from file?
shan s 2012-05-15, 23:11
Can you use macros instead? It would be much cleaner. I was just pointed to http://hortonworks.com/blog/new-apache-pig-features-part-1-macro/
-
RE: Load Pig metadata from file?
Saurabh S 2012-05-15, 23:56
Aniket: You were spot on. This method doesn't allow any whitespace in the file, because the unquoted parameter gets truncated at the first space. I found that out using the 'bash -x' method that you suggested. Thanks a lot for that!
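The truncation described above can be reproduced without Pig. The following is a minimal bash sketch, not taken from the thread: count_args is a hypothetical stand-in for the pig launcher, and the two-line schema file is a shortened stand-in for metadata.dat. It only shows how many arguments the shell hands over with and without quotes.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the pig launcher: prints how many arguments it received.
count_args() { echo "$#"; }

# Recreate a small multi-line schema file (a shortened stand-in for metadata.dat).
schema_file="$(mktemp)"
cat > "$schema_file" <<'EOF'
(exchange:chararray, symbol:chararray,
date:chararray, open:float)
EOF

# Unquoted: the shell splits the file contents on spaces and newlines,
# so only "md=(exchange:chararray," stays attached to -param.
unquoted=$(count_args -param md=$(cat "$schema_file"))

# Quoted: the whole schema stays attached to md= as a single argument.
quoted=$(count_args -param md="$(cat "$schema_file")")

echo "unquoted: $unquoted arguments, quoted: $quoted arguments"
rm -f "$schema_file"
```

With the quoted form the value still contains the file's newlines. Whether Pig's parameter substitution accepts a multi-line value is a separate question that this sketch does not answer; keeping the schema on a single line in the file avoids it.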
Shan: I'm just beginning to use Pig and don't know a lot about macros. I'll look into them, however.
Regards, Saurabh
-
Re: Load Pig metadata from file?
Thejas Nair 2012-05-16, 20:25
You can also use 'pig -dryrun ..' to see what the Pig query looks like after parameter substitution.
Thanks, Thejas
-
Re: Load Pig metadata from file?
Ruslan Al-Fakikh 2012-05-18, 17:08
Saurabh,
We had the same requirement in our project, and what we did was implement a custom loader that takes an XML file containing all the schema information. Like this:
data = LOAD 'path' USING com.example.CustomLoader('schema.xml');
But this is not a trivial solution, because you'll have to deal with the Pig API.
Ruslan
-- Best Regards, Ruslan Al-Fakikh