Re: Load Pig metadata from file?
Saurabh,

We had the same requirement in our project, and what we did was implement a
custom Loader that takes an XML file containing all the schema information,
like this:

data = LOAD 'path' USING com.example.CustomLoader('schema.xml');

But this is not a trivial solution, because you'll have to deal with the Pig API.
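To actually run such a loader, the class also has to be on Pig's classpath. A minimal sketch of the invocation (the jar name is hypothetical; a REGISTER statement at the top of the script works just as well):

# put the jar containing com.example.CustomLoader on the classpath and run the script
pig -Dpig.additional.jars=custom-loader.jar -f script.pig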

Ruslan

On Thu, May 17, 2012 at 12:25 AM, Thejas Nair <[EMAIL PROTECTED]> wrote:
> You can also use 'pig -dryrun ...' to see what the Pig query looks like after
> parameter substitution.
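For example (an untested sketch, assuming the script.pig and metadata.dat described below; -dryrun should leave a parameter-substituted copy of the script next to the original with a .substituted extension):

pig -dryrun -f script.pig -param "md=$(cat metadata.dat)"
cat script.pig.substituted    # the query exactly as Pig will parse it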
>
> Thanks,
> Thejas
>
>
>
> On 5/15/12 4:56 PM, Saurabh S wrote:
>>
>>
>> Aniket: You were spot on. This method doesn't allow any spaces in the file,
>> because the parameter gets truncated at the first occurrence of whitespace.
>> I found that out using the 'bash -x' method you suggested. Thanks a lot for
>> that!
>>
>> Shan: I'm just beginning to use Pig and don't know a lot about macros.
>> I'll look into them, however.
>>
>> Regards,
>> Saurabh
>>
>>> Date: Tue, 15 May 2012 15:58:53 -0700
>>> Subject: Re: Load Pig metadata from file?
>>> From: [EMAIL PROTECTED]
>>> To: [EMAIL PROTECTED]
>>>
>>> I think you need to play with some quotes; it's more likely a bash
>>> problem.
>>>
>>> One way to debug is: bash -x pig -f script.pig -param md=$(cat
>>> metadata.dat), and check what the hadoop jar command gets in the end.
>>>
>>> Try md="$(cat metadata.dat)"
>>> or md="'$(cat metadata.dat)'" (single quotes inside double quotes),
>>> and so on.
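To make these concrete, here is a sketch of both invocations (untested; the key point is that the whole name=value pair has to reach Pig as a single shell word):

# trace the pig launcher to see what the final hadoop/java command line looks like
bash -x pig -f script.pig -param md="$(cat metadata.dat)"

# quoting the command substitution keeps embedded whitespace from being split into separate arguments
pig -f script.pig -param "md=$(cat metadata.dat)"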
>>>
>>> Thanks,
>>> Aniket
>>>
>>> On Tue, May 15, 2012 at 3:34 PM, Saurabh S<[EMAIL PROTECTED]>  wrote:
>>>
>>>>
>>>> Here is a sample LOAD statement from the Programming Pig book:
>>>>
>>>> daily = load 'NYSE_daily' as (exchange:chararray, symbol:chararray,
>>>>            date:chararray, open:float, high:float, low:float,
>>>> close:float,
>>>>            volume:int, adj_close:float);
>>>>
>>>> In my case, there are around 250 columns to load. So, I created a file,
>>>> say, metadata.dat with its contents as follows:
>>>>
>>>> (exchange:chararray, symbol:chararray,
>>>>            date:chararray, open:float, high:float, low:float, close:float,
>>>>            volume:int, adj_close:float)
>>>>
>>>> My load statement now looks like:
>>>>
>>>> daily = load 'NYSE_daily' as $md;
>>>>
>>>> and the execution looks like:
>>>>
>>>> pig -f script.pig -param md=$(cat metadata.dat)
>>>>
>>>> However, I get the following error in this method:
>>>>
>>>> ERROR 1000: Error during parsing. Lexical error at line 9, column 0.
>>>>  Encountered:<EOF>  after : ""
>>>>
>>>> Copying the contents of the file into the appropriate place works fine, but
>>>> then the Pig script is cluttered with the metadata and I would like to keep
>>>> it separate from the script. Any ideas?
>>>>
>>>> HCatLoader() does not seem to be available on my system.
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>> --
>>> "...:::Aniket:::... Quetzalco@tl"
>>
>>
>
>

--
Best Regards,
Ruslan Al-Fakikh