Pig, mail # user - Load Pig metadata from file?


Re: Load Pig metadata from file?
Ruslan Al-Fakikh 2012-05-18, 17:08
Saurabh,

We had the same requirement in our project, and what we did was
implement a custom Loader that takes an XML file containing all the
schema information. Like this:
data = LOAD 'path' USING com.example.CustomLoader('schema.xml');
But this is not a trivial solution, because you'll have to deal with the Pig API.
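
For completeness, this is roughly how such a loader ends up being used
from a script, assuming the class is packaged in a jar (the jar path
below is just a placeholder):

REGISTER /path/to/custom-loader.jar;  -- hypothetical jar containing com.example.CustomLoader
data = LOAD 'path' USING com.example.CustomLoader('schema.xml');
DESCRIBE data;  -- if the loader reports the schema via LoadMetadata, this prints it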

Ruslan

On Thu, May 17, 2012 at 12:25 AM, Thejas Nair <[EMAIL PROTECTED]> wrote:
> you can also use 'pig -dryrun ..' to see what the pig query looks like
> after parameter substitution.
>
> Thanks,
> Thejas
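
For reference, a quick sketch of that with Saurabh's file names (if I
recall correctly, -dryrun only performs parameter substitution and
writes the result next to the original script as script.pig.substituted,
without running it):

pig -dryrun -f script.pig -param md="$(cat metadata.dat)"
cat script.pig.substituted   # inspect what the LOAD ... AS clause expanded to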
>
>
>
> On 5/15/12 4:56 PM, Saurabh S wrote:
>>
>>
>> Aniket: You were spot on. This method doesn't allow any spaces in the
>> file because the parameter gets truncated at the first whitespace
>> character. I found that out using the 'bash -x' method you suggested.
>> Thanks a lot for that!
>>
>> Shan: I'm just beginning to use Pig and don't know a lot about macros.
>> I'll look into them, however.
>>
>> Regards,
>> Saurabh
>>
>>> Date: Tue, 15 May 2012 15:58:53 -0700
>>> Subject: Re: Load Pig metadata from file?
>>> From: [EMAIL PROTECTED]
>>> To: [EMAIL PROTECTED]
>>>
>>> I think you need to play with some quotes; it's more likely a bash
>>> problem.
>>>
>>> One way to debug is: bash -x pig -f script.pig -param md=$(cat
>>> metadata.dat) and then check what the hadoop jar command gets in the end.
>>>
>>> Try md="$(cat metadata.dat)"
>>> or md="'$(cat metadata.dat)'" (single quotes inside double quotes)
>>> and so on...
>>>
>>> Thanks,
>>> Aniket
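
To make the word splitting concrete, a minimal sketch of the quoting
Aniket is describing, using the same file names (whether Pig is then
happy with a multi-line schema as a single parameter is a separate
question):

# Unquoted, the shell splits $(cat metadata.dat) on whitespace, so only
# the text up to the first space is passed to Pig as the value of md:
pig -f script.pig -param md=$(cat metadata.dat)

# Double quotes keep the whole contents of metadata.dat together as one argument:
pig -f script.pig -param md="$(cat metadata.dat)"

# Running the pig launcher under bash -x traces it, so you can see what
# the final hadoop/java command line actually receives:
bash -x pig -f script.pig -param md="$(cat metadata.dat)"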
>>>
>>> On Tue, May 15, 2012 at 3:34 PM, Saurabh S<[EMAIL PROTECTED]>  wrote:
>>>
>>>>
>>>> Here is a sample LOAD statement from Programming Pig book:
>>>>
>>>> daily = load 'NYSE_daily' as (exchange:chararray, symbol:chararray,
>>>>            date:chararray, open:float, high:float, low:float,
>>>> close:float,
>>>>            volume:int, adj_close:float);
>>>>
>>>> In my case, there are around 250 columns to load. So, I created a file,
>>>> say, metadata.dat with its contents as follows:
>>>>
>>>>  (exchange:chararray, symbol:chararray,
>>>>
>>>>            date:chararray, open:float, high:float, low:float,
>>>> close:float,
>>>>
>>>>            volume:int, adj_close:float)
>>>>
>>>> My load statement now looks like
>>>>
>>>> daily = load 'NYSE_daily' as $md;
>>>>
>>>> and the execution looks like:
>>>>
>>>> pig -f script.pig -param md=$(cat metadata.dat)
>>>>
>>>> However, I get the following error in this method:
>>>>
>>>> ERROR 1000: Error during parsing. Lexical error at line 9, column 0.
>>>>  Encountered:<EOF>  after : ""
>>>>
>>>> Copying the contents of the file into the appropriate place works fine.
>>>> But the pig script is cluttered with the metadata and I would like to
>>>> separate it from the script. Any ideas?
>>>>
>>>> HCatLoader() does not seem to be available on my system.
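
Tying this back to the replies above: because md=$(cat metadata.dat) is
unquoted, $md ends up holding only the first token, "(exchange:chararray,",
so the AS (...) clause is never closed, which matches the
"Encountered: <EOF>" lexical error reported here. The word splitting is
easy to see in isolation, independent of Pig:

# Each whitespace-separated token of metadata.dat becomes its own argument;
# printf reuses the format string, so every argument is printed in <...>:
printf '<%s> ' md=$(cat metadata.dat); echo

# Quoted, the whole file is passed as a single argument:
printf '<%s> ' md="$(cat metadata.dat)"; echo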
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>> --
>>> "...:::Aniket:::... Quetzalco@tl"
>>
>>
>
>

--
Best Regards,
Ruslan Al-Fakikh