We had a YAML file that mapped each physical data source to the specific
loader that our generic loader serves as a facade for. Now we're moving to an
HCatalog-based solution that handles that mapping as well as the
logical-to-physical resolution. Basically, the mappings are stored in a DB.
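
A minimal sketch of that kind of logical-to-physical lookup, for illustration
only. It is not the actual setup described above: the table name, HDFS path,
loader name, and the `resolve` helper are all hypothetical, and an in-memory
dict stands in for the YAML file or DB.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class PhysicalSource:
    """Where a logical table lives and which concrete loader reads it."""
    location: str  # physical HDFS path, never shown to the end user
    loader: str    # name of the concrete loader behind the generic facade


# Hypothetical mapping; in practice this would come from a YAML file or a DB.
MAPPINGS = {
    "SampleTable": PhysicalSource(
        location="hdfs://namenode/data/exports/sample_table",
        loader="DelimitedTextLoader",
    ),
}


def resolve(logical_uri, mappings=MAPPINGS):
    """Resolve a logical URI such as 'scheme://SampleTable' to its physical source."""
    # The logical table name sits where a URL normally carries its host.
    table = urlparse(logical_uri).netloc
    try:
        return mappings[table]
    except KeyError:
        raise LookupError(f"no physical mapping for logical table {table!r}") from None
```

The user-facing load statement only ever carries the logical name; a generic
loader could call something like `resolve("scheme://SampleTable")` and hand
the result to the concrete loader, so the physical path never appears in the
script.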
On Tue, Dec 11, 2012 at 8:20 AM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
> Thanks Bill. Any ideas on how to hide the location of HDFS files from the
> end user?
> On Tue, Dec 11, 2012 at 9:42 PM, Bill Graham <[EMAIL PROTECTED]> wrote:
>> I think the latter would be better. Since the LoadFunc would be decoupled
>> from the data exporter, you could schedule the exporting independently of
>> the loading. We do something similar, without the $query part.
>> On Tue, Dec 11, 2012 at 1:10 AM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>> > I was working on a LoadFunc and needed some ideas/second opinion on the
>> > best way to do this:
>> > 1. We use an API to download data from a database as flat files.
>> >    - A query is given with the table name and the fields required to
>> >      extract the data.
>> > 2. Once step 1 is done, upload the data to HDFS.
>> > 3. Upload the schema file to HDFS.
>> > 4. The LoadFunc reads the schema file and parses the data.
>> > A strict requirement is to hide the details of the location of these
>> > files from the user issuing the pig query. For a user it could look as
>> > simple as:
>> > A = load 'scheme://SampleTable' using CustomLoader('$query');
>> > The user only issues the load statement on a table with a query, and the
>> > API calls for importing from the database could happen in the background.
>> > What would be the best way to do this? Is it better to do the above as
>> > part of the LoadFunc, or would it be better to do it separately and
>> > communicate the location from the API import to the LoadFunc?
>> > Thanks,
>> > Prashant
>> *Note that I'm no longer using my Yahoo! email address. Please email me at
>> [EMAIL PROTECTED] going forward.*