Re: Input and output path
Ruslan Al-Fakikh 2012-09-11, 15:12
I am suggesting setting up a whole Hive warehouse. This way your
folders will look like
All the partitions' metadata will be kept in an RDBMS, so when you
query them with Hive it will look like
select * from yourdataset where date = '2012-09-11'
and it will be fast, because Hive reads only the matching partition.
HCatalog is a layer that exposes this Hive functionality to Pig and
MapReduce, so in Pig you can FILTER by those dates.
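
A rough sketch of what that could look like on the Pig side. It assumes
the table is called yourdataset and is partitioned by a string column
named date (both names taken from the query above), and that Pig is
started with HCatalog on its classpath, for example with
pig -useHCatalog in versions that support that flag. Treat it as an
illustration, not a tested script.

```pig
-- Sketch only: assumes a Hive table 'yourdataset', partitioned by a string
-- column 'date', is already registered in the Hive metastore.
-- HCatalog releases of this era ship the loader as
-- org.apache.hcatalog.pig.HCatLoader; later releases moved it to
-- org.apache.hive.hcatalog.pig.HCatLoader.
raw = LOAD 'yourdataset' USING org.apache.hcatalog.pig.HCatLoader();

-- A filter on the partition column placed right after the LOAD lets
-- HCatalog scan only the matching partition instead of the whole table.
one_day = FILTER raw BY date == '2012-09-11';

DUMP one_day;
```

The schema comes from the metastore, so there is no AS clause on the LOAD.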
On Tue, Sep 11, 2012 at 3:29 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> On Mon, Sep 10, 2012 at 4:17 PM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:
>> I guess you could use parameter substitution here
> Thanks, this works.
>> Also, a note about your architecture:
> Are you suggesting a change to the path names, or is your suggestion to
> use HCatalog with Pig?
>> You can consider using Hive partitions to effectively select
>> appropriate dates in the folder names. But as your tool is Pig, not
>> Hive, you can use HCatalog as a layer
>> Best Regards
>> On Tue, Sep 11, 2012 at 3:11 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>> > Our input path is something like YYYY/MM/DD/HH/input and we would like to
>> > write to YYYY/MM/DD/HH/output. Is it possible to get the input path as a
>> > and convert it to YYYY/MM/DD/HH/output that I can use in a "store into"
>> > clause?
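
For reference, the parameter substitution approach mentioned above could
look roughly like the sketch below. HOUR_DIR and hourly.pig are made-up
names for illustration, and the script relies on Pig's default
tab-delimited PigStorage because the thread does not say what format the
data is in.

```pig
-- Sketch only: HOUR_DIR is a hypothetical parameter supplied at launch, e.g.
--   pig -param HOUR_DIR=2012/09/11/15 hourly.pig
-- Pig replaces $HOUR_DIR textually before the script is parsed, so the
-- same value can feed both the input and the output path.
records = LOAD '$HOUR_DIR/input';

-- ... transformations would go here ...

STORE records INTO '$HOUR_DIR/output';
```

The YYYY/MM/DD/HH prefix is passed in once and the script appends /input
and /output itself, so nothing has to be parsed out of the input path. A
wrapper script can compute the prefix, for instance with
date +%Y/%m/%d/%H, and pass it through -param.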