Mohit Anchlia 2012-09-10, 23:11
Ruslan Al-Fakikh 2012-09-10, 23:17
Mohit Anchlia 2012-09-10, 23:29
Re: Input and output path
Ruslan Al-Fakikh 2012-09-11, 15:12
Mohit,

I am suggesting setting up a whole Hive warehouse. This way your folders will look like

  /user/hive/warehouse/yourdataset/date=2012-09-11
  /user/hive/warehouse/yourdataset/date=2012-09-12
  ...

All the partitions' metadata will be kept in an RDBMS, so when you query them with Hive it will look like

  select * from yourdataset where date = '2012-09-11'

and it will be fast, since only the matching partition folder is read.

HCatalog is a layer that provides this Hive functionality to Pig and MapReduce, so in Pig you can FILTER by those dates.

http://incubator.apache.org/hcatalog/docs/r0.4.0/loadstore.html#Load+Examples

Best Regards

On Tue, Sep 11, 2012 at 3:29 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> On Mon, Sep 10, 2012 at 4:17 PM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:
>
>> Mohit,
>>
>> I guess you could use parameters substitution here
>> http://wiki.apache.org/pig/ParameterSubstitution
>>
>> thanks this works.
>
>
>> Also, a note about your architecture:
>>
>
> Are you suggesting change to the path names or your suggestion is to use
> HCatalog with pig?
>
>
>> You can consider using Hive partitions to effectively select
>> appropriate dates in the folder names. But as your tool is Pig, not
>> Hive, you can use HCatalog as a layer
>>
>> Best Regards
>>
>> On Tue, Sep 11, 2012 at 3:11 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>> > Our input path is something like YYYY/MM/DD/HH/input and we like to write
>> > to YYYY/MM/DD/HH/output . Is it possible to get the input path as a String
>> > and convert it to YYYY/MM/DD/HH/output that I can use in "store into"
>> > clause?
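
For the parameter-substitution route mentioned in the quoted reply, one possible way to produce the matching output path is to compute it outside Pig and pass both paths in with `-param`. The sketch below is only an illustration: the parameter names `INPUT` and `OUTPUT`, the script name `job.pig`, and both helper functions are hypothetical, and it assumes the input path always ends in an `input` component as in the original question.

```python
import posixpath
from typing import List


def output_path_for(input_path: str) -> str:
    """Replace the trailing 'input' component with 'output'."""
    parent, leaf = posixpath.split(input_path.rstrip("/"))
    if leaf != "input":
        raise ValueError("expected a path ending in 'input', got %r" % input_path)
    return posixpath.join(parent, "output")


def pig_command(input_path: str, script: str = "job.pig") -> List[str]:
    """Build a pig invocation that passes both paths as parameters.

    'job.pig' is a hypothetical script name; INPUT and OUTPUT are
    assumed parameter names, not anything defined by Pig itself.
    """
    return [
        "pig",
        "-param", "INPUT=" + input_path,
        "-param", "OUTPUT=" + output_path_for(input_path),
        script,
    ]


if __name__ == "__main__":
    # Print the command line for one hourly folder.
    print(" ".join(pig_command("2012/09/11/15/input")))
```

Inside the script, the LOAD statement would then refer to '$INPUT' and the STORE ... INTO clause to '$OUTPUT', so the script itself never has to manipulate the path string.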
MiaoMiao 2012-09-13, 02:31
Ruslan Al-Fakikh 2012-09-13, 21:04
Aniket Mokashi 2012-09-15, 00:01
MiaoMiao 2012-09-17, 05:11