Re: Loading data from ranges of ordered subdirs
Pradeep Gollakota 2013-06-10, 21:02
There are two possibilities that come to mind.
1. Write a custom LoadFunc that handles these regular
expressions. *Not an ideal solution.*
2. Use HCatalog. The example they have in their documentation seems to fit
your use case perfectly. (http://incubator.apache.org/hcatalog/docs/r0.5.0/
There might be other ways to do this, but I'm not aware of them.
Hope this helps.
On Mon, Jun 10, 2013 at 4:54 PM, Rodrick Megraw <[EMAIL PROTECTED]> wrote:
> Let's say I have my input data from the past 12 months organized into
> subdirs by date:
> And now say that I want to run a Pig script to process data from a range
> of dates within the last 12 months, say 2012-11-07 through 2013-05-26. The
> regex that I could specify for this date range is going to get quite
> Is there a way that I can get my Pig script to load data from such a range
> without a regex?
> I could load all the data in /data/*, and then FILTER by the date field in
> each record, but this is not desirable if the range of dates is small
> compared to the entire dataset.
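For illustration only, here is a rough sketch of another way to avoid hand-writing a regex for the range in the question: generate the list of date directories outside Pig and hand it to LOAD as a single glob. It relies on Pig passing LOAD paths to Hadoop's file globbing, which accepts {a,b,c} alternation. The directory layout /data/YYYY-MM-DD is an assumption (the original directory listing did not survive in this message), and date_range_glob is a hypothetical helper name, not part of Pig or HCatalog.

```python
from datetime import date, timedelta


def date_range_glob(base_dir, start, end):
    """Build a Hadoop-style glob covering every day from start to end inclusive.

    Assumes one subdirectory per day named YYYY-MM-DD under base_dir
    (an assumed layout, not confirmed by the original message).
    """
    if end < start:
        raise ValueError("end date precedes start date")
    days = (end - start).days + 1
    names = [(start + timedelta(days=i)).isoformat() for i in range(days)]
    base = base_dir.rstrip("/")
    if len(names) == 1:
        # A single day needs no alternation braces.
        return "%s/%s" % (base, names[0])
    return "%s/{%s}" % (base, ",".join(names))


if __name__ == "__main__":
    # The range mentioned in the question.
    print(date_range_glob("/data", date(2012, 11, 7), date(2013, 5, 26)))
```

The printed string could then be supplied to the script through Pig's parameter substitution (the -param command-line option, with a $INPUT placeholder in the LOAD statement), so only the requested directories are read. Whether this is preferable to HCatalog partitions depends on how the data is managed.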