Loading data from ranges of ordered subdirs
Rodrick Megraw 2013-06-10, 20:54
Let's say I have my input data from the past 12 months organized into subdirs by date:
Now say that I want to run a Pig script that processes data from a range of dates within the last 12 months, for example 2012-11-07 through 2013-05-26. A regex that matches exactly this date range gets complicated quickly, because the range starts and ends mid-month and crosses a year boundary.
Is there a way that I can get my Pig script to load data from such a range without a regex?
I could load all the data in /data/*, and then FILTER by the date field in each record, but this is not desirable if the range of dates is small compared to the entire dataset.
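One possible way around both the regex and the full-scan FILTER is to generate the list of subdirs outside of Pig and hand it to the script as a parameter. The following is only a rough sketch under assumptions: it assumes one subdir per day named /data/YYYY-MM-DD (the actual layout may differ), and the function name date_range_glob and the parameter name INPUT are made up for illustration. It builds a Hadoop-style {a,b,c} glob, which Pig's LOAD should accept since it passes path patterns to Hadoop's glob matching.

```python
from datetime import date, timedelta


def date_range_glob(start, end, root="/data"):
    """Build a Hadoop-style glob matching one subdir per day, start..end inclusive.

    Assumes (hypothetically) that subdirs are named <root>/YYYY-MM-DD.
    """
    if end < start:
        raise ValueError("end date precedes start date")
    num_days = (end - start).days + 1
    # isoformat() on a date yields YYYY-MM-DD, matching the assumed subdir names.
    names = [(start + timedelta(days=i)).isoformat() for i in range(num_days)]
    return root + "/{" + ",".join(names) + "}"


if __name__ == "__main__":
    # The date range from the question above.
    print(date_range_glob(date(2012, 11, 7), date(2013, 5, 26)))
```

If this were saved as make_glob.py (a hypothetical file name), the result could be passed with something like pig -param INPUT="$(python make_glob.py)" myscript.pig and used in the script as LOAD '$INPUT'. Days with no subdir should simply match nothing in the glob, though that is worth verifying on the cluster in question.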