Loading data from ranges of ordered subdirs
Let's say I have my input data from the past 12 months organized into subdirs by date:


Now say that I want to run a Pig script to process data from a range of dates within the last 12 months, say 2012-11-07 through 2013-05-26. The glob pattern that I would have to specify for this date range is going to get quite complicated.

Is there a way that I can get my Pig script to load data from such a range without a regex?

I could load all the data in /data/* and then FILTER by the date field in each record, but this is not desirable if the range of dates is small compared to the entire dataset, since every subdir would still be read.
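
To make the question concrete, here is a rough sketch of one possible workaround: generate the list of subdirs outside Pig and hand it to the script as a single brace glob, which Hadoop path globbing understands as a set of alternatives. This assumes the subdirs are named /data/YYYY-MM-DD, which is only a guess at the layout, and the helper name date_range_glob is made up for illustration.

```python
from datetime import date, timedelta


def date_range_glob(base, start, end):
    """Return a brace glob naming every daily subdir from start to end, inclusive.

    Assumes subdirs are named <base>/YYYY-MM-DD (an assumption, not a known layout).
    """
    if end < start:
        raise ValueError("end date precedes start date")
    days = (end - start).days + 1
    # One ISO-formatted directory name per day in the range.
    names = [(start + timedelta(days=i)).isoformat() for i in range(days)]
    return "%s/{%s}" % (base.rstrip("/"), ",".join(names))


if __name__ == "__main__":
    # The range from the question: 2012-11-07 through 2013-05-26.
    print(date_range_glob("/data", date(2012, 11, 7), date(2013, 5, 26)))
```

The printed string could then be passed in with something like pig -param INPUT='...' and used as LOAD '$INPUT' in the script (the parameter name INPUT is hypothetical, and the value needs quoting so the shell does not expand the braces). I have not checked how this behaves when some of the dates have no subdir.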