My Parquet files are partitioned first by environment and then by date, like this:

env=testing/
   date=2018-03-04/
          part1.parquet
          part2.parquet
          part3.parquet
   date=2018-03-05/
          part1.parquet
          part2.parquet
          part3.parquet
   date=2018-03-06/
          part1.parquet
          part2.parquet
          part3.parquet
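
For reference, this layout is what a partitioned write produces; a minimal
sketch of the assumed writer side (df is a DataFrame that has env and date
columns):

df.write
  .partitionBy("env", "date")   // creates env=.../date=.../ directories
  .parquet(basePath)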
In our read stream, I do the following:

val tunerParquetDF = spark
  .readStream
  .schema(...)                          // full schema omitted
  .format("parquet")
  .option("basePath", basePath)         // keep env/date as partition columns
  .option("path", basePath + "/env*")   // glob over all env partitions
  .option("maxFilesPerTrigger", 5)      // cap on new files per micro-batch
  .load()

The expected behavior is that the read stream picks up files in ascending
date order, but the observed behavior is that files are processed in an
apparently random order. How do I force the stream to read the Parquet
files in date order?
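
For context, here is a minimal sketch of the batch-style iteration I would
like the stream to emulate, assuming the layout above (the outputPath sink
and the per-date processing are placeholders):

import org.apache.hadoop.fs.Path

val fs = new Path(basePath)
  .getFileSystem(spark.sparkContext.hadoopConfiguration)

// List the date=... directories under one env partition and sort them;
// lexicographic order matches chronological order for yyyy-MM-dd dates.
val dateDirs = fs
  .listStatus(new Path(s"$basePath/env=testing"))
  .map(_.getPath.toString)
  .filter(_.contains("date="))
  .sorted

dateDirs.foreach { dir =>
  val df = spark.read
    .option("basePath", basePath)   // keep env/date as partition columns
    .parquet(dir)
  df.write.mode("append").parquet(outputPath)   // placeholder processing
}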