Hello, Are there plans to support Hadoop's SequenceFile format (http://wiki.apache.org/hadoop/SequenceFile)? Or is it already supported and I missed it? I could see this being useful for running Drill on the output of MapReduce jobs.
The sequence files I have currently all have NULL keys and JSON objects as values. Does anyone have a recommendation for converting them to JSON or Parquet files for Drill? The JSON objects generally share the same format, but there may be some outliers with differences, and some fields may be missing from some objects. Thanks, Tom
P.S. Apologies for the noob questions. I've just started looking at Drill.
Steven just submitted a patch for a Hive SerDe storage engine. I believe he was able to read sequence files successfully with this technique. We will be adding a native reader in the future (for improved performance), but for now this should be a decent way to get sequence file data into Drill. He currently has the patch up for review, so if you are comfortable applying a patch, building the project, and trying to read some of your data, we would certainly appreciate feedback. It should be merged into mainline in the near future, which would remove the need to apply the patch.
Looks like the patch accidentally includes references to a sample Hive statestore Derby DB. You can try to rip those out of the patch or wait for Steven and/or Venki to fix it. On Feb 10, 2014 6:59 AM, "Tom Kiley" <[EMAIL PROTECTED]> wrote: