I’ve been looking at the OrcFile.createReader method and thinking about
what I will need to do to read acid files. The first thing that strikes me
is that createReader takes a file. But for acid, you need to pass the
directory because it needs to look for any relevant delta files. Acid also
requires a ValidTxnList. We can add that to the ReaderOptions.
It seems the best way to do this is to add a new method
OrcFile.createAcidReader that takes a directory. I don’t like that the
user has to make a different call in the acid case. But the user has to
set the ValidTxnList in the reader options anyway, so the calling code
already needs separate logic for the two cases.
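
To make the proposal concrete, here is a rough sketch of what a
directory-based entry point could look like. None of this is existing ORC
code: Options stands in for ReaderOptions, AcidLayout stands in for whatever
the real reader would hold, the valid transaction list is modelled as a plain
string, and the "delta_" directory prefix is an assumption about the on-disk
layout. It only shows the argument checks and the delta discovery that make
passing a directory necessary.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class AcidReaderSketch {

    /** Hypothetical stand-in for ReaderOptions; only carries the txn list. */
    static final class Options {
        String validTxnList; // placeholder for a real ValidTxnList object

        Options validTxnList(String value) {
            this.validTxnList = value;
            return this;
        }
    }

    /** Hypothetical result type: the directory plus the deltas found in it. */
    static final class AcidLayout {
        final Path directory;
        final List<Path> deltas;

        AcidLayout(Path directory, List<Path> deltas) {
            this.directory = directory;
            this.deltas = deltas;
        }
    }

    /** Sketch of the proposed createAcidReader: takes a directory, not a file. */
    static AcidLayout createAcidReader(Path directory, Options options) throws IOException {
        if (!Files.isDirectory(directory)) {
            throw new IllegalArgumentException("acid reader needs a directory: " + directory);
        }
        if (options.validTxnList == null) {
            throw new IllegalArgumentException("acid reader needs a ValidTxnList");
        }
        List<Path> deltas = new ArrayList<>();
        // Assumed naming convention: delta directories start with "delta_".
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(directory, "delta_*")) {
            for (Path candidate : stream) {
                if (Files.isDirectory(candidate)) {
                    deltas.add(candidate);
                }
            }
        }
        Collections.sort(deltas); // deterministic order for the caller
        return new AcidLayout(directory, deltas);
    }
}
```

A caller would keep using createReader with a file for the non-acid case and
call something like createAcidReader(dir, options) only when it already has a
ValidTxnList to pass, which is the split described above.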
Every way I could think of for createReader to decide if it was dealing
with an acid directory or a non-acid file seemed to create jumbled semantics.
Does the user pass a directory for the acid case but a file for non-acid?
Does the user pass a base file in the acid case and the code walks up the
path to find the relevant directory? That seems error-prone and slow.
Related to this is my assumption that I will need to write new
implementations of Reader and RecordReader that understand acid. This seems
better than putting a bunch of branches into the existing code to try to
handle both cases.
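
As a toy illustration of why separate implementations seem cleaner, the
sketch below puts two classes behind one small interface. Everything here is
hypothetical: RowSource is a drastically simplified stand-in for Reader and
RecordReader, rows are plain strings, and the delta is modelled as a map from
base row index to a replacement value (null meaning the row was deleted),
which is not how the real delta files work. The point is only that the
non-acid class stays free of any acid branches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class SeparateReadersSketch {

    /** Hypothetical, simplified stand-in for the Reader/RecordReader pair. */
    interface RowSource {
        List<String> rows();
    }

    /** Non-acid path: returns the file's rows unchanged, no acid logic. */
    static final class PlainRowSource implements RowSource {
        private final List<String> fileRows;

        PlainRowSource(List<String> fileRows) {
            this.fileRows = fileRows;
        }

        @Override
        public List<String> rows() {
            return new ArrayList<>(fileRows);
        }
    }

    /** Acid path: overlays delta changes on top of the base rows. */
    static final class AcidRowSource implements RowSource {
        private final List<String> baseRows;
        private final Map<Integer, String> delta; // row index -> new value, null = deleted

        AcidRowSource(List<String> baseRows, Map<Integer, String> delta) {
            this.baseRows = baseRows;
            this.delta = delta;
        }

        @Override
        public List<String> rows() {
            List<String> merged = new ArrayList<>();
            for (int i = 0; i < baseRows.size(); i++) {
                if (delta.containsKey(i)) {
                    String updated = delta.get(i);
                    if (updated != null) {
                        merged.add(updated); // row was updated
                    }
                    // null: row was deleted, so it is skipped
                } else {
                    merged.add(baseRows.get(i)); // row untouched by the delta
                }
            }
            return merged;
        }
    }
}
```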