Pass one file at a time to pig STREAM command?
Leigh Stewart 2013-05-16, 00:04
I need to decode data files encoded in a proprietary binary format. In
order to be properly decoded, exactly one file must be passed to the
decoder executable per execution.
I'm experimenting with two approaches:
1. starting the process and consuming stdout manually and
2. pushing the file through pig streaming
Approach 1 is fine, but not as fine as approach 2, since 2 was designed for
this general purpose.
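For reference, approach 1 (one decoder process per file, reading its stdout by hand) could look roughly like the sketch below. This is only an illustration: the function name decode_files is hypothetical, and it assumes the decoder reads the file on stdin and writes the decoded data to stdout, which may not match the real binary.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List, Sequence


def decode_files(decoder_cmd: Sequence[str], paths: Iterable[Path]) -> List[bytes]:
    """Run the decoder once per file and return each run's stdout.

    Assumption: the decoder takes its input on stdin and emits the
    decoded result on stdout. decoder_cmd is the hypothetical command
    line, e.g. ["./decoder"].
    """
    outputs = []
    for path in paths:
        # A fresh process per file, so the decoder never sees two
        # files concatenated on the same input stream.
        with open(path, "rb") as f:
            result = subprocess.run(
                list(decoder_cmd),
                stdin=f,
                stdout=subprocess.PIPE,
                check=True,  # raise if the decoder exits non-zero
            )
        outputs.append(result.stdout)
    return outputs
```

The cost is one process launch per file, and none of the parallelism that streaming through pig would give.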
The way I get approach 2 to sort-of work is to read each file into a tuple
with one item, and pass the tuple to the decoder binary.
The problem is that pig will concatenate the serialized tuples together,
and my decoder won't be able to decode the file properly.
Providing a PigSerializer alternative doesn't look like it will work since
it doesn't support limiting the number of tuples per file (I pass the input
using "input ('file')").
As far as I can tell this is a dead end.
Can anyone offer any suggestions or show otherwise?