Tzur Turkenitz 2013-07-18, 20:40
Re: Hive Architecture - Execution on nodes
Alan Gates 2013-07-18, 21:45
On Jul 18, 2013, at 1:40 PM, Tzur Turkenitz wrote:
> I just finished reading the Hive-Architecture PDF and failed to find the answers I was hoping for, so here I am, hoping this community will shed some light.
> I think I know what the answers will be, but I need them bolted down and secured.
> We are concerned about how data is transferred between data nodes and Hive, especially on clusters where there is no SSL between nodes.
> This is the use case:
> 1. Table Employees is a Hive table with a SerDe.
> 2. A MapReduce job accesses the Employees table, which holds encrypted data.
> 3. The SerDe decrypts the data.
> 4. The post-SerDe output is returned to the MapReduce job and saved to a new Hive table using a new encryption implementation.
> The flow, as I understand it, is:
> MapReduce job --> read table metadata --> SerDe creates MapReduce job --> distributes across nodes
> This means that data is decrypted on the local nodes and then sent in clear text back to the original MapReduce job to be saved in a new table.
> Is that correct?
No. Data deserialization (which is what a SerDe does, not decryption) is done as part of reading input inside the MapReduce job itself, in the tasks running on the cluster nodes. Essentially only query parsing, validation, and planning are done on the client node.
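To make the division of work concrete, here is a minimal Python model of where each step runs. It is not Hive code: `XorSerDe`, `plan_query`, and `map_task` are hypothetical names, Hive's real SerDe interfaces look different, and the XOR "cipher" is a toy placeholder with no security value. The sketch assumes a map-only job in which each task deserializes rows with the source table's SerDe and immediately serializes them with the destination table's SerDe; it does not model shuffles or reducers.

```python
from dataclasses import dataclass
from typing import Dict, List


def xor_bytes(data: bytes, key: bytes) -> bytes:
    # Toy XOR "cipher" standing in for real encryption. NOT secure; it only
    # makes the plaintext/ciphertext boundary visible in this model.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


@dataclass
class XorSerDe:
    """Hypothetical SerDe: decrypts on deserialize, encrypts on serialize."""
    key: bytes

    def deserialize(self, raw: bytes) -> Dict[str, str]:
        # Decrypt the stored record, then split it into columns.
        name, salary = xor_bytes(raw, self.key).decode("utf-8").split(",")
        return {"name": name, "salary": salary}

    def serialize(self, row: Dict[str, str]) -> bytes:
        # Join the columns, then encrypt with this table's key.
        text = f"{row['name']},{row['salary']}"
        return xor_bytes(text.encode("utf-8"), self.key)


def plan_query(table_metadata: Dict[str, str]) -> Dict[str, str]:
    # Client side: uses table metadata only; no row data is read here.
    return {"input": table_metadata["location"], "serde": table_metadata["serde"]}


def map_task(split: List[bytes], source: XorSerDe, target: XorSerDe) -> List[bytes]:
    # Runs on the node holding the split. Plaintext rows exist only inside
    # this function; what it emits is already re-encrypted.
    output = []
    for raw in split:
        row = source.deserialize(raw)
        output.append(target.serialize(row))
    return output


if __name__ == "__main__":
    old_table = XorSerDe(key=b"old-key")
    new_table = XorSerDe(key=b"new-key")
    stored = [xor_bytes(b"alice,100", b"old-key"), xor_bytes(b"bob,200", b"old-key")]
    plan = plan_query({"location": "/warehouse/employees", "serde": "XorSerDe"})
    written = map_task(stored, old_table, new_table)
    print(plan)
    print([new_table.deserialize(r) for r in written])
```

In this model, `plan_query` never touches row bytes, and decrypted rows never leave `map_task`. That mirrors the point above that deserialization happens in the job's tasks rather than on the client.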
Tzur Turkenitz 2013-07-22, 14:30