RE: Encryption in HDFS

I am also interested in your research. Could you share some insight on the following questions?
1) When you use a CompressionCodec, can the encrypted file be split? As I understand it, no encryption scheme allows a file to be decrypted independently block by block, right? For example, if I have a 1 GB file encrypted with AES, how do you (or can you) decrypt it block by block, instead of using a single mapper to decrypt the whole file?
2) In your CompressionCodec implementation, do you use DecompressorStream or BlockDecompressorStream? If BlockDecompressorStream, can you share some examples? Right now I am having trouble using BlockDecompressorStream to do exactly what you did.
3) Do you have any plans to share your code, especially if you did use BlockDecompressorStream and got the encrypted file decrypted block by block in a Hadoop MapReduce job?
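
On question 1, one cipher mode that does permit decrypting an arbitrary byte range on its own is AES in counter (CTR) mode, because the keystream for any 16-byte block depends only on the key and that block's counter value. The following is a minimal sketch using only JCA; it is not code from this thread, and the class and method names (CtrRangeSketch, counterAt, decryptAt) are hypothetical. It assumes the whole file was encrypted with one key and one starting counter, and it leaves out the Hadoop side (how a split learns its byte offset) entirely.

```java
import java.math.BigInteger;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: all names here are illustrative, not from the original post.
final class CtrRangeSketch {
    static final int AES_BLOCK = 16; // AES block size in bytes

    // Returns baseIv + blockIndex, treating the IV as a 128-bit big-endian counter.
    static byte[] counterAt(byte[] baseIv, long blockIndex) {
        byte[] raw = new BigInteger(1, baseIv)
                .add(BigInteger.valueOf(blockIndex))
                .toByteArray();
        byte[] iv = new byte[AES_BLOCK];
        int n = Math.min(raw.length, AES_BLOCK);
        // Keep only the low 16 bytes so the counter wraps modulo 2^128.
        System.arraycopy(raw, raw.length - n, iv, AES_BLOCK - n, n);
        return iv;
    }

    // Encrypts a whole buffer with AES/CTR, starting at counter value baseIv.
    static byte[] encrypt(byte[] key, byte[] baseIv, byte[] plain)
            throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                new IvParameterSpec(baseIv));
        return c.doFinal(plain);
    }

    // Decrypts `chunk`, which holds the ciphertext bytes that start at byte
    // `offset` of the encrypted file, without touching any earlier bytes.
    static byte[] decryptAt(byte[] key, byte[] baseIv, long offset, byte[] chunk)
            throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                new IvParameterSpec(counterAt(baseIv, offset / AES_BLOCK)));
        int skip = (int) (offset % AES_BLOCK);
        if (skip > 0) {
            // Discard the keystream bytes that precede `offset` inside its block.
            c.update(new byte[skip]);
        }
        return c.doFinal(chunk);
    }

    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16]; // all-zero demo key; never do this in practice
        byte[] iv = new byte[16];
        byte[] plain = new byte[1000];
        for (int i = 0; i < plain.length; i++) plain[i] = (byte) i;

        byte[] enc = encrypt(key, iv, plain);
        byte[] part = decryptAt(key, iv, 37, Arrays.copyOfRange(enc, 37, 137));
        System.out.println(Arrays.equals(part, Arrays.copyOfRange(plain, 37, 137)));
    }
}
```

Two caveats apply to this sketch: CTR mode provides no integrity protection, and a key/counter pair must never be reused for different data. A padded mode such as CBC does not support this kind of direct seeking without extra bookkeeping.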
Date: Tue, 26 Feb 2013 14:10:08 +0900
Subject: Encryption in HDFS

Hello, I'm a university student.
I implemented AES and Triple DES as a CompressionCodec using the Java Cryptography Architecture (JCA). The encryption is performed by a client node using the Hadoop API.
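
For readers unfamiliar with JCA stream wrapping, here is a minimal sketch of how an output stream and an input stream can be wrapped with a cipher. It is only an illustration, not the implementation described in this message: the class and method names (CipherStreamSketch, encrypting, decrypting, roundTrip) are hypothetical, the choice of AES/CBC with a random IV stored at the start of the stream is an assumption, and the Hadoop CompressionCodec plumbing is omitted so the example runs with the JDK alone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: all names here are illustrative, not from the original post.
final class CipherStreamSketch {
    static final String TRANSFORM = "AES/CBC/PKCS5Padding"; // assumed mode
    static final int IV_LEN = 16; // AES block size in bytes

    // Writes a fresh random IV to `raw`, then returns a stream that encrypts
    // everything written to it.
    static OutputStream encrypting(OutputStream raw, SecretKey key)
            throws IOException, GeneralSecurityException {
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);
        Cipher c = Cipher.getInstance(TRANSFORM);
        c.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        raw.write(iv);
        return new CipherOutputStream(raw, c);
    }

    // Reads the IV from the start of `raw`, then returns a stream that
    // decrypts the remaining bytes.
    static InputStream decrypting(InputStream raw, SecretKey key)
            throws IOException, GeneralSecurityException {
        byte[] iv = new byte[IV_LEN];
        new DataInputStream(raw).readFully(iv);
        Cipher c = Cipher.getInstance(TRANSFORM);
        c.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        return new CipherInputStream(raw, c);
    }

    // Encrypts `plain` into memory and decrypts it again.
    static byte[] roundTrip(byte[] plain, SecretKey key)
            throws IOException, GeneralSecurityException {
        ByteArrayOutputStream stored = new ByteArrayOutputStream();
        try (OutputStream out = encrypting(stored, key)) {
            out.write(plain);
        } // closing the stream writes the final padded block

        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InputStream in = decrypting(
                new ByteArrayInputStream(stored.toByteArray()), key)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                result.write(buf, 0, n);
            }
        }
        return result.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        SecretKey key = new SecretKeySpec(new byte[16], "AES"); // demo key only
        byte[] back = roundTrip("hello hdfs".getBytes("UTF-8"), key);
        System.out.println(new String(back, "UTF-8"));
    }
}
```

A Triple DES variant would presumably use the "DESede/CBC/PKCS5Padding" transformation with an 8-byte IV instead.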

Map tasks read blocks from HDFS, and each map task decrypts the blocks it reads. I tested my implementation against generic HDFS. My cluster consists of 3 nodes (1 master node, 3 worker nodes), and each machine has a quad-core processor (i7-2600) and 4 GB of memory.

The test input is 1 TB of text, made up of 32 text files of 32 GB each.
I expected encryption to take much more time than generic HDFS, but the performance does not differ significantly.

The decryption step takes about 5-7% longer than generic HDFS. The encryption step takes about 20-30% longer than generic HDFS because it is single-threaded and runs on a single client node.

So the encryption performance could still be improved.
Could there be an error in my test?
I know there are several implementations for encrypting files in HDFS. Are these implementations enough to secure HDFS?
Best regards,
* Sorry for my bad English