Re: Encrypting files in Hadoop - Using the io.compression.codecs
Michael Segel 2012-08-07, 12:40
There is a bit of a difference between encryption and compression.
You're better off using coprocessors to encrypt the data as it's being written than trying to encrypt the actual HFile.
On Aug 7, 2012, at 3:31 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> I do not know of a way to plug in a codec that applies to all files on
> HDFS transparently yet. Check out
> https://issues.apache.org/jira/browse/HDFS-2542 and friends for some
> work that may arrive in future.
> For HBase, by default, your choices are limited. You get only what
> HBase has tested to offer (None, LZO, GZ, Snappy) and adding in
> support for a new codec requires modification of the sources. This is
> because HBase uses an Enum of codec identifiers (to save space in its
> HFiles). But yes, it can be done, and there are hackier ways of doing
> this too (renaming your CryptoCodec to SnappyCodec, for instance, to
> have HBase unknowingly use it; ugly, ugly, ugly).
> So yes, it is indeed best to discuss this need with the HBase
> community rather than with the Hadoop one here.
> On Tue, Aug 7, 2012 at 1:43 PM, Farrokh Shahriari
> <[EMAIL PROTECTED]> wrote:
>> What if I want to use this encryption in a cluster with hbase running on top
>> of hadoop? Can't hadoop be configured to automatically encrypt each file
>> which is going to be written to it?
>> If not I probably should be asking how to enable encryption on hbase, and
>> asking this question on the hbase mailing list, right?
>> On Tue, Aug 7, 2012 at 12:32 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>> The codec org.apache.hadoop.io.compress.crypto.CyptoCodec needs to be
>>> used. What you've done so far is merely add it to be loaded by Hadoop
>>> at runtime, but you will need to use it in your programs if you wish
>>> for it to get applied.
>>> For example, for MapReduce outputs to be compressed, you may run an MR
>>> job with the following option set on its configuration:
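(The option itself appears to have been lost in archiving. A likely form, assuming the pre-Hadoop-2 MapReduce property names that were current in 2012 and the codec class named elsewhere in this thread; the jar and paths are illustrative:)

```shell
# Sketch only: mapred.output.compress / mapred.output.compression.codec are the
# standard pre-Hadoop-2 output-compression properties; the codec class name is
# taken from this thread, and my-job.jar / the paths are placeholders.
hadoop jar my-job.jar MyJob \
  -D mapred.output.compress=true \
  -D mapred.output.compression.codec=org.apache.hadoop.io.compress.crypto.CyptoCodec \
  /user/me/input /user/me/output
```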
>>> You will then notice that your output files are all properly
>>> encrypted with the above codec.
>>> Likewise, if you're using direct HDFS writes, you will need to wrap
>>> your outputstream with this codec. Look at the CompressionCodec API to
>>> see how:
>>> (Where your CompressionCodec must be the
>>> org.apache.hadoop.io.compress.crypto.CyptoCodec instance).
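(The code sample referenced above did not survive archiving either. A minimal sketch of the stream-wrapping pattern, using Hadoop's standard CompressionCodec / CompressionCodecFactory API; the path, file suffix, and payload are illustrative, and this assumes the crypto codec has been registered in core-site.xml:)

```java
// Sketch only: wraps an HDFS output stream with a codec resolved by
// CompressionCodecFactory. Classes and methods are the standard Hadoop
// compression API; the file path and contents are placeholders.
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
CompressionCodecFactory factory = new CompressionCodecFactory(conf);

Path out = new Path("/data/file.crypto");
// Resolves the codec by file suffix, if one is registered for it
CompressionCodec codec = factory.getCodec(out);

OutputStream os = codec.createOutputStream(fs.create(out));
os.write("sensitive bytes".getBytes());
os.close();
```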
>>> On Tue, Aug 7, 2012 at 1:11 PM, Farrokh Shahriari
>>> <[EMAIL PROTECTED]> wrote:
>>>> I use "Hadoop Crypto Compressor" from this site
>>>> "https://github.com/geisbruch/HadoopCryptoCompressor" for encrypting
>>>> HDFS files.
>>>> I've downloaded the complete code and created the jar file, and changed
>>>> the properties in core-site.xml as the site says.
>>>> But when I add a new file, nothing happens and the encryption isn't
>>>> applied. What can I do to encrypt HDFS files? Does anyone know how I
>>>> should use this class?
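(For reference, the core-site.xml change described above amounts to registering the codec under io.compression.codecs. A sketch of what that property might look like; the crypto codec class name is taken from earlier in this thread, and the exact value should be verified against the HadoopCryptoCompressor README:)

```xml
<!-- Sketch: registers the crypto codec alongside the default codecs.
     Verify the class name against the HadoopCryptoCompressor project. -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.crypto.CyptoCodec</value>
</property>
```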
>>> Harsh J
> Harsh J