Re: Encrypting files in Hadoop - Using the io.compression.codecs
There is a bit of a difference between encryption and compression.

You're better off using coprocessors to encrypt the data as it's being written than trying to encrypt the actual HFile.

On Aug 7, 2012, at 3:31 AM, Harsh J <[EMAIL PROTECTED]> wrote:

> Farrokh,
>
> I do not know of a way to plug in a codec that applies to all files on
> HDFS transparently yet. Check out
> https://issues.apache.org/jira/browse/HDFS-2542 and friends for some
> work that may arrive in the future.
>
> For HBase, by default, your choices are limited. You get only what
> HBase has tested to offer (None, LZO, GZ, Snappy) and adding in
> support for a new codec requires modification of sources. This is
> because HBase uses an enum of codec identifiers (to save space in its
> HFiles). But yes, it can be done, and there are hackier ways of doing
> this too (renaming your CryptoCodec to SnappyCodec, for instance, to
> have HBase unknowingly use it; ugly, ugly, ugly).
> So yes, it is indeed best to discuss this need with the HBase
> community rather than the Hadoop one here.
>
> On Tue, Aug 7, 2012 at 1:43 PM, Farrokh Shahriari
> <[EMAIL PROTECTED]> wrote:
>> Thanks,
>> What if I want to use this encryption in a cluster with HBase running on top
>> of Hadoop? Can't Hadoop be configured to automatically encrypt each file
>> that is written to it?
>> If not, I should probably be asking how to enable encryption in HBase, and
>> asking this question on the HBase mailing list, right?
>>
>>
>> On Tue, Aug 7, 2012 at 12:32 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>>
>>> Farrokh,
>>>
>>> The codec org.apache.hadoop.io.compress.crypto.CyptoCodec needs to be
>>> used. What you've done so far is merely add it to be loaded by Hadoop
>>> at runtime, but you will need to use it in your programs if you wish
>>> for it to get applied.
>>>
>>> For example, for MapReduce outputs to be compressed, you may run an MR
>>> job with the following option set on its configuration (output
>>> compression itself must also be enabled, i.e. mapred.output.compress=true):
>>>
>>>
>>> "-Dmapred.output.compression.codec=org.apache.hadoop.io.compress.crypto.CyptoCodec"
>>>
>>> You should then see that your output files are all encrypted with the
>>> above codec.
>>>
>>> Likewise, if you're using direct HDFS writes, you will need to wrap
>>> your output stream with this codec. Look at the CompressionCodec API to
>>> see how:
>>> http://hadoop.apache.org/common/docs/stable/api/org/apache/hadoop/io/compress/CompressionCodec.html#createOutputStream(java.io.OutputStream)
>>> (Where your CompressionCodec must be the
>>> org.apache.hadoop.io.compress.crypto.CyptoCodec instance).
>>>
>>> On Tue, Aug 7, 2012 at 1:11 PM, Farrokh Shahriari
>>> <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Hello,
>>>> I use "Hadoop Crypto Compressor" from
>>>> https://github.com/geisbruch/HadoopCryptoCompressor to encrypt
>>>> HDFS files.
>>>> I've downloaded the complete code, built the jar file, and changed the
>>>> properties in core-site.xml as the site says.
>>>> But when I add a new file, nothing happens and encryption isn't
>>>> working.
>>>> What can I do to encrypt HDFS files? Does anyone know how I should
>>>> use this class?
>>>>
>>>> Thanks
>>>
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>>
>
>
>
> --
> Harsh J
>
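
The stream-wrapping step described in the thread for direct HDFS writes (calling createOutputStream on a codec and writing through the result) can be illustrated with JDK classes alone. The sketch below is not the HadoopCryptoCompressor API and does not touch Hadoop: javax.crypto.CipherOutputStream stands in for whatever the codec's createOutputStream returns, a ByteArrayOutputStream stands in for the HDFS output stream, and the class name StreamWrapSketch, the helper names, and the AES/CBC settings are all assumptions chosen for illustration. It only shows why registering a codec is not enough: nothing is transformed unless the program writes through the wrapping stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical class: a JDK-only stand-in for the codec wrapping pattern.
final class StreamWrapSketch {

    // Builds an AES/CBC cipher for the given mode (encrypt or decrypt).
    static Cipher newCipher(int mode, byte[] key, byte[] iv) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return cipher;
    }

    // Plays the role of codec.createOutputStream(raw): bytes written to the
    // returned stream are transformed before they reach the raw stream.
    static OutputStream wrapForWrite(OutputStream raw, byte[] key, byte[] iv)
            throws GeneralSecurityException {
        return new CipherOutputStream(raw, newCipher(Cipher.ENCRYPT_MODE, key, iv));
    }

    // Writes the plaintext through the wrapping stream and returns what was stored.
    static byte[] encrypt(byte[] plain, byte[] key, byte[] iv)
            throws GeneralSecurityException, IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for the HDFS stream
        try (OutputStream out = wrapForWrite(sink, key, iv)) {
            out.write(plain);
        } // close() writes the final padded block
        return sink.toByteArray();
    }

    // Reads the stored bytes back through a decrypting input stream.
    static byte[] decrypt(byte[] stored, byte[] key, byte[] iv)
            throws GeneralSecurityException, IOException {
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (InputStream in = new CipherInputStream(
                new ByteArrayInputStream(stored), newCipher(Cipher.DECRYPT_MODE, key, iv))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                plain.write(buf, 0, n);
            }
        }
        return plain.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        SecureRandom random = new SecureRandom();
        byte[] key = new byte[16]; // 128-bit AES key, generated here only for the demo
        byte[] iv = new byte[16];  // CBC initialization vector
        random.nextBytes(key);
        random.nextBytes(iv);

        byte[] plain = "row1,some sensitive value".getBytes(StandardCharsets.UTF_8);
        byte[] stored = encrypt(plain, key, iv);
        byte[] roundTrip = decrypt(stored, key, iv);

        System.out.println("stored bytes differ from plaintext: " + !Arrays.equals(stored, plain));
        System.out.println("round trip matches plaintext: " + Arrays.equals(roundTrip, plain));
    }
}
```

With a real Hadoop codec the shape would be the same: obtain the raw output stream from the file system, pass it to the codec's createOutputStream, and write only through the returned stream. Key management, which this sketch skips entirely, would still have to be handled separately.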