Hadoop >> mail # user >> RE: Question related to Decompressor interface


Re: Question related to Decompressor interface
All of these suggestions tend to founder on the problem of key management.

What you need to do is

1) define your threats.

2) define your architecture including key management.

3) demonstrate how the architecture defends against the threat environment.

I haven't seen more than a cursory comment about item (1) in this thread.
Typically, the threats include

a) compromise or theft of physical media by outsiders

b) compromise of one or more live machines in the cluster

c) insider attack by one employee working alone, but able to socially
engineer others into unwitting cooperation

Which threats did the OP need to defend against?
On Sun, Feb 10, 2013 at 8:24 PM, David Parks <[EMAIL PROTECTED]> wrote:

> In the EncryptedWritableWrapper idea you would create an object that takes
> any Writable object as its parameter.
>
> Your EncryptedWritableWrapper would naturally implement Writable.
>
> - When write(DataOutput out) is called on your object, create your own
>   DataOutputStream which writes data into a byte array that you control
>   (i.e. new DataOutputStream(new myByteArrayOutputStream()), keeping
>   references to the objects of course).
>
> - Now encrypt the bytes and pass them on to the DataOutput object you
>   received in write(DataOutput out).
>
> To decrypt is basically the same with the readFields(DataInput in) method:
>
> - Read in the bytes and decrypt them (you will probably have needed to
>   write out the length of the bytes previously, so you know how much to
>   read in).
>
> - Take the decrypted bytes and pass them to the readFields(...) method of
>   the Writable object you're wrapping.
>
> The rest of Hadoop doesn't know or care whether the data is encrypted;
> your Writable objects are just a bunch of bytes. Your Key and Value
> classes in this case are now EncryptedWritableWrapper, and you'll need to
> know which type of Writable to pass it in the code.
>
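The wrapper described above can be sketched as follows. This is a hypothetical illustration, not code from the thread: the Writable interface is redeclared locally so the sketch compiles without Hadoop on the classpath (the real interface is org.apache.hadoop.io.Writable), and AES in ECB mode is used purely for brevity; a real implementation would use a random IV with a mode such as CBC or GCM, plus the key management discussed earlier in the thread.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.io.*;
import java.security.GeneralSecurityException;

// Stand-in for org.apache.hadoop.io.Writable so the sketch compiles without
// Hadoop on the classpath; the real interface has the same two methods.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical wrapper following the steps above: serialize the wrapped
// Writable into a buffer we control, encrypt, and length-prefix the result.
class EncryptedWritableWrapper implements Writable {
    private final Writable wrapped;
    private final SecretKeySpec key;   // 16-byte raw key -> AES-128

    EncryptedWritableWrapper(Writable wrapped, byte[] rawKey) {
        this.wrapped = wrapped;
        this.key = new SecretKeySpec(rawKey, "AES");
    }

    public void write(DataOutput out) throws IOException {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            wrapped.write(new DataOutputStream(buf));      // plain bytes we control
            Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding"); // ECB for brevity only
            c.init(Cipher.ENCRYPT_MODE, key);
            byte[] enc = c.doFinal(buf.toByteArray());
            out.writeInt(enc.length);                      // length prefix for readFields
            out.write(enc);
        } catch (GeneralSecurityException e) {
            throw new IOException(e);
        }
    }

    public void readFields(DataInput in) throws IOException {
        try {
            byte[] enc = new byte[in.readInt()];           // read exactly what write() emitted
            in.readFully(enc);
            Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
            c.init(Cipher.DECRYPT_MODE, key);
            byte[] plain = c.doFinal(enc);
            wrapped.readFields(new DataInputStream(new ByteArrayInputStream(plain)));
        } catch (GeneralSecurityException e) {
            throw new IOException(e);
        }
    }
}
```

The length prefix written by write() is what lets readFields() know how many ciphertext bytes to consume, exactly as the bullet about writing out the length suggests.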
>
> This would be good for encrypting within Hadoop. If your file comes in
> already encrypted then it necessarily can't be split (you should aim to
> limit the maximum size of the file on the source side). In the case of an
> encrypted input you would need your own record reader to decrypt it; your
> description of the scenario below is correct, and extending TextInputFormat
> would be the way to go.
>
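For the whole-file case, the core of such a decrypting record reader can be sketched without the Hadoop plumbing: wrap the raw encrypted stream in the JDK's CipherInputStream and read plaintext lines from it, much as TextInputFormat's line reader would. The class and method names here are invented for illustration, and ECB mode is again a brevity-only assumption:

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.spec.SecretKeySpec;
import java.io.*;
import java.nio.charset.StandardCharsets;

// Core of a decrypting record reader, minus the Hadoop plumbing: wrap the
// raw (unsplittable) encrypted stream in a CipherInputStream and read
// plaintext lines from it, as a TextInputFormat-style reader would.
class DecryptingLineReader {
    static BufferedReader open(InputStream encrypted, byte[] rawKey) throws Exception {
        Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");  // ECB for brevity only
        c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(rawKey, "AES"));
        return new BufferedReader(new InputStreamReader(
                new CipherInputStream(encrypted, c), StandardCharsets.UTF_8));
    }
}
```

Because decryption has to start from the beginning of the ciphertext, this is exactly why such an input cannot be split across mappers.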
>
> If your input is just a plain text file and your goal is to store it in
> an encrypted fashion, then the EncryptedWritable idea works and is the
> simpler implementation.
>
>
> From: java8964 java8964 [mailto:[EMAIL PROTECTED]]
> Sent: Sunday, February 10, 2013 10:13 PM
> To: [EMAIL PROTECTED]
> Subject: RE: Question related to Decompressor interface
>
>
> Hi, Dave:
>
> Thanks for your reply. I am not sure how the EncryptedWritable will work;
> can you share more ideas about it?
>
>
> For example, suppose I have a text file as my source raw file, and now I
> need to store it in HDFS. If I use any encryption to encrypt the whole
> file, then there is no good InputFormat or RecordReader to process it,
> unless the whole file is decrypted first at runtime and then processed
> using TextInputFormat, right?
>
>
> What you suggest is: when I encrypt the file, store it as a SequenceFile,
> using anything I want as the key, then encrypt each line (record) and
> store it as the value, putting each (key, value) pair into the sequence
> file. Is that right?
>
> Then at runtime, each value can be decrypted from the sequence file and is
> ready for the next step via the EncryptedWritable class. Is my
> understanding correct?
>
> In this case, of course, I don't need to worry about splits any more, as
> each record is encrypted/decrypted separately.
>
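That per-record property can be illustrated in isolation: if every line is encrypted as its own ciphertext, any single record can be decrypted without reading the ones before it, which is what keeps the SequenceFile splittable. The class below is a hypothetical sketch, not the SequenceFile API itself (ECB mode for brevity only):

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.*;

// Per-record encryption as described above: every line becomes its own
// independent ciphertext, so record N can be decrypted without touching
// records 0..N-1. (In the real scheme these byte[] values would be the
// SequenceFile values.)
class RecordCrypto {
    static byte[] encryptRecord(String line, byte[] rawKey) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding"); // ECB for brevity only
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(rawKey, "AES"));
        return c.doFinal(line.getBytes(StandardCharsets.UTF_8));
    }

    static String decryptRecord(byte[] value, byte[] rawKey) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
        c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(rawKey, "AES"));
        return new String(c.doFinal(value), StandardCharsets.UTF_8);
    }
}
```

Contrast this with the whole-file case above, where decryption must start at byte zero and splitting is impossible.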
>
> I think it is a valid option, but the problem is that the data has to be
> encrypted by this EncryptedWritable class. What I was thinking about is