RE: Question related to Decompressor interface
In the EncryptedWritableWrapper idea you would create an object that takes
any Writable object as its parameter.

 

Your EncryptedWritableWrapper would naturally implement Writable.

 

-         When write(DataOutput out) is called on your object, create your
own DataOutputStream which writes data into a byte array that you control
(i.e. new DataOutputStream(new ByteArrayOutputStream()), keeping
references to the objects of course).

-         Now encrypt the bytes and pass them on to the DataOutput object
you received in write(DataOutput out).

 

Decrypting is basically the same, via the readFields(DataInput in) method.

-         Read in the bytes and decrypt them (you will probably need to have
written out the length of the bytes beforehand so you know how much to read
in).

-         Take the decrypted bytes and pass them to the readFields(.) method
of the Writable object you're wrapping.

 

The rest of Hadoop doesn't know or care whether the data is encrypted; your
Writable objects are just a bunch of bytes. Your key and value classes in
this case are now EncryptedWritableWrapper, and you'll need to know which
type of Writable to pass it in the code.
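
Roughly like this, as an untested sketch (raw "AES", i.e. ECB, is used only
to keep the example short -- a real version would want a proper cipher mode
and IV, and the key handling here is purely illustrative):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;

import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

import org.apache.hadoop.io.Writable;

// Hypothetical sketch -- not part of Hadoop.
public class EncryptedWritableWrapper<T extends Writable> implements Writable {

    private final T wrapped;          // the Writable being wrapped
    private final SecretKeySpec key;  // e.g. new SecretKeySpec(16 raw bytes, "AES")

    public EncryptedWritableWrapper(T wrapped, SecretKeySpec key) {
        this.wrapped = wrapped;
        this.key = key;
    }

    public void write(DataOutput out) throws IOException {
        try {
            // 1. Let the wrapped Writable serialize itself into a buffer we control.
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            wrapped.write(new DataOutputStream(buffer));

            // 2. Encrypt the buffered bytes and pass them on to the real DataOutput,
            //    length first, so readFields() knows how much to read back.
            Cipher cipher = Cipher.getInstance("AES");
            cipher.init(Cipher.ENCRYPT_MODE, key);
            byte[] encrypted = cipher.doFinal(buffer.toByteArray());
            out.writeInt(encrypted.length);
            out.write(encrypted);
        } catch (GeneralSecurityException e) {
            throw new IOException("encryption failed", e);
        }
    }

    public void readFields(DataInput in) throws IOException {
        try {
            // 1. Read the length written above, then the encrypted bytes.
            byte[] encrypted = new byte[in.readInt()];
            in.readFully(encrypted);

            // 2. Decrypt and hand the plain bytes to the wrapped Writable.
            Cipher cipher = Cipher.getInstance("AES");
            cipher.init(Cipher.DECRYPT_MODE, key);
            byte[] plain = cipher.doFinal(encrypted);
            wrapped.readFields(new DataInputStream(new ByteArrayInputStream(plain)));
        } catch (GeneralSecurityException e) {
            throw new IOException("decryption failed", e);
        }
    }

    public T get() {
        return wrapped;
    }
}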

 

This would be good for encrypting within Hadoop. If your file arrives
already encrypted then it necessarily can't be split (you should aim to
limit the maximum size of the file on the source side). In the case of an
encrypted input you would need your own record reader to decrypt it; your
description of the scenario below is correct, and extending TextInputFormat
would be the way to go.
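
A skeleton of that might look like the following (untested; the
"mapred.crypto.key" property name is made up for the example, and it
assumes the configuration carries a 16-byte AES key):

import java.io.IOException;
import java.security.GeneralSecurityException;

import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.spec.SecretKeySpec;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.LineReader;

public class EncryptedTextInputFormat extends TextInputFormat {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // a whole-file-encrypted input cannot be split
    }

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new RecordReader<LongWritable, Text>() {
            private LineReader in;
            private final LongWritable key = new LongWritable(-1);
            private final Text value = new Text();

            @Override
            public void initialize(InputSplit genericSplit, TaskAttemptContext ctx)
                    throws IOException {
                FileSplit fileSplit = (FileSplit) genericSplit;
                Configuration conf = ctx.getConfiguration();
                FSDataInputStream raw =
                        fileSplit.getPath().getFileSystem(conf).open(fileSplit.getPath());
                try {
                    // "mapred.crypto.key" is a made-up property, assumed to
                    // hold a 16-byte AES key for this sketch.
                    byte[] keyBytes = conf.get("mapred.crypto.key").getBytes("UTF-8");
                    Cipher cipher = Cipher.getInstance("AES");
                    cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(keyBytes, "AES"));
                    // Decrypt the raw stream, then read lines as usual.
                    in = new LineReader(new CipherInputStream(raw, cipher), conf);
                } catch (GeneralSecurityException e) {
                    throw new IOException("cipher setup failed", e);
                }
            }

            @Override
            public boolean nextKeyValue() throws IOException {
                key.set(key.get() + 1); // the key is simply the record number
                return in.readLine(value) > 0;
            }

            @Override public LongWritable getCurrentKey() { return key; }
            @Override public Text getCurrentValue() { return value; }
            @Override public float getProgress() { return 0.0f; } // unknown through a cipher stream
            @Override public void close() throws IOException { in.close(); }
        };
    }
}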

 

If your input is just a plain text file and your goal is to store it in an
encrypted fashion, then the EncryptedWritable idea works and is a simpler
implementation.

 

 

 

From: java8964 java8964 [mailto:[EMAIL PROTECTED]]
Sent: Sunday, February 10, 2013 10:13 PM
To: [EMAIL PROTECTED]
Subject: RE: Question related to Decompressor interface

 

Hi, Dave:

 

Thanks for your reply. I am not sure how the EncryptedWritable would work;
can you share more ideas about it?

 

For example, say I have a text file as my raw source file, and I need to
store it in HDFS. If I use any encryption to encrypt the whole file, then
there is no good InputFormat or RecordReader to process it, unless the whole
file is decrypted first at runtime and then processed with TextInputFormat,
right?

 

What you suggest is: when I encrypt the file, store it as a SequenceFile,
using anything I want as the key; encrypt each line (record) and store it as
the value; and put each (key, value) pair into the sequence file. Is that
right?

 

Then at runtime, each value can be decrypted from the sequence file by the
EncryptedWritable class, ready for the next step. Is my understanding
correct?

 

In this case, of course, I don't need to worry about splits any more, as
each record is encrypted/decrypted separately.
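
If I understand you correctly, the writing side would look something like
this (the path, the demo key, and using BytesWritable for the ciphertext are
just my assumptions):

import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;

public class EncryptedSequenceFileWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/data/encrypted.seq");        // made-up path

        // Line number as the key, encrypted line bytes as the value.
        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, path, LongWritable.class, BytesWritable.class);

        // 16-byte demo key; raw "AES" (ECB) just to keep the sketch short.
        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.ENCRYPT_MODE,
                new SecretKeySpec("0123456789abcdef".getBytes("UTF-8"), "AES"));

        String[] lines = {"first record", "second record"}; // stand-in for the real file
        long lineNo = 0;
        for (String line : lines) {
            byte[] encrypted = cipher.doFinal(line.getBytes("UTF-8"));
            writer.append(new LongWritable(lineNo++), new BytesWritable(encrypted));
        }
        writer.close();
    }
}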

 

I think it is a valid option, but the problem is that the data has to be
encrypted by this EncryptedWritable class. What I was thinking of is
allowing the data source to encrypt its data any way it wants, as long as it
is supported by the Java security package, and then only providing the
private key to the runtime to decrypt it.

 

Yong

  _____  

From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: RE: Question related to Decompressor interface
Date: Sun, 10 Feb 2013 09:36:40 +0700

I can't answer your question about the Decompressor interface, but I have a
query for you.

 

Why not just create an EncryptedWritable object? Encrypt/decrypt the bytes
in the read/write methods; that should be darn near trivial. Then stick with
good ol' SequenceFile, which, as you note, is splittable. Otherwise you'd
have to deal with making the output splittable, and given encrypted data,
the only solution that I see is basically rolling your own SequenceFile with
encrypted innards.

 

Come to think of it, a simple, standardized EncryptedWritable object out of
the box with Hadoop would be great. Or perhaps better yet, an
EncryptedWritableWrapper<T extends Writable> so we can convert any existing
Writable into an encrypted form.

 

Dave

 

 

From: java8964 java8964 [mailto:[EMAIL PROTECTED]]
Sent: Sunday, February 10, 2013 3:50 AM
To: [EMAIL PROTECTED]
Subject: Question related to Decompressor interface

 

Hi,

 

Currently I am researching options for encrypting data in MapReduce, as we
plan to use Amazon EMR or EC2 services for our data.

 

I am thinking that the compression codec is a good place to integrate the
encryption logic, and I found that some people have had the same idea.

 

I googled around and found this code:

 

https://github.com/geisbruch/HadoopCryptoCompressor/

 

It doesn't seem to be maintained any more, but it gave me a starting point.
I downloaded the source code and tried to run some tests with it.

 

It doesn't work out of the box. There were some bugs I had to fix to make it
work. I believe it contains 'AES' as an example algorithm.
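
For reference, I registered the codec in the job configuration roughly like
this ("io.compression.codecs" is the standard Hadoop property; the codec
class name is my guess from the package in the stack trace below):

import org.apache.hadoop.conf.Configuration;

public class CryptoCodecSetup {
    public static Configuration newConf() {
        Configuration conf = new Configuration();
        // Append the crypto codec to the codec list; Hadoop then selects it
        // by file extension, the same way it picks gzip or bzip2.
        conf.set("io.compression.codecs",
                "org.apache.hadoop.io.compress.DefaultCodec,"
                + "org.apache.hadoop.io.compress.crypto.CryptoCodec"); // class name assumed
        return conf;
    }
}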

 

But right now I am facing a problem when I try to use it in my test
MapReduce program. Here is the stack trace I got:

 

2013-02-08 23:16:47,038 INFO org.apache.hadoop.io.compress.crypto.CryptoBasicDecompressor: buf length 512, and offset = 0, length = -132967308
java.lang.IndexOutOfBoundsException
    at java.nio.ByteBuffer.wrap(ByteBuffer.java:352)
    at org.apache.hadoop.io.compress.crypto.CryptoBasicDecompressor.setInput(CryptoBasicDecompressor.java:100)
    at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:97)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:83)
    at java.io.InputStream.read(InputStream.java:82)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:209)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:173)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:114)
    at org.apach