Sugandha Naolekar 2009-08-18, 07:58
Hello!
I am planning to implement a DFS that works on the same principles as HDFS but with some extra features, such as encryption and decryption of the data transferred between a remote client and the Hadoop cluster.
I want to encrypt data before or while placing it in HDFS, and then, when it is retrieved, the reverse should happen, i.e. decryption. Can you please suggest how to approach this?
Regards! Sugandha
Jakob Homan 2009-08-18, 19:23
Sugandha, I would suggest you look at the FileSystem interface, which is our starting point for implementing a file system for use with Hadoop. There are several implementations, such as S3FileSystem, that you can look at for inspiration.
Jakob Homan Hadoop at Yahoo!
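One way to layer encryption onto a FileSystem implementation is to wrap the streams it hands out: encrypt whatever is written to the stream returned by create() and decrypt whatever is read from the stream returned by open(). The following is a minimal, JDK-only sketch of that stream wrapping, assuming AES in CTR mode with a key and IV already known to the client. StreamCrypto and its methods are hypothetical (not Hadoop APIs), and byte arrays stand in for files in the cluster. Key management, per-file IVs, and integrity checking (CTR alone provides none) are left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Hypothetical helper, not part of Hadoop: assumes AES/CTR with a client-held key.
class StreamCrypto {
    private static final String TRANSFORMATION = "AES/CTR/NoPadding";

    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    // Wrap a raw output stream (e.g. one returned by create()) so bytes are encrypted on write.
    static OutputStream encrypting(OutputStream raw, SecretKey key, byte[] iv) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        return new CipherOutputStream(raw, cipher);
    }

    // Wrap a raw input stream (e.g. one returned by open()) so bytes are decrypted on read.
    static InputStream decrypting(InputStream raw, SecretKey key, byte[] iv) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        return new CipherInputStream(raw, cipher);
    }

    static byte[] encrypt(byte[] plain, SecretKey key, byte[] iv) throws IOException, GeneralSecurityException {
        ByteArrayOutputStream stored = new ByteArrayOutputStream(); // stands in for a file in the cluster
        try (OutputStream out = encrypting(stored, key, iv)) {
            out.write(plain);
        } // closing the CipherOutputStream finishes the cipher
        return stored.toByteArray();
    }

    static byte[] decrypt(byte[] stored, SecretKey key, byte[] iv) throws IOException, GeneralSecurityException {
        try (InputStream in = decrypting(new ByteArrayInputStream(stored), key, iv)) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws Exception {
        SecretKey key = newKey();
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv); // a key/IV pair must never be reused in CTR mode
        byte[] stored = encrypt("record for HDFS".getBytes(StandardCharsets.UTF_8), key, iv);
        System.out.println(new String(decrypt(stored, key, iv), StandardCharsets.UTF_8));
    }
}
```

CTR is a design choice for this sketch: it keeps the ciphertext exactly as long as the plaintext, so file lengths reported by the file system stay consistent. Nothing in Hadoop requires this mode.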
Sugandha Naolekar 2009-09-04, 07:46
Hello!
I am running a simple MR job with a replication factor of 2. After it finishes, the output is split into files named part-00000 and so on. Can't we keep the keys from being printed in the output files? I am getting the output as key-value pairs, but I want just the data, not the keys (integers). -- Regards! Sugandha
zhang jianfeng 2009-09-04, 07:55
Hi Sugandha ,
If you only want the value, set the key to NullWritable in the reducer.
e.g. output.collect(NullWritable.get(), value);
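A NullWritable (or null) key removes both the key and the tab because of how the record writer behind TextOutputFormat builds each line (based on the Hadoop 0.20 line): it writes the key if present, the separator only when both key and value are present, the value if present, then a newline. The JDK-only sketch below re-implements that rule for illustration; TextLineSketch.formatLine is a hypothetical simplification in which null stands in for NullWritable, not the Hadoop class itself.

```java
// Hypothetical re-implementation of the line-writing rule; null plays the role of NullWritable.
class TextLineSketch {
    static final String SEPARATOR = "\t"; // default key/value separator in text output

    static String formatLine(Object key, Object value) {
        boolean nullKey = key == null;
        boolean nullValue = value == null;
        if (nullKey && nullValue) {
            return ""; // nothing at all is written for such a pair
        }
        StringBuilder line = new StringBuilder();
        if (!nullKey) {
            line.append(key);
        }
        if (!nullKey && !nullValue) {
            line.append(SEPARATOR); // the tab only appears when both halves exist
        }
        if (!nullValue) {
            line.append(value);
        }
        return line.append('\n').toString();
    }

    public static void main(String[] args) {
        System.out.print(formatLine(1, "some data"));    // key, tab, value
        System.out.print(formatLine(null, "some data")); // value only
    }
}
```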
Amandeep Khurana 2009-09-04, 08:24
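Or you can output the data as the key and NullWritable as the value. That way you'll get only unique data, provided the data is also the map output key: identical records are grouped into a single reduce call, and the reducer writes each key once.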
-- Amandeep Khurana, Computer Science Graduate Student, University of California, Santa Cruz
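Amandeep's variant yields unique records because the shuffle groups identical map output keys, so the reducer runs once per distinct record. The JDK-only sketch below simulates the map, shuffle/sort, and reduce phases with a TreeMap (which, like Hadoop's sort, also orders the keys). DistinctSketch is hypothetical and only models the data flow, not Hadoop's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of "emit the record as the key, NullWritable as the value".
class DistinctSketch {
    static List<String> distinct(List<String> records) {
        // Map + shuffle/sort: each record becomes a key; its values are empty (null models NullWritable).
        Map<String, List<Object>> grouped = new TreeMap<>();
        for (String record : records) {
            grouped.computeIfAbsent(record, k -> new ArrayList<>()).add(null);
        }
        // Reduce: called once per distinct key, writes the key once and ignores the values.
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<Object>> group : grouped.entrySet()) {
            output.add(group.getKey());
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(distinct(Arrays.asList("b", "a", "b", "c", "a"))); // prints [a, b, c]
    }
}
```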
bharath vissapragada 2009-09-04, 15:45
Hey,
I have one more question. Suppose I have some cascading MapReduce jobs, and some data collected in MRjob1 needs to be used in MRjob2. Is there any way to do that?
Thanks
Amogh Vasekar 2009-09-04, 15:56
Have a look at JobClient; it should suffice.
Cheers! Amogh
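With the old mapred API, the usual chaining pattern is to call JobClient.runJob(conf1), which blocks until job 1 completes, and then point job 2's input (FileInputFormat.setInputPaths on job 2's JobConf) at the directory job 1 was given in FileOutputFormat.setOutputPath. The sketch below only imitates that handoff with local files and the JDK; job1, job2, and runChain are hypothetical stand-ins, not Hadoop APIs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical two-job chain: job 2 reads exactly what job 1 wrote to its output directory.
class ChainSketch {
    // "Job 1": uppercase each input line and write the result to outputDir/part-00000.
    static void job1(List<String> input, Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        List<String> out = new ArrayList<>();
        for (String line : input) {
            out.add(line.toUpperCase(Locale.ROOT));
        }
        Files.write(outputDir.resolve("part-00000"), out, StandardCharsets.UTF_8);
    }

    // "Job 2": its input path is job 1's output directory; it reads every part-* file.
    static List<String> job2(Path inputDir) throws IOException {
        List<String> result = new ArrayList<>();
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(inputDir, "part-*")) {
            for (Path part : parts) {
                for (String line : Files.readAllLines(part, StandardCharsets.UTF_8)) {
                    result.add(line + "!");
                }
            }
        }
        return result;
    }

    // Driver: job 1 finishes before job 2 starts, which is what a blocking runJob call guarantees.
    static List<String> runChain(List<String> input) throws IOException {
        Path work = Files.createTempDirectory("chain"); // left behind in this sketch
        Path intermediate = work.resolve("job1-out");
        job1(input, intermediate);
        return job2(intermediate);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(runChain(List.of("a", "b")));
    }
}
```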
-----Original Message----- From: bharath vissapragada [mailto:[EMAIL PROTECTED]] Sent: Friday, September 04, 2009 9:15 PM To: [EMAIL PROTECTED] Subject: Re: Some issues!
bharath vissapragada 2009-09-05, 02:29
Amogh, thanks for your reply.
Let me make my question clearer.
Suppose I have an array that gets updated in MRjob1, and I want to access it in MRjob2. That is what I meant in my previous question. I have gone through the JobConf class but haven't found anything useful. If I am wrong, kindly point me to the correct methods.
Thanks
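Map and reduce tasks run in JVMs separate from the driver (usually on other machines), so an array modified inside MRjob1's tasks is not visible to the driver or to MRjob2 directly; the data has to travel through job 1's output or another file. For a small result, one option is to read it back in the driver and store it as a string in job 2's configuration (Hadoop's Configuration has set/get and setStrings/getStrings for this), then parse it in each task. The sketch uses java.util.Properties as a stand-in for the JobConf; the key name and helper methods are hypothetical.

```java
import java.util.Arrays;
import java.util.Properties;

// Hypothetical handoff of a small int array through a job configuration.
class ConfHandoffSketch {
    static final String KEY = "myjob.shared.array"; // hypothetical configuration key

    // Driver side, after job 1's result has been read back: store the array as comma-separated text.
    static void putArray(Properties conf, int[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(values[i]);
        }
        conf.setProperty(KEY, sb.toString());
    }

    // Task side (e.g. when a job 2 mapper is configured): parse the array back out.
    static int[] getArray(Properties conf) {
        String raw = conf.getProperty(KEY, "");
        if (raw.isEmpty()) {
            return new int[0];
        }
        return Arrays.stream(raw.split(",")).mapToInt(Integer::parseInt).toArray();
    }

    public static void main(String[] args) {
        Properties jobConf = new Properties(); // stand-in for job 2's JobConf
        putArray(jobConf, new int[] {3, 1, 4});
        System.out.println(Arrays.toString(getArray(jobConf)));
    }
}
```

This only suits small data, since the whole string is shipped with every task's configuration; larger results are better left in files that job 2 reads as input.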
ll_oz_ll 2009-09-07, 00:08
Yes, you can do that. Just output null as the key in the reducer and you won't get the key or the tab delimiter in your output.
-- View this message in context: http://www.nabble.com/Some-issues%21-tp25289798p25323434.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.