|
Tonci Buljan
2010-03-01, 14:01
Jeff Zhang
2010-03-01, 14:15
Mark Kerzner
2010-03-01, 14:15
Tonci Buljan
2010-03-01, 14:24
Mark Kerzner
2010-03-01, 14:28
Otis Gospodnetic
2010-03-01, 16:46
Steve Loughran
2010-03-01, 17:19
Matteo Nasi
2010-03-01, 17:30
Stephen Watt
2010-03-01, 21:16
Song Liu
2010-03-01, 21:35
Tonci Buljan
2010-03-01, 23:14
Steve Loughran
2010-03-02, 10:24
Steve Loughran
2010-03-02, 10:27
Matteo Nasi
2010-03-03, 07:28
Huy Phan
2010-03-03, 08:05
Tonci Buljan
2010-03-03, 10:16
Thomas Koch
2010-03-04, 07:44
Amund Tveit
2010-03-04, 17:11
|
-
Hadoop as master's thesis Tonci Buljan 2010-03-01, 14:01
Hello everyone,
I'm thinking of using Hadoop as a subject in my master's thesis in Computer Science. I'm supposed to solve some kind of a problem with Hadoop, but can't think of any :)).

We have a lab with 10-15 computers and I thought of installing Hadoop on those computers, and now I should write some kind of a program to run on my cluster.

I really hope you understood my problem :). I really need any kind of suggestion.

P.S. Sorry for my bad English, I'm from Croatia.
-
Re: Hadoop as master's thesis Jeff Zhang 2010-03-01, 14:15
So you do not have a topic for your thesis yet?
I think the topic depends on your background. If you have machine learning experience, I suggest you try using Hadoop to implement some machine learning algorithms.

Best Regards
Jeff Zhang
-
Re: Hadoop as master's thesis Mark Kerzner 2010-03-01, 14:15
Tonci,
to start with, you can run Hadoop on one computer in pseudo-distributed mode. Installing and configuring will be enough of a headache on its own. Then you can think of a problem, such as processing student records and grades and finding some statistics, or relating grades to future achievements. Or, you can look at some publicly available datasets and do something with them.

Cheers,
Mark
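To make the student-grades idea a bit more concrete, here is a minimal sketch of "mean grade per course" written as a map step and a reduce step. It runs locally in plain Python and only imitates the shuffle that Hadoop would do between the two steps. The records, the field layout, and the function names are all made up for illustration; they are not from anyone's actual dataset or from the Hadoop API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (student, course, grade). Real input would be files in HDFS.
RECORDS = [
    ("ana", "algorithms", 5),
    ("ana", "databases", 4),
    ("ivan", "algorithms", 3),
    ("ivan", "databases", 4),
    ("luka", "algorithms", 4),
]


def map_record(record):
    """Map step: emit one (course, grade) pair per input record."""
    _student, course, grade = record
    yield course, grade


def reduce_course(course, grades):
    """Reduce step: summarize all grades collected for one course."""
    return course, {"count": len(grades), "mean": mean(grades)}


def run_job(records):
    """Imitate the shuffle locally: group mapper output by key, then reduce."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            grouped[key].append(value)
    return dict(reduce_course(key, values) for key, values in sorted(grouped.items()))


stats = run_job(RECORDS)
print(stats)
```

The same two functions could probably be adapted to Hadoop Streaming by reading records from standard input and printing tab-separated key/value pairs, but that wiring is left out here.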
-
Re: Hadoop as master's thesis Tonci Buljan 2010-03-01, 14:24
Thank you for your reply.
I didn't mention that I already installed Hadoop on 2 machines back at home (for an essay on Hadoop which I did), one as a namenode and datanode and one as a datanode only. Everything worked perfectly. I would really like to install it on more machines to see how a cluster works in more detail. So I was thinking: "Now I have a cluster, where do I find a large dataset to work with?"

I like your idea about publicly available datasets, do you have any links on that?

The other idea, about student grades, is also great (thank you for that) and I might just start with that.

Thank you very much, you both really helped me.
-
Re: Hadoop as master's thesis Mark Kerzner 2010-03-01, 14:28
Tonci,
here are the Enron email files used in the litigation that they had: http://edrm.net/resources/data-sets/enron-data-set-files

Here is much more stuff: http://infochimps.org/

Sincerely,
Mark
-
Re: Hadoop as master's thesis Otis Gospodnetic 2010-03-01, 16:46
Hi Tonci,
You'll find good dataset pointers here: http://www.simpy.com/user/otis/search/dataset

You may find inspiration for Hadoop usage here, assuming you have an ML background: http://cwiki.apache.org/MAHOUT/algorithms.html

Oh, and you may also want to look out for GSoC (Google Summer of Code).

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/
-
Re: Hadoop as master's thesis Steve Loughran 2010-03-01, 17:19
Tonci Buljan wrote:
> I'm thinking of using Hadoop as a subject in my master's thesis in Computer Science. I'm supposed to solve some kind of a problem with Hadoop, but can't think of any :)).

Well, you need some interesting data, then mine it. So ask around. Physicists often have stuff.
-
Re: Hadoop as master's thesis Matteo Nasi 2010-03-01, 17:30
Hi all,
I just completed my first-level university degree at Politecnico di Milano, Italy (Computer Science Engineering) with a thesis on Hadoop: "Log analysis in the cloud", using and comparing custom log analysis scripts on a local private cluster (8 nodes of old computers) and the AWS EMR Hadoop implementation. I wrote scripts in Pig and Hive and collected the results in a custom web interface.

If you're interested, feel free to ask.

ciao
Matteo
-
Re: Hadoop as master's thesis Stephen Watt 2010-03-01, 21:16
Hi Tonci
Public data sets: check out infochimps.org/ or aws.amazon.com/publicdatasets/

I find that a lot of the Hadoopified algorithms out there originate from linguistics departments; TF-IDF is one example. But have you considered looking into information theory, i.e. entropy analytics using algorithms like pointwise mutual information? I'd imagine most government security agencies would be interested in using Hadoop for signal processing/code breaking, especially given the cost savings of using commodity machines. The trick will be to find a dataset that suits your algorithm.

Kind regards
Steve Watt
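For readers who have not met pointwise mutual information before, here is a small sketch of the formula PMI(x, y) = log2(p(x, y) / (p(x) * p(y))), with probabilities estimated as "fraction of documents containing the word or the pair". The toy corpus and the helper names are invented for illustration, and the document-level estimate is just one possible choice. The counting pass is the part that would map naturally onto MapReduce (emit words and word pairs, sum the counts).

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus; each string stands for one document.
DOCS = [
    "hadoop cluster",
    "hadoop cluster",
    "hadoop thesis",
    "master thesis",
]


def count_words_and_pairs(docs):
    """Count in how many documents each word and each unordered word pair occurs."""
    word_counts, pair_counts = Counter(), Counter()
    for doc in docs:
        words = sorted(set(doc.split()))
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))
    return word_counts, pair_counts


def pmi(x, y, word_counts, pair_counts, n_docs):
    """PMI(x, y) = log2(p(x, y) / (p(x) * p(y))) using document-level estimates."""
    pair = tuple(sorted((x, y)))
    p_xy = pair_counts[pair] / n_docs
    if p_xy == 0:
        return float("-inf")  # the two words never occur together
    p_x = word_counts[x] / n_docs
    p_y = word_counts[y] / n_docs
    return math.log2(p_xy / (p_x * p_y))


words, pairs = count_words_and_pairs(DOCS)
score = pmi("hadoop", "cluster", words, pairs, len(DOCS))
print(score)
```

A positive score means the two words appear together more often than independence would predict; a pair that never co-occurs gets negative infinity in this sketch.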
-
Re: Hadoop as master's thesis Song Liu 2010-03-01, 21:35
Hi Tonci,
Actually, I am doing a master's thesis on developing algorithms on Hadoop.

My project is to extend algorithms into MapReduce fashion and to discover whether there is an optimal choice. Most of them belong to the machine learning area. Personally, I think this is a fresh area, and if you search the main academic databases, you may find little literature about this.

I recently made a proposal about my study on Hadoop, and I would like to discuss this with you in depth if you wish.

Another interesting topic is to discover the limits of Hadoop. We have a very large cluster with a very high rank in the TOP500, so I'm wondering whether Hadoop can perform as we expect.

Hope this is helpful.

Regards
Song Liu
-
Re: Hadoop as master's thesis Tonci Buljan 2010-03-01, 23:14
Thank you all for your replies.
Matteo, I'm definitely interested in what you did, and I would be very happy to check it out in detail.

Mark Kerzner's link http://infochimps.org/ was very useful. Thank you, Mark, for that. I'll probably download and work with some data from there.

For Marko (translated from Croatian): I had no idea there were other people in Croatia working with Hadoop. I study at FESB in Split, and the whole department that deals with distributed computing is "thin". The professor didn't even know what Hadoop was when I asked him for an idea. Java is an even bigger bogeyman; that same professor teaches it, so it will be a real struggle to write any kind of thesis on this topic. In any case, thanks for the reply.
-
Re: Hadoop as master's thesis Steve Loughran 2010-03-02, 10:24
Matteo Nasi wrote:
> I just completed my first level of university degree at Politecnico di Milano Italy (Computer Science Engineering) with a thesis on Hadoop: "log analysis in the cloud", using and comparing custom log analysis script on local private cluster (8 nodes of old computers) and AWS EMR hadoop implementation. I wrote scripts in Pig and Hive and collected results into a custom web interface.

Stick it up online and we will link to it from the Hadoop wiki.
-
Re: Hadoop as master's thesis Steve Loughran 2010-03-02, 10:27
Song Liu wrote:
> Another interesting topic is to discover the limit of hadoop. We have a very large cluster at a very high rank among TOP500, so I'm wondering whether hadoop can perform as we expected.

A lot of the big clusters have premium network infrastructure and SAN-mounted storage whose access times are independent of location. MapReduce is designed to work on lower-cost storage/network infrastructure, saving money there that you can spend on more servers and storage. But it does require algorithms to work on local data only, or the LAN becomes a bottleneck, fast.
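One way to see why working on local data matters: the sketch below counts requests per URL across two simulated nodes, once shipping every mapper record over the "network" and once summing on each node first, in the spirit of a MapReduce combiner. The node names, log lines, and functions are hypothetical, and the record counts only stand in for network traffic; this is not a measurement of a real cluster.

```python
from collections import Counter

# Hypothetical input splits: the log lines stored on each node.
SPLITS = {
    "node1": ["GET /a", "GET /a", "GET /b"],
    "node2": ["GET /a", "GET /b", "GET /b"],
}


def map_line(line):
    """Map step: turn one request line into a (url, 1) pair."""
    _method, url = line.split()
    return url, 1


def shuffle_without_combiner(splits):
    """Every mapper output record is sent across the network."""
    return [map_line(line) for lines in splits.values() for line in lines]


def shuffle_with_combiner(splits):
    """Each node sums its own counts first and ships one record per key."""
    shipped = []
    for lines in splits.values():
        local = Counter()
        for line in lines:
            key, value = map_line(line)
            local[key] += value
        shipped.extend(local.items())
    return shipped


def reduce_counts(records):
    """Reduce step: total the values for each key."""
    totals = Counter()
    for key, value in records:
        totals[key] += value
    return dict(totals)


plain = shuffle_without_combiner(SPLITS)
combined = shuffle_with_combiner(SPLITS)
print(len(plain), len(combined), reduce_counts(combined))
```

Both variants give the same totals, but the second ships fewer records. Local pre-aggregation like this only works when the reduce operation can be applied to partial results, as a sum can.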
-
Re: Hadoop as master's thesis Matteo Nasi 2010-03-03, 07:28
hi guys,
sorry for the delay, it's a busy week :-) There's no problem with sharing my work. However, there are some issues to consider:
- my final doc is a 135-page description of what I did, but it's written in Italian... So what I can try to do is share a sort of English abstract of each chapter, let's say half a page for all 9 chapters and 4 appendices, and include the bibliography and sitography;
- the website UI is an intranet site; I can try to write a description of it, maybe with some screenshots;
- finally, I can share all the scripts, and the JSP code for the web part.
If you agree, I'll work on this by the end of the week.
Let me know,

ciao Matteo
-
Re: Hadoop as master's thesis Huy Phan 2010-03-03, 08:05
Hi Matteo,
it sounds good :) We will wait for your work.
-
Re: Hadoop as master's thesis Tonci Buljan 2010-03-03, 10:16
Sounds great!!!
Thank you.
-
Re: Hadoop as master's thesis Thomas Koch 2010-03-04, 07:44
Hi Tonci,
> I'm thinking of using Hadoop as a subject in my master's thesis in Computer Science. I'm supposed to solve some kind of a problem with Hadoop, but can't think of any :)).

I have a question that could be a topic for a master's thesis, although it's more a question about Hadoop than about solving a problem with Hadoop.

There are thousands of organizations out there that have powerful but mostly underused desktop PCs standing on everyone's desk. Now, if an organization has 10-50 of those PCs and installs Hadoop on every one of them, what could this be useful for? I already have at least two ideas:
- a distributed, encrypted backup system on top of HDFS
- an automated knowledge system
  - crawl all websites and linked websites bookmarked by users
  - build an index on these sites
  - make this index searchable for all users
  - include public, internal documents

Some questions for the master's thesis could be:
- Is it possible to run Hadoop in such an environment?
- What are the drawbacks?
- What is missing in Hadoop to make it possible?
- What is the ecological impact of using desktop PCs in this way, if this could substitute for some servers in a datacenter? (The heat emitted by the PCs is even useful in winter.)

Best regards,
Thomas Koch, http://www.koch.ro

P.S. Did you know that next year's Debian conference will be in Bosnia? http://wiki.debconf.org/wiki/DebConf11/BanjaLuka
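The "build an index and make it searchable" part of the knowledge-system idea could start from something like the inverted index below. This is only a local sketch under stated assumptions: the URLs and page texts are placeholders, crawling is not shown, and the map step (emit word/URL pairs) and the grouping are plain Python stand-ins for what a Hadoop job would do over crawled pages stored in HDFS.

```python
from collections import defaultdict

# Hypothetical crawled pages: URL -> extracted text.
PAGES = {
    "http://example.org/hadoop": "hadoop stores data in hdfs",
    "http://example.org/backup": "encrypted backup on hdfs",
}


def map_page(url, text):
    """Map step: emit (word, url) once for each distinct word on the page."""
    for word in set(text.lower().split()):
        yield word, url


def build_index(pages):
    """Group the mapper output by word to get word -> sorted list of URLs."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word, page_url in map_page(url, text):
            index[word].add(page_url)
    return {word: sorted(urls) for word, urls in index.items()}


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)


index = build_index(PAGES)
print(search(index, "encrypted hdfs"))
```

A real system would also need tokenization, ranking, and access control for internal documents; none of that is attempted here.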
-
Re: Hadoop as master's thesis Amund Tveit 2010-03-04, 17:11
On Mon, Mar 1, 2010 at 3:01 PM, Tonci Buljan <[EMAIL PROTECTED]> wrote:
> I'm thinking of using Hadoop as a subject in my master's thesis in Computer Science. I'm supposed to solve some kind of a problem with Hadoop, but can't think of any :)).

Here is an overview of Hadoop/MapReduce algorithms that might serve as inspiration when looking for a problem to solve: http://atbrox.com/2010/02/12/mapreduce-hadoop-algorithms-in-academic-papers-updated/

A new dataset related to machine learning: http://learningtorankchallenge.yahoo.com/

Best regards,
Amund
-- http://atbrox.com - +47 416 26 572 |