Olivier Renault
2013-01-11, 10:43
John Hancock
2013-01-11, 10:52
Michael Forage
2013-01-11, 11:22
Nitin Pawar
2013-01-11, 11:30
-
Re: Getting started recommendations
Olivier Renault 2013-01-11, 10:43
Hi,
Warning: I am a newbie myself. Please find my answers inline.

Good luck,
Olivier

On 11 January 2013 10:29, John Lilley <[EMAIL PROTECTED]> wrote:

> We are somewhat new to Hadoop and are looking to run some experiments
> with HDFS, Pig, and HBase.
>
> With that in mind, I have a few questions:
>
> What is the easiest (preferably free) Hadoop distro to get started with?
> Cloudera?

Cloudera is probably easy. I went with the distribution from Hortonworks and used their HMC (Hortonworks Management Console). It is a web UI that installs all the components you select on your behalf, and it also installs monitoring (Ganglia + Nagios). HMC is based on Ambari (an Apache project). You can find information on how to install it at: http://hortonworks.com/hdp11-hmc-quick-start-guide/

> What host OS distro/release is recommended?

CentOS 6 / RHEL 6 seems to be a good choice.

> What is the easiest environment to get started with? Amazon EC2? Is
> there anyone offering virtual/hosted prebuilt Hadoop instances?

I installed it on EC2. It worked like a charm.

> Where would we find some “big data” files that people have used for
> testing purposes?

As part of the documentation, there is a MapReduce tutorial. You can run its wordcount example against any text files you have: http://hadoop.apache.org/docs/r0.20.2/mapred_tutorial.html

> Feel free to RTFM me to the right place ;-)
>
> Thanks, john
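For anyone who wants to see what the wordcount example computes before setting up a cluster, here is a minimal sketch in plain Python. It is not the code from the Hadoop tutorial (that one is Java) and it does not use any Hadoop API; it only imitates the map, shuffle, and reduce phases in memory. The function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) and the tokenizing rule are my own assumptions for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    # Assumption: words are lowercase runs of letters, digits, or apostrophes.
    for word in re.findall(r"[a-z0-9']+", line.lower()):
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all emitted values by key, as the framework would."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the counts collected for a single word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in memory over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_words(line))
    grouped = shuffle(pairs)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    sample = ["Hello Hadoop", "hello HDFS and hello Pig"]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

To try it on a real file, pass an open file object to `word_count`, since iterating over a file yields its lines. On a cluster the same three phases run in parallel across many machines, which this sketch does not attempt to show.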
-
Re: Getting started recommendations
John Hancock 2013-01-11, 10:52
Olivier,

Government web sites may also have lots of data to work with. One that came to mind when I read this thread is: http://data.gov.uk/dataset/coins

-John
-
RE: Getting started recommendations
Michael Forage 2013-01-11, 11:22
I am still new, but I had similar questions and went through a lot of pain getting started.

If you want to get programming rather than spend time learning how to install, configure and administer the Hadoop tools, I recommend using Amazon Elastic MapReduce. This will very quickly get you to a stage where you are able to submit and run MapReduce jobs (and Pig, Hive, etc.).

It's a very cheap option for learning the platform, especially if you use the Ruby command-line tool, which allows you to re-use your Hadoop instances for multiple jobs rather than the more expensive default of starting and stopping new clusters each time. It has some pretty decent tutorials, although (as with everything Hadoop, it seems) the area is so large that inevitably you'll be googling some things or asking questions here.

Also, I found the book "Hadoop in Action" very readable and informative, even as someone who has only sporadically used Java throughout my career. It takes you through different use cases based on test data downloadable from the web. The only issue is that it's written for the older (though fully supported) Hadoop 0.20 API, and since it assumes a local Hadoop cluster, you have a small effort to translate to the Amazon EMR way of doing things. Still very useful, though.

Cheers
Mike

From: John Lilley [mailto:[EMAIL PROTECTED]]
Sent: 11 January 2013 10:29
To: [EMAIL PROTECTED]
Subject: Getting started recommendations

We are somewhat new to Hadoop and are looking to run some experiments with HDFS, Pig, and HBase.

With that in mind, I have a few questions:

What is the easiest (preferably free) Hadoop distro to get started with? Cloudera?

What host OS distro/release is recommended?

What is the easiest environment to get started with? Amazon EC2? Is there anyone offering virtual/hosted prebuilt Hadoop instances?

Where would we find some "big data" files that people have used for testing purposes?

Feel free to RTFM me to the right place ;-)

Thanks, john
-
Re: Getting started recommendations
Nitin Pawar 2013-01-11, 11:30
http://my.safaribooksonline.com/book/databases/hadoop/9780596521974
I loved this book. Very well defined.

--
Nitin Pawar