Hadoop >> mail # user >> Re: The NCDC Weather Data for Hadoop the Definitive Guide


Andy Doddington 2012-02-12, 08:35
Andy Doddington 2012-02-13, 10:13
Re: The NCDC Weather Data for Hadoop the Definitive Guide
Hi,
If needed, you can run the script below to store the data on your local system:

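# Collect every year's .gz files in /home/ubuntu/work/files
mkdir -p /home/ubuntu/work/files
for i in {1901..2012}
do
cd /home/ubuntu/work/
# -nH and --cut-dirs=3 drop the host name and pub/data/noaa/, so files land in $i/
wget -r -np -nH --cut-dirs=3 -R index.html http://ftp3.ncdc.noaa.gov/pub/data/noaa/$i/
cd $i/
cp *.gz /home/ubuntu/work/files
cd /home/ubuntu/work/
rm -r $i/
done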
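
A loop like the one above runs for a long time, so an interrupted transfer can leave a truncated .gz file behind. Here is a rough sketch of a check you could run afterwards; the function name check_gz_dir is my own invention, and it only assumes the standard gzip tool is installed:

```shell
# Hypothetical helper: print every .gz file in a directory that fails
# gzip's built-in integrity test (gzip -t reads the file without unpacking it).
check_gz_dir() {
    dir=$1
    for f in "$dir"/*.gz
    do
        # Skip the unexpanded pattern when the directory has no .gz files.
        [ -e "$f" ] || continue
        gzip -t "$f" 2>/dev/null || echo "$f"
    done
}
```

For example, `check_gz_dir /home/ubuntu/work/files` prints nothing if every file passes; any file it lists is probably worth downloading again.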

On Mon, Feb 13, 2012 at 3:43 PM, Andy Doddington <[EMAIL PROTECTED]> wrote:

> OK, well for starters, I think you can safely ignore the PDF data; to
> paraphrase Star Wars: “that isn’t the data
> in which you are interested”.
>
> Page 16 of the book describes the data format and refers to a data store
> that contains directories for each year from
> 1901 to 2001. It also shows the naming of .gz files within a sample
> directory (1990). The files in this directory have
> names "010010-99999-1990.gz", "010014-99999-1990.gz",
> "010015-99999-1990.gz", and so on…
>
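
The file names quoted above all follow the same three-part pattern, so they are easy to split in the shell. A minimal sketch follows; the function names are hypothetical, and reading the first two parts as station identifiers and the last as the year is my assumption from the pattern, not something stated in this thread:

```shell
# Split a name such as 010010-99999-1990.gz into its three dash-separated parts.
# Assumed meaning of the parts: station ID, second station ID, year.
split_ncdc_name() {
    base=$(basename "$1" .gz)
    echo "$base" | awk -F- '{ print $1, $2, $3 }'
}

# Count the .gz files in a directory per year (third part of the name).
count_by_year() {
    for f in "$1"/*.gz
    do
        # Skip the unexpanded pattern when the directory has no .gz files.
        [ -e "$f" ] || continue
        split_ncdc_name "$f" | awk '{ print $3 }'
    done | sort | uniq -c
}
```

Running `count_by_year /home/ubuntu/work/files` after a download gives a quick per-year file count, which makes missing years easy to spot.
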
> Referring back to the NCDC web site, at the link below (
> http://www.ncdc.noaa.gov) and clicking on the ‘Free Data’
> link on the left-hand side of the screen brings up a new screen, as shown
> below:
>
>
> Clicking again on the ‘Free Data’ link in the middle section of this page
> brings up another page, listing the available
> data sets:
>
>
> As this page notes, although some of this data needs to be paid for, there
> is at least one ‘free’ option within
> each section. For simplicity, I went for the first one - the one labelled
> “3505 FTP data access” - which the comment
> says is free. I used anonymous FTP and found that this site contained
> directories for each year from 1901 to 2012.
> I expect the additional directories reflect the fact that time has moved
> on since the book was written :-)
>
> There are also several text or PDF files that provide further information
> on the contents of the site. I suggest you
> read some of these to get more details. One of these is called
> "ish-format-document.pdf" and it seems to describe
> the document format in some detail. If you open this, you can check
> whether it matches the format expected by
> the Hadoop sample code. There is also a ‘software’ directory, which
> contains various bits of code that might
> prove useful.
>
> On drilling down into the directory for 1990, I get the following list of
> files:
>
>
> Which looks close enough to the file names in the Hadoop book - I’d
> guess that these are the correct files.
>
> Given the passage of time, it is still possible that the file format has
> changed to make it incompatible with the
> hadoop code. However, it shouldn’t be that difficult to modify the code to
> suit the new format (which is very
> well documented, as already noted).
>
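
One quick way to test the compatibility question raised above is to pull out the handful of fixed-width fields the book's examples rely on and see whether they look sensible. This is only a sketch: the function name ncdc_fields is made up, and the column offsets (year at characters 16-19, air temperature in tenths of a degree Celsius at 88-92, quality code at 93) are what I recall from the book's sample code, so please verify them against ish-format-document.pdf:

```shell
# Print year, air temperature and quality code for each record on stdin.
# Offsets are 1-based awk positions and are an assumption taken from the
# book's example, not from the NCDC documentation itself.
ncdc_fields() {
    awk '{ print substr($0, 16, 4), substr($0, 88, 5) + 0, substr($0, 93, 1) }'
}
```

A typical use would be `gunzip -c 010010-99999-1990.gz | head -5 | ncdc_fields`. If the first column is not the year of the file, the record layout has probably changed.
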
> Good luck!
>
> Andy
>
> ——————————————
>
> On 12 Feb 2012, at 08:50, Bing Li wrote:
>
> Andy,
>
> Since there is a lot of data in the free data section of the site, I cannot
> figure out which data set is the one discussed in the book. Any format
> differences might cause the source code to throw exceptions. Some data is
> even in PDF format!
>
> Thanks so much!
> Bing
>
> On Sun, Feb 12, 2012 at 4:35 PM, Andy Doddington <[EMAIL PROTECTED]> wrote:
>
> According to Page 15 of the book, this data is available from the US
> National Climatic Data Center, at http://www.ncdc.noaa.gov. Once you get
> to this site, there is a menu of links on the left-hand side of the page,
> listed under the heading ‘Data & Products’. I suspect that the entry
> labelled ‘Free Data’ is the most likely area you need to investigate :-)
>
> Good Luck
>
> Andy D
>
>
> ————————————————————
>
>
> On 12 Feb 2012, at 07:14, Bing Li wrote:
>
> Dear all,
>
> I am following the book, Hadoop: The Definitive Guide. However, I got
> stuck because I could not get the NCDC weather data that is used by the
> source code in the book. Appendix C told me I could follow some
> instructions at www.hadoopbook.com, but I could not find the instructions
> there. Could you give me a hand?
>
> Thanks so much!
>
> Best regards,
> Bing
Sujit Dhamale 2012-12-06, 04:08