Hadoop >> mail # user >> Re: The NCDC Weather Data for Hadoop the Definitive Guide


Andy Doddington 2012-02-12, 08:35
Andy Doddington 2012-02-13, 10:13
Sujit Dhamale 2012-11-16, 07:31

Re: The NCDC Weather Data for Hadoop the Definitive Guide
To avoid recursive folder creation, follow the steps below:
1. Create a folder on your local drive.
   I created "/home/sujit/Desktop/Data/".

2. Create the script below and run it:

for i in {1901..2012}
do
cd /home/sujit/Desktop/Data/
wget -r --no-parent --reject "index.html*" http://ftp3.ncdc.noaa.gov/pub/data/noaa/$i/
done
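
Once the download finishes, each year's station files can be merged into a single plain-text file per year, so Hadoop reads a few large inputs rather than thousands of small ones. This is only a rough sketch, assuming the directory layout the wget command above creates (ftp3.ncdc.noaa.gov/pub/data/noaa/<year>/ under the Data folder); adjust the paths to match your setup:

# Merge each year's gzipped station files into one plain-text file per year.
DATA=/home/sujit/Desktop/Data
for i in {1901..2012}
do
zcat "$DATA"/ftp3.ncdc.noaa.gov/pub/data/noaa/$i/*.gz > "$DATA/$i.txt"
done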

On Fri, Nov 16, 2012 at 1:01 PM, Sujit Dhamale <[EMAIL PROTECTED]> wrote:

> Hi,
> If needed, you can run the script below to store the data on your local system:
>
> for i in {1901..2012}
> do
> cd /home/ubuntu/work/
> wget -r -np -nH -R index.html http://ftp3.ncdc.noaa.gov/pub/data/noaa/$i/
> cd pub/data/noaa/$i/
> mkdir -p /home/ubuntu/work/files
> cp *.gz /home/ubuntu/work/files/
> cd /home/ubuntu/work/
> rm -r pub/
> done
>
>
>
> On Mon, Feb 13, 2012 at 3:43 PM, Andy Doddington <[EMAIL PROTECTED]> wrote:
>
>> OK, well for starters, I think you can safely ignore the PDF data; to
>> paraphrase Star Wars: “that isn’t the data
>> in which you are interested”.
>>
>> Page 16 of the book describes the data format and refers to a data store
>> that contains directories for each year from
>> 1901 to 2001. It also shows the naming of .gz files within a sample
>> directory (1990). The files in this directory have
>> names "010010-99999-1990.gz", "010014-99999-1990.gz",
>> "010015-99999-1990.gz", and so on…
>>
>> Referring back to the NCDC web site at the link below (
>> http://www.ncdc.noaa.gov), clicking on the ‘Free Data’
>> link on the left-hand side of the screen brings up a new screen, as shown
>> below:
>>
>>
>> Clicking again on the ‘Free Data’ link in the middle section of this page
>> brings up another page, listing the available
>> data sets:
>>
>>
>> As this page notes, although some of this data needs to be paid for,
>> there is at least one ‘free’ option within
>> each section. For simplicity, I went for the first one - the one labelled
>> “3505 FTP data access” - which the comment
>> says is free. I used anonymous FTP and found that this site contained
>> directories for each year from 1901 to 2012.
>> I expect the additional directories reflect the fact that time has moved
>> on since the book was written :-)
>>
>> There are also several text or pdf files that provide further information
>> on the contents of the site. I suggest you
>> read some of these to get more details. One of these is called
>> "ish-format-document.pdf" and it seems to describe
>> the document format in some detail. If you open this, you can check
>> whether it matches the format expected by
>> the hadoop sample code. There is also a ‘software’ directory, which
>> contains various bits of code that might
>> prove useful.
>>
>> On drilling down into the directory for 1990, I get the following list of
>> files:
>>
>>
>> Which looks close enough to the file names in the hadoop book - I’d
>> guess that these are the correct files.
>>
>> Given the passage of time, it is still possible that the file format has
>> changed to make it incompatible with the
>> hadoop code. However, it shouldn’t be that difficult to modify the code
>> to suit the new format (which is very
>> well documented, as already noted).
>>
>> Good luck!
>>
>>  Andy
>>
>> ——————————————
>>
>> On 12 Feb 2012, at 08:50, Bing Li wrote:
>>
>> Andy,
>>
>> Since there is a lot of data in the free-data area of the site, I cannot figure
>> out which set is the one discussed in the book. Any format differences might
>> cause the source code to get exceptions. Some data is even in PDF format!
>>
>> Thanks so much!
>> Bing
>>
>> On Sun, Feb 12, 2012 at 4:35 PM, Andy Doddington <[EMAIL PROTECTED]> wrote:
>>
>> According to Page 15 of the book, this data is available from the US
>> National Climatic Data Center, at http://www.ncdc.noaa.gov. Once you get to
>> this site, there is a menu of links on the left-hand side of the page, listed
>> under the heading ‘Data & Products’. I suspect that the entry labelled ‘Free
>> Data’ is the most likely area you need to investigate :-)
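
Regarding Andy's point above about checking whether the downloaded records still match the format the book's sample code expects, a quick spot check from the command line can help. This is only a sketch: the column positions used below (year in columns 16-19, signed air temperature in tenths of a degree in columns 88-92, quality code in column 93) are the offsets I believe the book's parser reads, so verify them against ish-format-document.pdf before relying on them.

# Print year, signed air temperature and quality code from the first few
# records of one downloaded station file (example file name taken from the
# 1990 directory listing above); run it in the directory holding the file.
zcat 010010-99999-1990.gz | head -n 5 | cut -c16-19,88-93

If a sensible year and temperature show up in those positions, the records should parse with the sample code unchanged.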