Re: working with SAS
Hi,

Hadoop runs mostly on Linux and can run in standalone mode, which is suitable for testing only. If you decide to use Hadoop with Hive or HBase, you will face quite a few additional tasks:

- installation (Whirr and Amazon EC2, for example)
- write your own MapReduce job or use Hive / HBase
- set up Sqoop with the Teradata driver
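
To give a rough idea of what "write your own MapReduce job" means for a statistical query, here is a minimal sketch in plain Python. It does not use any Hadoop API; the input format (`group,value` lines) and the function names `map_row`, `combine` and `reduce_stats` are assumptions made up for illustration. The point is only that count, sum and sum of squares can be computed per chunk and merged later, which is what lets the work spread over many cores or nodes.

```python
from collections import defaultdict
from math import sqrt


def map_row(line):
    """Map step: turn one 'group,value' line into (key, partial stats).

    The 'group,value' layout is a hypothetical input format.
    """
    group, value = line.strip().split(",")
    x = float(value)
    # Partial stats are (count, sum, sum of squares) for a single row.
    return group, (1, x, x * x)


def combine(a, b):
    """Merge two partial stats tuples; the order of merging does not matter."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def reduce_stats(pairs):
    """Reduce step: merge partials per key, then derive mean and std dev."""
    totals = defaultdict(lambda: (0, 0.0, 0.0))
    for key, partial in pairs:
        totals[key] = combine(totals[key], partial)

    result = {}
    for key, (n, s, ss) in totals.items():
        mean = s / n
        # Population variance; clamp tiny negative values from rounding.
        variance = max(ss / n - mean * mean, 0.0)
        result[key] = (n, mean, sqrt(variance))
    return result


if __name__ == "__main__":
    rows = ["a,1.0", "a,3.0", "b,10.0"]
    print(reduce_stats(map_row(r) for r in rows))
```

Whether your SAS statistical function can be broken down into mergeable partial results like this is the first thing to check; if it cannot, a cluster will not help much.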

You can easily set up parts 1 and 2 on Amazon EC2; I think you can also book Windows Server instances there. For a single query, I think that is the best option to try before you install your own Hadoop cluster.

best,
 Alex
--
Alexander Lorenz
http://mapredit.blogspot.com

On Feb 6, 2012, at 8:11 AM, Ali Jooan Rizvi wrote:

> Hi,
>
> I would like to know if Hadoop will be of help to me. Let me explain my
> scenario:
>
> I have a single Windows Server machine with 16 cores and 48 GB of
> physical memory. In addition, I have 120 GB of virtual memory.
>
> I am running a query with a statistical calculation on a large data set of
> over 1 billion rows, on SAS. In this case, SAS is acting like a database on
> which both source and target tables reside. For storage, I can keep the
> source and target data on Teradata as well, but the query containing a patent
> can only be run on the SAS interface.
>
> The problem is that SAS is taking many days (25 days) to run it (a single
> query with a statistical function). Not all cores were used all the time;
> on average only 5% of CPU was utilized. However, memory utilization was
> very high, and that is why so much virtual memory was used.
>
> Can I put a Hadoop interface in place to do it all, so that I end up
> running the query in less time, say 1 or 2 days? Anything that squeezes
> my run time will be very helpful.
>
> Thanks
>
> Ali Jooan Rizvi