Re: Application of Cloudera Hadoop for Dataset analysis

It's possible to do this on a VM, but it depends on how much data you want
to analyze and on how powerful the system hosting the VM is.

Since you are a CS undergrad, I would suggest installing it on a plain Linux
system; you can pick up the setup quickly (after playing with the VM first).

For running SQL-like queries, look at Hive.
For running machine learning algorithms, look at Mahout.
You can also write your own custom Java code for queries.
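
To give a feel for what "linking datasets using a key" means, here is a
minimal sketch in plain Java. It is not Hadoop or Hive code: the class name
KeyJoinSketch, the join method, and the sample rows are all made up for
illustration, and everything is held in memory. On a real cluster, a Hive
JOIN or a custom MapReduce job would do the same kind of matching over files
stored in HDFS.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: joins two small in-memory "datasets" on a shared key.
// Plain Java only; no Hadoop or Hive classes are used.
class KeyJoinSketch {

    // Inner join: keep only the keys that appear in both datasets.
    static List<String> join(Map<String, String> left, Map<String, String> right) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, String> row : left.entrySet()) {
            String match = right.get(row.getKey());
            if (match != null) {
                joined.add(row.getKey() + "," + row.getValue() + "," + match);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Made-up sample rows, keyed by a student id.
        Map<String, String> names = new LinkedHashMap<>();
        names.put("s1", "Asha");
        names.put("s2", "Ravi");
        names.put("s3", "Meena");

        Map<String, String> grades = new LinkedHashMap<>();
        grades.put("s1", "A");
        grades.put("s3", "B");

        // Only s1 and s3 have rows in both datasets, so s2 is dropped.
        for (String line : join(names, grades)) {
            System.out.println(line);
        }
    }
}
```

Once the datasets are joined like this, pattern-finding queries and machine
learning steps work on the combined rows rather than on the separate files.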

Just start with the initial install, and shout here on the mailing list
whenever you get stuck.

Welcome to the magical world of data.


Jagat Singh

On Tue, Feb 5, 2013 at 9:43 PM, Sharath Chandra Guntuku <

> Hi,
> I am Sharath Chandra, an undergraduate student at BITS-Pilani, India. I
> would like to get the following clarifications regarding cloudera hadoop
> distribution. I am using a CDH4 Demo VM for now.
> 1. After I upload the files into the file browser, if I have to link
> two-three datasets using a key in those files, what should I do? Do I have
> to run a query over them?
> 2. My objective is that I have some data collected over a few years and
> now, I would like to link all of them, as in a database using keys and then
> run queries over them to find out particular patterns. Later I would like
> to implement some Machine learning algorithms on them for predictive
> analysis. Will this be possible on the demo VM?
> I am totally new to this. Can I get some help on this? I would be very
> grateful for the same.
> Thanks and Regards,
> *Sharath Chandra Guntuku*
> Undergraduate Student (Final Year)
> *Computer Science Department*
> *BITS-Pilani*, Hyderabad Campus
> Jawahar Nagar, Shameerpet, RR Dist,
> Hyderabad - 500078, Andhra Pradesh