The best way to answer your questions is to try things out:
1) Set up a single-node Hadoop VM (ready-made images are available from
Hortonworks and Cloudera).
2) Load some data and see where it is stored. Hive is a data access
framework; it does not store any data itself. The data lives in files on
HDFS, and the metadata describing it (tables, columns, locations) is kept
in the metastore, which HCatalog exposes to other tools.
3) With Hive, the work is mostly writing queries and crunching numbers.
There are many file formats (for example text, SequenceFile, RCFile and
ORC), and each suits different kinds of workloads.
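To make point 2 concrete, here is a toy Python sketch (not Hive code) of
the split between metadata and data. Everything in it is made up for
illustration: the `metastore` dict, the `create_external_table` and `scan`
helpers, and the file name. A local temp directory stands in for an HDFS
directory.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for the metastore: table name -> metadata only.
metastore = {}

def create_external_table(name, columns, location):
    """Register a table: record schema and location, copy no data."""
    metastore[name] = {"columns": columns, "location": location}

def scan(name):
    """Read rows by applying the stored schema to the files at the location."""
    meta = metastore[name]
    rows = []
    for filename in sorted(os.listdir(meta["location"])):
        path = os.path.join(meta["location"], filename)
        with open(path, newline="") as f:
            for record in csv.reader(f):
                rows.append(dict(zip(meta["columns"], record)))
    return rows

# A directory of plain files plays the role of an HDFS directory.
data_dir = tempfile.mkdtemp()
with open(os.path.join(data_dir, "part-00000.csv"), "w", newline="") as f:
    f.write("1,lennon\n2,mccartney\n")

create_external_table("artists", ["id", "last_name"], data_dir)
rows = scan("artists")
print(rows)
```

The table entry holds only a schema and a path; the rows are read from the
files each time the table is scanned, which is roughly the relationship
between the metastore and the files on HDFS.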
If you have a basic understanding of Hive and have tried a few queries, you
will find that Hive is not a standalone system (for now). It sits on top of
Hadoop (MapReduce 1 and HDFS), plus the metastore, plus the Hive framework
itself.
You will also need to understand a bit more about HDFS.
To answer your questions:
how the hive will connect with hadoop cluster,
.. When you set up Hive you point it at a Hadoop cluster in its
configuration; some of these properties (such as where a table's data
lives) can also be changed at the table level.
how the hive will get the request,
.. Not sure what you mean by request. If you mean the query, there are
several ways in: the Hive CLI (as far as I am aware, development on it is
slowing down), clients like Beeline, and JDBC.
how the hive will process the request,
.. Hive converts your query into an optimized MapReduce program and
processes the data using that program. To see how a SQL query can be
converted to a MapReduce program, you can look at the YSmart framework from
Ohio State University.
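As a rough picture of what "query to MapReduce" means, here is a toy Python
sketch of how something like
SELECT last_name, COUNT(*) FROM songs GROUP BY last_name
breaks into map, shuffle and reduce steps. This is not what Hive or YSmart
actually generates; the `songs` rows and the function names are made up for
illustration, and everything runs in one process instead of on a cluster.

```python
from collections import defaultdict

# Made-up sample rows: (artist_id, last_name, song_title)
songs = [
    (1, "lennon", "nowhere man"),
    (2, "mccartney", "penny lane"),
    (3, "harrison", "think for yourself"),
    (3, "harrison", "i want to tell you"),
]

def map_phase(row):
    """Emit (group-by key, 1) for each input row."""
    _, last_name, _ = row
    yield last_name, 1

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """COUNT(*) for one group is the sum of the emitted ones."""
    return key, sum(values)

mapped = [pair for row in songs for pair in map_phase(row)]
grouped = shuffle(mapped)
result = dict(reduce_phase(key, values) for key, values in grouped.items())
print(result)
```

On a real cluster the map calls run in parallel next to the data blocks and
the shuffle moves data between nodes, but the shape of the computation is
the same.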
after analysis ,where the analyzed data will be stored for further decision
.. Hive does not store any results automatically. You have to say
explicitly where you want to save the data: a table, a file, or something
like that.
On Mon, Jan 13, 2014 at 4:14 PM, Vikas Parashar <[EMAIL PROTECTED]> wrote:
> Thanks Prashant, Definitely i shall go through that if needed. But from
> my experience, what i have faced is that user will have some integration
> problem with HADOOP 2.
> Hi Vikas
>> Welcome to the world of Hive !
>> The first book u should read is by Capriolo , Wampler, Rutherglen
>> Programming Hive
>> This is a must read. I have immensely benefited from this book and the
>> hive user group (the group is kickass).
>> If u r not sure of the details of HDFS/Hadoop then the Hadoop
>> Definitive Guide (Tom White) is a must read.
>> My view would be u should know both very well eventually...
>> I have setup Hadoop and Hive cluster in three ways
>> 1. manually thru tarballs (lightweight but u need to know what u r
>> installing and where)
>> 2. CDH & Cloudera manager (heavyweight but it does things in the
>> background....easy to install and quick to setup on a sandbox and
>> learn)...Plus Beeswax is a great starter UI for Hive queries
>> 3. Using Amazon EMR Hive (I realize this is the easiest and the fastest
>> to setup to learn Hive)
>> My suggestion , Don't go for option 1 - u learn a lot there but it
>> could take time and u might feel frustrated as well
>> If using option 2 above , then I suggest
>> - 1 or 2 boxes - i7 quad core (or u can use a 8 core AMD FX 8300) with
>> 16-32GB RAM
>> - download and install Cloudera manager
>> If u don't have access to box(es) to install hadoop/hive then the
>> cheapest way to learn is by using Amazon EMR
>> - First create a S3 bucket and a folder to store a data file called
>> 1,2,lennon,john,nowhere man
>> 1,3,lennon,john,strawberry fields forever
>> 2,1,mccartney,paul,penny lane
>> 3,1,harrison,george,while my guitar gently weeps
>> 3,2,harrison,george,i want to tell you
>> 3,3,harrison,george,think for yourself
>> 4,1,starr,ringo,octopus's garden
>> 4,2,starr,ringo,with a little help from my friends
>> - Create a key pair from the AWS console and save the private key on
>> your local desktop
>> - Create a EMR cluster with Hive installed