Re: Is my Use Case possible with Hive?
With a 10-node cluster the performance should improve.
How many maps and reducers are being launched?
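
For example, from a JDBC client you can set the reduce-side parallelism before running a query and use EXPLAIN to see the map/reduce stages Hive plans to launch. This is only a rough sketch, assuming the HiveServer1 driver on its default port 10000; the property values and table name are placeholders, not anything from your setup:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveParallelismSketch {
    public static void main(String[] args) throws Exception {
        // HiveServer1 JDBC driver (Hive 0.8/0.9 era). A HiveServer2 setup
        // would use org.apache.hive.jdbc.HiveDriver and a jdbc:hive2:// URL.
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection con = DriverManager.getConnection(
                "jdbc:hive://localhost:10000/default", "", "");
        Statement stmt = con.createStatement();

        // Placeholder tuning values: ask for 10 reducers and let independent
        // stages of a multi-stage query run in parallel.
        stmt.executeQuery("SET mapred.reduce.tasks=10");
        stmt.executeQuery("SET hive.exec.parallel=true");

        // EXPLAIN prints the plan, including the map/reduce stages the query
        // will launch; the actual task counts show up in the JobTracker UI.
        ResultSet plan = stmt.executeQuery(
                "EXPLAIN SELECT some_key, COUNT(*) FROM my_table GROUP BY some_key");
        while (plan.next()) {
            System.out.println(plan.getString(1));
        }
        con.close();
    }
}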
On Mon, May 14, 2012 at 1:18 PM, Bhavesh Shah <[EMAIL PROTECTED]> wrote:

> I have nearly 1 billion records in my relational database.
> Currently I am using just one local cluster. But I also tried this on
> Amazon Elastic MapReduce with 10 nodes, and the time taken to execute the
> complete program is the same as on my single local machine.
>
>
> On Mon, May 14, 2012 at 1:13 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:
>
>> How many records?
>>
>> What is your Hadoop cluster setup? How many nodes?
>> If you are running Hadoop on a single-node setup with a normal desktop, I
>> doubt it will be of any help.
>>
>> You need a stronger cluster setup for better query runtimes, and of course
>> query optimization, which I guess you have already taken care of.
>>
>>
>>
>> On Mon, May 14, 2012 at 12:39 PM, Bhavesh Shah <[EMAIL PROTECTED]> wrote:
>>
>>> Hello all,
>>> My Use Case is:
>>> 1) I have a relational database (MS SQL Server) which holds a very large
>>> amount of data.
>>> 2) I want to do analysis on this huge data set and generate reports on it
>>> after the analysis.
>>> In this way I have to generate various reports based on different analyses.
>>>
>>> I tried to implement this using Hive. What I did is:
>>> 1) I imported all tables into Hive from MS SQL Server using Sqoop.
>>> 2) I wrote many queries in Hive which are executed using JDBC on the Hive
>>> Thrift Server (see the JDBC sketch after this list).
>>> 3) I am getting the correct results in table form, which is what I am
>>> expecting.
>>> 4) But the problem is that the execution time is far too long.
>>>    (My complete program executes in about 3-4 hours on a *small
>>> amount of data*.)
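>>>
>>> (For reference, a minimal sketch of what step 2 looks like in code,
>>> assuming the HiveServer1 JDBC driver on the default port 10000; the
>>> table, column, and class names below are just placeholders, not my
>>> actual queries:)
>>>
>>> import java.sql.Connection;
>>> import java.sql.DriverManager;
>>> import java.sql.ResultSet;
>>> import java.sql.Statement;
>>>
>>> public class HiveReportQuery {
>>>     public static void main(String[] args) throws Exception {
>>>         // Register the HiveServer1 JDBC driver and connect to the Thrift server.
>>>         Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
>>>         Connection con = DriverManager.getConnection(
>>>                 "jdbc:hive://localhost:10000/default", "", "");
>>>         Statement stmt = con.createStatement();
>>>
>>>         // One of the report queries; Hive compiles each such query into
>>>         // one or more MapReduce jobs, which is where the time goes.
>>>         ResultSet res = stmt.executeQuery(
>>>                 "SELECT some_key, COUNT(*) AS cnt FROM my_table GROUP BY some_key");
>>>         while (res.next()) {
>>>             System.out.println(res.getString(1) + "\t" + res.getString(2));
>>>         }
>>>         con.close();
>>>     }
>>> }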
>>>
>>>
>>>    I decided to do this using Hive.
>>>    And as I said previously, that is how much time Hive takes for
>>> execution. My organization expects this task to be completed in less than
>>> half an hour.
>>>
>>> Now, after spending so much time on the complete execution of this task,
>>> what should I do?
>>> I want to ask one thing:
>>> *Is this Use Case possible with Hive?* If it is possible, what should I do
>>> in my program to increase the performance?
>>> *And if it is not possible, what is another good way to implement this Use
>>> Case?*
>>>
>>>
>>> Please reply.
>>> Thanks
>>>
>>>
>>> --
>>> Regards,
>>> Bhavesh Shah
>>>
>>
>>
>>
>> --
>> Nitin Pawar
>>
>>
>
>
> --
> Regards,
> Bhavesh Shah
>
>
--
Nitin Pawar