
Hive, mail # user - Re: Is my Use Case possible with Hive?


Nitin Pawar 2012-05-14, 08:37
With a 10-node cluster, the performance should improve.
How many map and reduce tasks are being launched?
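To make the question about map and reduce counts concrete, here is a rough back-of-the-envelope sketch. It assumes plain FileInputFormat splitting (one map per input split, with the last split allowed to run about 10% over), a 64 MB split size, Hive's reducer estimate with what I believe were the defaults of that era (hive.exec.reducers.bytes.per.reducer = 1 GB, hive.exec.reducers.max = 999, applied only when mapred.reduce.tasks is left at -1), and 2 map slots per TaskTracker. The function names and file sizes are hypothetical, and Hive's CombineHiveInputFormat can merge small files into fewer splits, so real counts may differ.

```python
import math

# Hypothetical helpers for estimating task counts; all defaults are assumptions.
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_bytes


def map_tasks_for_file(size_bytes: int, split_bytes: int) -> int:
    """Approximate number of input splits (= map tasks) for one file."""
    if size_bytes == 0:
        return 1  # an empty file still produces one empty split
    splits, remaining = 0, size_bytes
    while remaining / split_bytes > SPLIT_SLOP:
        splits += 1
        remaining -= split_bytes
    return splits + 1  # the final split holds whatever is left


def estimate_tasks(file_sizes, split_bytes=64 * 2**20,
                   bytes_per_reducer=1_000_000_000, max_reducers=999):
    """Return (map tasks, reduce tasks) for one job reading these files."""
    maps = sum(map_tasks_for_file(s, split_bytes) for s in file_sizes)
    total = sum(file_sizes)
    # Hive-style estimate: one reducer per bytes_per_reducer, clamped to [1, max]
    reducers = min(max_reducers, max(1, math.ceil(total / bytes_per_reducer)))
    return maps, reducers


def map_waves(maps: int, nodes: int, map_slots_per_node: int = 2) -> int:
    """Rounds of map tasks needed when each node runs a fixed number at once."""
    return math.ceil(maps / (nodes * map_slots_per_node))


if __name__ == "__main__":
    MB = 2**20
    sizes = [200 * MB, 50 * MB, 70 * MB]  # hypothetical table files
    maps, reducers = estimate_tasks(sizes)
    print(maps, reducers)  # prints: 6 1
    for nodes in (1, 3, 10):
        print(nodes, map_waves(maps, nodes))  # prints: 1 3, then 3 1, then 10 1
```

With only 6 map tasks, 3 nodes already finish the map phase in one wave, so going to 10 nodes adds idle slots rather than speed. If each query in the program launches only a handful of tasks, adding nodes may not shorten the total runtime much.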
On Mon, May 14, 2012 at 1:18 PM, Bhavesh Shah <[EMAIL PROTECTED]> wrote:

> I have about 1 billion records in my relational database.
> Locally I am currently using just a single-node cluster. I also tried this
> on Amazon Elastic MapReduce with 10 nodes, but the time taken to execute
> the complete program is the same as on my single local machine.
>
>
> On Mon, May 14, 2012 at 1:13 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:
>
>> How many records?
>>
>> What is your Hadoop cluster setup? How many nodes?
>> If you are running Hadoop in a single-node setup on a normal desktop, I
>> doubt it will be of any help.
>>
>> You need a stronger cluster setup for better query runtimes, and of course
>> query optimization, which I guess you have already taken care of.
>>
>>
>>
>> On Mon, May 14, 2012 at 12:39 PM, Bhavesh Shah <[EMAIL PROTECTED]> wrote:
>>
>>> Hello all,
>>> My use case is:
>>> 1) I have a relational database (MS SQL Server) that holds a very large
>>> amount of data.
>>> 2) I want to analyze this data and generate reports from the analysis.
>>> I have to generate various reports based on different analyses.
>>>
>>> I tried to implement this using Hive. What I did:
>>> 1) I imported all tables from MS SQL Server into Hive using Sqoop.
>>> 2) I wrote many Hive queries, which I execute over JDBC against the Hive
>>> Thrift Server.
>>> 3) I am getting the correct results in table form, as expected.
>>> 4) But the problem is that execution takes far too long.
>>>    (My complete program takes about 3-4 hours to run on a *small
>>> amount of data*.)
>>>
>>>
>>>    I decided to do this using Hive.
>>>    As I mentioned above, Hive takes a long time to execute, but my
>>> organization expects this task to complete in less than about 1/2 hours.
>>>
>>> Given how long the complete execution takes, what should I do?
>>> My question is:
>>> *Is this use case possible with Hive?* If so, what should I change in
>>> my program to improve performance?
>>> *And if not, what is a better way to implement this use case?*
>>>
>>>
>>> Please reply.
>>> Thanks
>>>
>>>
>>> --
>>> Regards,
>>> Bhavesh Shah
>>>
>>
>>
>>
>> --
>> Nitin Pawar
>>
>>
>
>
> --
> Regards,
> Bhavesh Shah
>
>
--
Nitin Pawar