Hive, mail # user - Getting Slow Query Performance!


RE: Getting Slow Query Performance!
Bennie Schut 2013-03-12, 11:30
Well, it's probably worth knowing that 30G is really hitting rock bottom when you talk about big data. Hadoop is linearly scalable, so going to 3 or 4 similar machines could probably get you below the mysql time, but it's hardly a fair comparison.
For setting it up, I would suggest reading the hadoop docs: http://hadoop.apache.org/docs/current/
These hardware specs give you an idea why it's an unusual case: http://hortonworks.com/blog/best-practices-for-selecting-apache-hadoop-hardware/

To give you some hints: each node needs to be configured with how many resources it's allowed to take. This is a balance between several parameters:
mapred.tasktracker.map.tasks.maximum, mapred.tasktracker.reduce.tasks.maximum, mapred.child.java.opts
There are tons more configurations but this is where you start. Different hardware and different jobs require different configurations so try it out.
Since you are extremely tight on RAM, you probably want to reduce memory usage on most processes like the namenode/jobtracker/hive, and on each node drop the memory requirements for the tasktracker/datanode.
Also, don't put your nodes on 100MB links; they are almost always too slow.
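
As a rough illustration of the balance between the three parameters above, here is a minimal Python sketch that estimates how many task slots fit on one node. The function name, the memory reserved for the OS and daemons, the JVM overhead factor, and the map/reduce split are all assumptions made for illustration, not recommendations from the Hadoop docs; treat the output as a starting point to experiment with.

```python
def suggest_slots(total_ram_mb, reserved_mb, child_heap_mb,
                  map_share=0.67, overhead=1.25):
    """Estimate per-node slot settings (hypothetical helper).

    total_ram_mb  -- physical RAM on the node
    reserved_mb   -- RAM kept for the OS, datanode and tasktracker (assumed)
    child_heap_mb -- heap given to each task JVM via mapred.child.java.opts
    map_share     -- fraction of slots given to mappers (assumed)
    overhead      -- a JVM needs more than its heap; 1.25 is a guess
    """
    per_task_mb = child_heap_mb * overhead
    usable_mb = total_ram_mb - reserved_mb
    total_slots = int(usable_mb // per_task_mb)
    if total_slots < 2:
        # Need at least one map slot and one reduce slot.
        raise ValueError("not enough RAM for one map and one reduce slot")

    # Keep at least one slot of each kind.
    map_slots = max(1, min(total_slots - 1, round(total_slots * map_share)))
    reduce_slots = total_slots - map_slots

    return {
        "mapred.tasktracker.map.tasks.maximum": map_slots,
        "mapred.tasktracker.reduce.tasks.maximum": reduce_slots,
        "mapred.child.java.opts": "-Xmx%dm" % child_heap_mb,
    }


# Example: a 2GB node, assuming half the RAM is reserved and 200MB task heaps.
settings = suggest_slots(total_ram_mb=2048, reserved_mb=1024, child_heap_mb=200)
```

With these assumed numbers the sketch yields 3 map slots and 1 reduce slot, which shows how little room a 2GB node leaves. The resulting values would go into mapred-site.xml on each node.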

Bennie.

From: Gobinda Paul [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, March 12, 2013 11:01 AM
To: [EMAIL PROTECTED]
Subject: RE: Getting Slow Query Performance!
Thanks for your reply. I am new to hadoop and hive. My goal is to process big data using hadoop;
this is my university project (Data Mining). I need to show that hadoop is better than mysql for
big data (30-100GB+) processing; I know hadoop does that. To do so, can you please suggest
how many nodes are required to show the performance, and what type of configuration is required for each node?
From: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>
CC: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>
Date: Tue, 12 Mar 2013 10:40:33 +0100
Subject: RE: Getting Slow Query Performance!
Generally a single hadoop machine will perform worse than a single mysql machine. People normally use hadoop when they have so much data that it won't really fit on a single machine and would otherwise require specialized hardware (stuff like SANs) to run.
30GB of data really isn't that much, and 2GB of RAM is really not what hadoop is designed to work on. It really likes to have lots of memory.
I also don't see the hadoop configuration files so perhaps you only have 1 mapper and 1 reducer. But this is not a typical use-case so I doubt you'll see snappy performance after tweaking the configs.