Thanks for the pointer to HiTune. The dataflow graphs in the paper look nice.
The potential issues I can see:
1) The data collection requires a Chukwa cluster to be set up. Seems
2) Drill-down analysis. Besides the graphs shown in the paper, can
users drill down further to individual queries or jobs?
It would be nice to have some sample data available, so users can try a quick demo.
On Thu, Dec 13, 2012 at 9:12 PM, Zheng, Kai <[EMAIL PROTECTED]> wrote:
> You may want to try HiTune & HiBench. Just google them.
> -----Original Message-----
> From: Jie Li [mailto:[EMAIL PROTECTED]]
> Sent: Friday, December 14, 2012 10:02 AM
> To: [EMAIL PROTECTED]
> Subject: A tool to analyze and tune performance for Hive?
> Hi everyone,
> May I know if there is any tool available to analyze and tune the performance for Hive queries? And what is the state of the art?
> I have some experience tuning Pig, which involved manually clicking through JobTracker (JT) web pages, collecting pieces of data from here and there, and guessing what might be wrong. That was a slow and uncomfortable process. So before I dive into Hive, I'd like to hear about your experiences.
> PS: for individual jobs, we built a tool called Starfish:
> http://www.cs.duke.edu/starfish/release.html . It can be used to analyze a job's performance and profile the job for auto-tuning. It could be used for Hive too, but it currently captures neither the Hive-related info nor the interactions among jobs.