Is there any tool available to analyze and tune the performance of
Hive queries? And what is the state of the art?
I have some experience tuning Pig, which involved manually clicking
through the JT web pages, collecting pieces of data from here and
there, and guessing what might be wrong. That was a slow and
uncomfortable process, so before I dive into Hive, I'd like to hear
about your experience.
PS: for individual jobs, we built a tool called Starfish:
http://www.cs.duke.edu/starfish/release.html . It can be used to
analyze a job's performance and profile the job for auto-tuning. It
could be used for Hive too, but it does not currently capture
Hive-related information or the interactions among jobs.