Guillaume Allain 2013-07-01, 08:01
Edward Capriolo 2013-07-02, 23:07
RE: Experience of Hive local mode execution style
Guillaume Allain 2013-07-04, 10:21
> Local mode really helps with those little delays.
It definitely helps for small data sets. But my concerns are about the consistency of results with distributed mode, and about some queries that fail only when local mode is triggered (see my description below).
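One way to probe the consistency concern is to run the same query once in distributed mode and once in local mode, then compare the outputs. The following is only a minimal sketch: the query and the output directories are illustrative assumptions, the table and database names are taken from the log further down, and forcing local execution through mapred.job.tracker assumes an MR1-style Hadoop setup.

```sql
-- Sketch only: the query and the /tmp/mode_check paths are hypothetical.
-- events_super_mart_test and dim_cohorts are the names seen in the log below.
USE events_super_mart_test;

-- 1) Distributed run: stop Hive from switching to local mode automatically.
SET hive.exec.mode.local.auto=false;
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/mode_check/distributed'
SELECT COUNT(*) FROM dim_cohorts;

-- 2) Local run: force the local job runner
--    (assumes an MR1-style setup where mapred.job.tracker is honoured).
SET mapred.job.tracker=local;
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/mode_check/local'
SELECT COUNT(*) FROM dim_cohorts;
```

The two output directories can then be compared outside Hive. For queries that return many rows, sorting both outputs first is advisable, since row order is not guaranteed to match between the two modes.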
From: Edward Capriolo
Sent: 03 July 2013 00:07
To: [EMAIL PROTECTED]
Subject: Re: Experience of Hive local mode execution style
Local mode is fast. In particular, older versions of Hadoop take a lot of time scheduling tasks, and there is a delay between the map and reduce phases.
Local mode really helps with those little delays.
On Monday, July 1, 2013, Guillaume Allain <[EMAIL PROTECTED]> wrote:
> Hi all,
> Would anybody have any comments or feedback about Hive local mode execution? It is advertised as providing a performance boost for small data sets. It seems to fit nicely when running unit/integration tests on a single node or virtual machine.
> My exact questions are the following:
> - How significantly does local mode execution of queries diverge from distributed mode? Can the results differ in some way?
> - I have encountered errors when running complex queries (with several joins/distinct/group-bys) that seem to relate to configuration (see below). I got no exact answers from the ML and I am more or less ready to dive into the source code.
> Any idea where I should look in order to solve that particular problem?
> Thanks in advance,
> From: Guillaume Allain
> Sent: 18 June 2013 12:14
> To: [EMAIL PROTECTED]
> Subject: FileNotFoundException when using hive local mode execution style
> Hi all,
> I plan to use Hive local mode in order to speed up unit testing on (very) small data sets. (Data is still on HDFS.) I switch local mode on by setting the following variables:
> SET hive.exec.mode.local.auto=true;
> SET mapred.local.dir=/user;
> SET mapred.tmp.dir=file:///tmp;
> (plus creating needed directories and permissions)
> Simple GROUP BY, INNER and OUTER JOIN queries work just fine (with up to 3 jobs) with nice performance improvements.
> Unfortunately I ran into a FileNotFoundException (/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile) on a more complex query (4 jobs, distinct on top of several joins; see the logs below if needed).
> Any idea about that error? What other option am I missing to get a fully functional local mode?
> Thanks in advance, Guillaume
> $ tail -50 /tmp/vagrant/vagrant_20130617171313_82baad8b-1961-4055-a52e-d8865b2cd4f8.lo
> 2013-06-17 16:10:05,669 INFO exec.ExecDriver (ExecDriver.java:execute(320)) - Using org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
> 2013-06-17 16:10:05,688 INFO exec.ExecDriver (ExecDriver.java:execute(342)) - adding libjars: file:///opt/events-warehouse/build/jars/joda-time.jar,file:///opt/events-warehouse/build/jars/we7-hive-udfs.jar,file:///usr/lib/hive/lib/hive-json-serde-0.2.jar,file:///usr/lib/hive/lib/hive-builtins-0.9.0-cdh4.1.2.jar,file:///opt/events-warehouse/build/jars/guava.jar
> 2013-06-17 16:10:05,688 INFO exec.ExecDriver (ExecDriver.java:addInputPaths(840)) - Processing alias dc
> 2013-06-17 16:10:05,688 INFO exec.ExecDriver (ExecDriver.java:addInputPaths(858)) - Adding input file hdfs://localhost/user/hive/warehouse/events_super_mart_test.db/dim_cohorts
> 2013-06-17 16:10:05,689 INFO exec.Utilities (Utilities.java:isEmptyPath(1807)) - Content Summary not cached for hdfs://localhost/user/hive/warehouse/events_super_mart_test.db/dim_cohorts
> 2013-06-17 16:10:06,185 INFO exec.ExecDriver (ExecDriver.java:addInputPath(789)) - Changed input file to file:/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1
> 2013-06-17 16:10:06,226 INFO exec.ExecDriver (ExecDriver.java:addInputPaths(840)) - Processing alias $INTNAME
> 2013-06-17 16:10:06,226 INFO exec.ExecDriver (ExecDriver.java:addInputPaths(858)) - Adding input file hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10004
Edward Capriolo 2013-07-04, 15:53