Hive >> mail # user >> Experience of Hive local mode execution style


Re: Experience of Hive local mode execution style
Local mode is fast. In particular, older versions of Hadoop take a lot of
time scheduling tasks, and there is a delay between the map and reduce
phases.

Local mode really helps with those little delays.
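For what it's worth, when auto local mode is enabled, Hive decides per query whether to run locally based on input-size thresholds. A sketch of the related settings (the values shown are illustrative; defaults may vary by Hive version):

```sql
-- Let Hive decide per-query whether to bypass the JobTracker
SET hive.exec.mode.local.auto=true;
-- Run locally only if the total input is below this many bytes (128 MB here)
SET hive.exec.mode.local.auto.inputbytes.max=134217728;
-- ...and only if there are at most this many input files
SET hive.exec.mode.local.auto.input.files.max=4;
```

Queries above either threshold still go through the cluster, so large jobs are unaffected.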

On Monday, July 1, 2013, Guillaume Allain <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> Would anybody have any comments or feedback about Hive's local mode
execution? It is advertised as providing a performance boost for small
data sets. It seems to fit nicely when running unit/integration tests on a
single node or virtual machine.
>
> My exact questions are the following :
>
> - How significantly does local mode execution of queries diverge from
distributed mode? Could the results differ in some way?
>
> - I have encountered errors when running complex queries (with several
joins/distincts/group-bys) that seem to relate to configuration (see below).
I got no exact answers from the mailing list, and I am about ready to dive
into the source code.
>
> Any idea where I should aim in order to solve that particular problem?
>
> Thanks in advance,
>
> Guillaume
>
> ________________________________
> From: Guillaume Allain
> Sent: 18 June 2013 12:14
> To: [EMAIL PROTECTED]
> Subject: FileNotFoundException when using hive local mode execution style
>
> Hi all,
>
> I plan to use Hive's local mode in order to speed up unit testing on
(very) small data sets. (Data is still on HDFS.) I switch to local mode by
setting the following variables:
>
> SET hive.exec.mode.local.auto=true;
> SET mapred.local.dir=/user;
> SET mapred.tmp.dir=file:///tmp;
> (plus creating needed directories and permissions)
>
> Simple GROUP BY, INNER and OUTER JOIN queries work just fine (with up to
3 jobs) with nice performance improvements.
>
> Unfortunately I ran into a
FileNotFoundException (/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile)
on a more complex query (4 jobs, a distinct on top of several joins; see
logs below if needed).
>
> Any idea about that error? What other options am I missing to get a fully
functional local mode?
>
> Thanks in advance, Guillaume
>
> $ tail -50
/tmp/vagrant/vagrant_20130617171313_82baad8b-1961-4055-a52e-d8865b2cd4f8.lo
>
> 2013-06-17 16:10:05,669 INFO  exec.ExecDriver
(ExecDriver.java:execute(320)) - Using
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
> 2013-06-17 16:10:05,688 INFO  exec.ExecDriver
(ExecDriver.java:execute(342)) - adding libjars:
file:///opt/events-warehouse/build/jars/joda-time.jar,file:///opt/events-warehouse/build/jars/we7-hive-udfs.jar,file:///usr/lib/hive/lib/hive-json-serde-0.2.jar,file:///usr/lib/hive/lib/hive-builtins-0.9.0-cdh4.1.2.jar,file:///opt/events-warehouse/build/jars/guava.jar
> 2013-06-17 16:10:05,688 INFO  exec.ExecDriver
(ExecDriver.java:addInputPaths(840)) - Processing alias dc
> 2013-06-17 16:10:05,688 INFO  exec.ExecDriver
(ExecDriver.java:addInputPaths(858)) - Adding input file
hdfs://localhost/user/hive/warehouse/events_super_mart_test.db/dim_cohorts
> 2013-06-17 16:10:05,689 INFO  exec.Utilities
(Utilities.java:isEmptyPath(1807)) - Content Summary not cached for
hdfs://localhost/user/hive/warehouse/events_super_mart_test.db/dim_cohorts
> 2013-06-17 16:10:06,185 INFO  exec.ExecDriver
(ExecDriver.java:addInputPath(789)) - Changed input file to
file:/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1
> 2013-06-17 16:10:06,226 INFO  exec.ExecDriver
(ExecDriver.java:addInputPaths(840)) - Processing alias $INTNAME
> 2013-06-17 16:10:06,226 INFO  exec.ExecDriver
(ExecDriver.java:addInputPaths(858)) - Adding input file
hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10004
> 2013-06-17 16:10:06,226 INFO  exec.Utilities
(Utilities.java:isEmptyPath(1807)) - Content Summary not cached for
hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10004
> 2013-06-17 16:10:06,681 WARN  conf.Configuration
(Configuration.java:warnOnceIfDeprecated(808)) - session.id is deprecated.
Instead, use dfs.metrics.session-id
Initializing JVM Metrics with processName=JobTracker, sessionId
> 2013-06-17 16:10:06,688 INFO  exec.ExecDriver
(ExecDriver.java:createTmpDirs(215)) - Making Temp Directory:
hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10002
(JobClient.java:copyAndConfigureFiles(704)) - Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
(CombineHiveInputFormat.java:getSplits(370)) - CombineHiveInputSplit
creating pool for
file:/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1;
using filter path
file:/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1
(CombineHiveInputFormat.java:getSplits(370)) - CombineHiveInputSplit
creating pool for
hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10004;
using filter path
hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10004
(FileInputFormat.java:listStatus(196)) - Total input paths to process : 2
- Cleaning up the staging area
file:/user/vagrant2000733611/.staging/job_local_0001
(UserGroupInformation.java:doAs(1335)) - PriviledgedActionException
as:vagrant (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not
exist:
/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile
(SessionState.java:printError(403)) - Job Submission failed with exception
'java.io.FileNotFoundException(File does not exist:
/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile)'
/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:787)
org.apache.hadoop.mapred.lib.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:462)
org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:256)
org.apache.ha