Hive >> mail # user >> Experience of Hive local mode execution style


Experience of Hive local mode execution style
Hi all,

Does anybody have comments or feedback about Hive local mode execution? It is advertised as providing a performance boost for small data sets, and it seems to fit nicely when running unit/integration tests on a single node or virtual machine.

My exact questions are the following:

- How much does local mode execution of queries diverge from distributed mode? Can the results differ in any way?

- I encountered an error when running complex queries (with several joins/DISTINCTs/GROUP BYs) that seems to be related to configuration (see below). I got no definite answer from the mailing list, and I am more or less ready to dive into the source code.

Any idea where I should look in order to solve that particular problem?

Thanks in advance,

Guillaume

________________________________
From: Guillaume Allain
Sent: 18 June 2013 12:14
To: [EMAIL PROTECTED]
Subject: FileNotFoundException when using hive local mode execution style

Hi all,

I plan to use Hive local mode in order to speed up unit testing on (very) small data sets (the data is still on HDFS). I switch to local mode by setting the following variables:

SET hive.exec.mode.local.auto=true;
SET mapred.local.dir=/user;
SET mapred.tmp.dir=file:///tmp;
(plus creating needed directories and permissions)
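For context, automatic local mode is gated by input thresholds in addition to the switch above, so those settings may be worth checking when some jobs of a query run locally and others do not. The fragment below is only a sketch: the two threshold property names come from the Hive configuration documentation rather than from this thread, the values are assumed to be the documented defaults, and `hive.exec.mode.local.auto.input.files.max` is assumed to apply to Hive 0.9.0 and later.

```sql
-- Let Hive decide per job whether to run locally (already set in this thread).
SET hive.exec.mode.local.auto=true;
-- Assumed threshold: total input size in bytes below which a job may run locally.
SET hive.exec.mode.local.auto.inputbytes.max=134217728;
-- Assumed threshold: maximum number of input files for a local job (Hive 0.9.0+).
SET hive.exec.mode.local.auto.input.files.max=4;
-- Alternative (commented out): force every job to run locally, whatever its input size.
-- SET mapred.job.tracker=local;
```

Forcing local mode with `mapred.job.tracker=local` removes the per-job decision, which can make test runs more predictable, at the cost of also running larger jobs in a single JVM.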

Simple GROUP BY, INNER and OUTER JOIN queries work just fine (with up to 3 jobs) with nice performance improvements.

Unfortunately, I ran into a FileNotFoundException (/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile) on a more complex query (4 jobs, DISTINCT on top of several joins; see the logs below if needed).

Any idea about that error? What other option am I missing to get a fully functional local mode?
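One way to narrow this down, sketched under assumptions: in the log, the input path is rewritten to `file:/tmp/vagrant/...` while the stack trace goes through `DistributedFileSystem`, so it may help to check which file system the scratch directory actually lives on. The commands below are meant for the Hive CLI; `hive.exec.local.scratchdir` is assumed to be the property behind the `/tmp/vagrant` path, and the per-query subdirectories may already have been cleaned up by the time the commands run.

```sql
-- Print the current scratch directory settings (no value given, so SET only displays them).
SET hive.exec.scratchdir;
SET hive.exec.local.scratchdir;
-- List the directory on the local file system (shell escape in the Hive CLI).
!ls -l /tmp/vagrant;
-- List the same path on HDFS; an error here would suggest the path exists only locally.
dfs -ls /tmp/vagrant;
```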

Thanks in advance, Guillaume

$ tail -50 /tmp/vagrant/vagrant_20130617171313_82baad8b-1961-4055-a52e-d8865b2cd4f8.lo

2013-06-17 16:10:05,669 INFO  exec.ExecDriver (ExecDriver.java:execute(320)) - Using org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
2013-06-17 16:10:05,688 INFO  exec.ExecDriver (ExecDriver.java:execute(342)) - adding libjars: file:///opt/events-warehouse/build/jars/joda-time.jar,file:///opt/events-warehouse/build/jars/we7-hive-udfs.jar,file:///usr/lib/hive/lib/hive-json-serde-0.2.jar,file:///usr/lib/hive/lib/hive-builtins-0.9.0-cdh4.1.2.jar,file:///opt/events-warehouse/build/jars/guava.jar
2013-06-17 16:10:05,688 INFO  exec.ExecDriver (ExecDriver.java:addInputPaths(840)) - Processing alias dc
2013-06-17 16:10:05,688 INFO  exec.ExecDriver (ExecDriver.java:addInputPaths(858)) - Adding input file hdfs://localhost/user/hive/warehouse/events_super_mart_test.db/dim_cohorts
2013-06-17 16:10:05,689 INFO  exec.Utilities (Utilities.java:isEmptyPath(1807)) - Content Summary not cached for hdfs://localhost/user/hive/warehouse/events_super_mart_test.db/dim_cohorts
2013-06-17 16:10:06,185 INFO  exec.ExecDriver (ExecDriver.java:addInputPath(789)) - Changed input file to file:/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1
2013-06-17 16:10:06,226 INFO  exec.ExecDriver (ExecDriver.java:addInputPaths(840)) - Processing alias $INTNAME
2013-06-17 16:10:06,226 INFO  exec.ExecDriver (ExecDriver.java:addInputPaths(858)) - Adding input file hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10004
2013-06-17 16:10:06,226 INFO  exec.Utilities (Utilities.java:isEmptyPath(1807)) - Content Summary not cached for hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10004
2013-06-17 16:10:06,681 WARN  conf.Configuration (Configuration.java:warnOnceIfDeprecated(808)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2013-06-17 16:10:06,682 INFO  jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId
2013-06-17 16:10:06,688 INFO  exec.ExecDriver (ExecDriver.java:createTmpDirs(215)) - Making Temp Directory: hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10002
2013-06-17 16:10:06,706 WARN  mapred.JobClient (JobClient.java:copyAndConfigureFiles(704)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2013-06-17 16:10:06,942 INFO  io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(370)) - CombineHiveInputSplit creating pool for file:/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1; using filter path file:/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1
2013-06-17 16:10:06,943 INFO  io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(370)) - CombineHiveInputSplit creating pool for hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10004; using filter path hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10004
2013-06-17 16:10:06,951 INFO  mapred.FileInputFormat (FileInputFormat.java:listStatus(196)) - Total input paths to process : 2
2013-06-17 16:10:06,953 INFO  mapred.JobClient (JobClient.java:run(982)) - Cleaning up the staging area file:/user/vagrant2000733611/.staging/job_local_0001
2013-06-17 16:10:06,953 ERROR security.UserGroupInformation (UserGroupInformation.java:doAs(1335)) - PriviledgedActionException as:vagrant (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile
2013-06-17 16:10:06,956 ERROR exec.ExecDriver (SessionState.java:printError(403)) - Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile)'
java.io.FileNotFoundException: File does not exist: /tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:787)
    at org.apache.had