Hive >> mail # user >> Experience of Hive local mode execution style


Guillaume Allain 2013-07-01, 08:01
Re: Experience of Hive local mode execution style
Local mode is fast. In particular, older versions of Hadoop take a lot of
time scheduling tasks, and there is a delay between the map and reduce phases.

Local mode really helps with those little delays.
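
For anyone wanting to try this from a test harness, here is a minimal sketch (not taken from the thread) of how a unit test might build a Hive CLI invocation with automatic local mode switched on. It assumes the `hive` binary is on the PATH and that the `--hiveconf` and `-e` options are available. `hive.exec.mode.local.auto` is the setting quoted below; `hive.exec.mode.local.auto.inputbytes.max` is the input-size threshold for automatic local mode as I understand it, so check it against your Hive version. The function name and the sample query are hypothetical.

```python
import shlex


def build_hive_local_command(query, max_input_bytes=None, extra_conf=None):
    """Return the argv list that runs `query` through the Hive CLI
    with automatic local mode enabled (hypothetical helper)."""
    # Setting taken from the original message in this thread.
    conf = {"hive.exec.mode.local.auto": "true"}
    if max_input_bytes is not None:
        # Assumption: input-size threshold below which Hive may pick local mode.
        conf["hive.exec.mode.local.auto.inputbytes.max"] = str(max_input_bytes)
    if extra_conf:
        conf.update(extra_conf)

    argv = ["hive"]
    # Sort the keys so the generated command is deterministic.
    for key, value in sorted(conf.items()):
        argv += ["--hiveconf", f"{key}={value}"]
    argv += ["-e", query]
    return argv


if __name__ == "__main__":
    # Illustrative query only; the table name is borrowed from the log below.
    cmd = build_hive_local_command("SELECT COUNT(*) FROM dim_cohorts")
    print(" ".join(shlex.quote(part) for part in cmd))
```

The list can be handed to `subprocess.run` from a test. Building the arguments separately from running them keeps the settings easy to inspect when a job fails in local mode but not in distributed mode.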

On Monday, July 1, 2013, Guillaume Allain <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> Would anybody have any comments or feedback about Hive local mode
execution? It is advertised as providing a performance boost for small
data sets. It seems to fit nicely when running unit/integration tests on a
single node or virtual machine.
>
> My exact questions are the following:
>
> - How significantly does local mode execution of queries diverge from
distributed mode? Can the results differ in some way?
>
> - I have encountered an error when running complex queries (with several
joins/distinct/groupbys) that seems to relate to configuration (see below).
I got no exact answers from the ML and I am kind of ready to dive into the
source code.
>
> Any idea where I should aim in order to solve that particular problem?
>
> Thanks in advance,
>
> Guillaume
>
> ________________________________
> From: Guillaume Allain
> Sent: 18 June 2013 12:14
> To: [EMAIL PROTECTED]
> Subject: FileNotFoundException when using hive local mode execution style
>
> Hi all,
>
> I plan to use Hive local mode in order to speed up unit testing on (very)
small data sets. (Data is still on HDFS.) I switch on local mode by
setting the following variables:
>
> SET hive.exec.mode.local.auto=true;
> SET mapred.local.dir=/user;
> SET mapred.tmp.dir=file:///tmp;
> (plus creating needed directories and permissions)
>
> Simple GROUP BY, INNER and OUTER JOIN queries work just fine (with up to
3 jobs) with nice performance improvements.
>
> Unfortunately I ran into a FileNotFoundException
(/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile)
on a more complex query (4 jobs, distinct on top of several joins; see
the logs below if needed).
>
> Any idea about that error? What other option am I missing to have a fully
functional local mode?
>
> Thanks in advance, Guillaume
>
> $ tail -50
/tmp/vagrant/vagrant_20130617171313_82baad8b-1961-4055-a52e-d8865b2cd4f8.lo
>
> 2013-06-17 16:10:05,669 INFO  exec.ExecDriver
(ExecDriver.java:execute(320)) - Using
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
> 2013-06-17 16:10:05,688 INFO  exec.ExecDriver
(ExecDriver.java:execute(342)) - adding libjars:
file:///opt/events-warehouse/build/jars/joda-time.jar,file:///opt/events-warehouse/build/jars/we7-hive-udfs.jar,file:///usr/lib/hive/lib/hive-json-serde-0.2.jar,file:///usr/lib/hive/lib/hive-builtins-0.9.0-cdh4.1.2.jar,file:///opt/events-warehouse/build/jars/guava.jar
> 2013-06-17 16:10:05,688 INFO  exec.ExecDriver
(ExecDriver.java:addInputPaths(840)) - Processing alias dc
> 2013-06-17 16:10:05,688 INFO  exec.ExecDriver
(ExecDriver.java:addInputPaths(858)) - Adding input file
hdfs://localhost/user/hive/warehouse/events_super_mart_test.db/dim_cohorts
> 2013-06-17 16:10:05,689 INFO  exec.Utilities
(Utilities.java:isEmptyPath(1807)) - Content Summary not cached for
hdfs://localhost/user/hive/warehouse/events_super_mart_test.db/dim_cohorts
> 2013-06-17 16:10:06,185 INFO  exec.ExecDriver
(ExecDriver.java:addInputPath(789)) - Changed input file to
file:/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1
> 2013-06-17 16:10:06,226 INFO  exec.ExecDriver
(ExecDriver.java:addInputPaths(840)) - Processing alias $INTNAME
> 2013-06-17 16:10:06,226 INFO  exec.ExecDriver
(ExecDriver.java:addInputPaths(858)) - Adding input file
hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10004
> 2013-06-17 16:10:06,226 INFO  exec.Utilities
(Utilities.java:isEmptyPath(1807)) - Content Summary not cached for
hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10004
> 2013-06-17 16:10:06,681 WARN  conf.Configuration
(Configuration.java:warnOnceIfDeprecated(808)) - session.id is deprecated.
Instead, use dfs.metrics.session-id
Initializing JVM Metrics with processName=JobTracker, sessionId
> 2013-06-17 16:10:06,688 INFO  exec.ExecDriver
(ExecDriver.java:createTmpDirs(215)) - Making Temp Directory:
hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10002
(JobClient.java:copyAndConfigureFiles(704)) - Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
(CombineHiveInputFormat.java:getSplits(370)) - CombineHiveInputSplit
creating pool for
file:/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1;
using filter path
file:/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1
(CombineHiveInputFormat.java:getSplits(370)) - CombineHiveInputSplit
creating pool for
hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10004;
using filter path
hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10004
(FileInputFormat.java:listStatus(196)) - Total input paths to process : 2
- Cleaning up the staging area
file:/user/vagrant2000733611/.staging/job_local_0001
(UserGroupInformation.java:doAs(1335)) - PriviledgedActionException
as:vagrant (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not
exist:
/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile
(SessionState.java:printError(403)) - Job Submission failed with exception
'java.io.FileNotFoundException(File does not exist:
/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile)'
/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:787)
org.apache.hadoop.mapred.lib.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:462)
org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:256)
org.apache.ha
Guillaume Allain 2013-07-04, 10:21
Edward Capriolo 2013-07-04, 15:53