Hive >> mail # user >> Creating Indexes


RE: Creating Indexes
Hi Shreepadma,

I have looked and I can't find anything that looks like a log with any more
information in it. The vast bulk of logs that I find seem to concern the
Map/Reduce, which we agree has succeeded.
The only thing that looks a little odd is in the syslogs for the reducers.
All 138 of them seem to feature a line like this:

2012-11-07 11:10:17,960 ERROR org.apache.hadoop.hive.ql.exec.FileSinkOperator: StatsPublishing error: cannot connect to database

I don't know whether this is a serious problem, or whether it might have
a bearing on the subsequent Hive error.
I have attached the last reducer syslog so that you can see this problem in context.
Also I managed to find a hive.log, attached.
It seems to have lots of warnings and "errors" but I don't know if any of them
are relevant.
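
A rough way to check whether anything other than the StatsPublishing line appears at ERROR level across many reducer syslogs is to count distinct error messages. The Python sketch below assumes every log line has the same layout as the syslog line quoted above (date, time, level, logger class, colon, message). The names `LOG_LINE` and `summarise_errors` are hypothetical, and hive.log may use a different layout, in which case the pattern would need adjusting.

```python
import re
from collections import Counter

# Assumed layout, modelled on the reducer syslog line quoted above:
#   <date> <time>,<millis> <LEVEL> <logger>: <message>
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+)\s+(?P<logger>[\w.$]+): (?P<message>.*)$"
)


def summarise_errors(lines, level="ERROR"):
    """Count distinct (logger, message) pairs logged at the given level.

    Lines that do not match the assumed layout (for example stack-trace
    continuation lines) are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group("level") == level:
            counts[(match.group("logger"), match.group("message"))] += 1
    return counts
```

Passing an open file object for each syslog to `summarise_errors` and adding up the resulting counters would show at a glance whether all 138 reducers report only the one message or whether some report something else.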

I'm being pressed to make some progress on this problem, but
I don't know what to do next. Are there any other files that I can
provide that might help? Do I need to add some instrumentation?
I guess that I could re-build various Hive class files to get more information.
But I really don't know Hive at all and so I suspect that it'll take me
ages flailing around before I find anything helpful.
Any suggestions for a way forward gratefully received,
I'm happy to make any changes that might help.

Regards,

Z

From: Shreepadma Venugopalan [mailto:[EMAIL PROTECTED]]
Sent: 03 November 2012 00:06
To: [EMAIL PROTECTED]
Subject: Re: Creating Indexes

Hi Peter,

While it looks like the map-reduce task may have succeeded, the alter index actually failed. You should look into the execution log to see what the exception is. Without knowing why the DDLTask failed, it's hard to pinpoint the problem.

As for the original problem with the jar, as Dean pointed out, for some odd reason the jar was not on the classpath prior to the ADD JAR.

Thanks,
Shreepadma
On Fri, Nov 2, 2012 at 4:59 PM, Peter Marron <[EMAIL PROTECTED]> wrote:
Hi Dean,

At this stage I'm really not worried about this being a hack.
I just want to get it to work, and I'm grateful for all your help.
I did as you suggested and now, as far as I can see, the Map/Reduce
has succeeded. When I look in the log for the last reduce I no longer
find an error. However this is the output from the hive command
session:

MapReduce Total cumulative CPU time: 0 days 1 hours 14 minutes 51 seconds 360 msec
Ended Job = job_201211021743_0001
Loading data to table default.default__score_bigindex__
Deleted hdfs://localhost/data/warehouse/default__score_bigindex__
Invalid alter operation: Unable to alter index.
Table default.default__score_bigindex__ stats: [num_partitions: 0, num_files: 138, num_rows: 0, total_size: 446609024, raw_data_size: 0]
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
MapReduce Jobs Launched:
Job 0: Map: 511  Reduce: 138   Accumulative CPU: 4491.36 sec   HDFS Read: 137123460712 HDFS Write: 446609024 SUCESS
Total MapReduce CPU Time Spent: 0 days 1 hours 14 minutes 51 seconds 360 msec
hive>

I find this very confusing. We have the bit where it says "Job 0:.... SUCCESS"
and this seems to fit with the fact that I can't find errors in the Map/Reduce.
On the other hand we have the bit where it says: "Invalid alter operation: Unable to alter index."
So has it successfully created the index  or not? And if not, then what do I do next?
Is there somewhere else where it records Hive errors as opposed to Map/Reduce errors?

Regards,

Peter Marron
From: Dean Wampler [mailto:[EMAIL PROTECTED]]
Sent: 02 November 2012 14:03
To: [EMAIL PROTECTED]
Subject: Re: Creating Indexes

Oh, I saw this line in your Hive output and just assumed you were running in a cluster:

Hadoop job information for Stage-1: number of mappers: 511; number of reducers: 138

I haven't tried running a job that big in pseudodistributed mode either, but that's beside the point.

So it seems to be an issue with indexing, but it still raises the question of why Derby isn't on the classpath for the task. I would try using the ADD JAR command, which copies the jar around the "cluster" and puts it on the classpath. It's what you would use with UDFs, for example:

ADD JAR /path/to/derby.jar;
ALTER INDEX ...;

It's a huge hack, but it just might work.
dean

On Fri, Nov 2, 2012 at 3:44 AM, Peter Marron <[EMAIL PROTECTED]> wrote:
Hi Dean,

I'm running everything on a single physical machine in pseudo-distributed mode.

Well it certainly looks like the reducer is looking for a derby.jar, although I must
confess I don't really understand why it would be doing that.
In an effort to fix that I copied the derby.jar (derby-10.4.2.0.jar) into the
Hadoop directory, where I assume that the reducer would be able to find it.
However I get exactly the same problem as before.
Is there some particular place that I should put the derby.jar to make this
problem go away? Is there anything else that I can try?

Peter Marron

From: Dean Wampler [mailto:[EMAIL PROTECTED]]
Sent: 01 November 2012 13:02
To: [EMAIL PROTECTED]
Subject: Re: Creating Indexes

It looks like you're using Derby with a real cluster, not just a single machine in local or pseudo-distributed mode. I haven't tried this myself, but the derby jar is probably not on the machine that ran the reducer task that failed.

dean
On Thu, Nov 1, 2012 at 4:31 AM, Peter Marron <[EMAIL PROTECTED]> wrote:
Hi Shreepadma,

I agree that the error looks odd. However I can't believe that I would have
got this far with Hive if there was no derby jar. Nevertheless I checked.
Here is a directory listing of the Hive install:
 [snip]
Dean Wampler, Ph.D.
thinkbiganalytics.com<http://thi