Hadoop, mail # user - cannot find DeprecatedLzoTextInputFormat


Re: cannot find DeprecatedLzoTextInputFormat
Joey Echeverria 2011-10-17, 00:05
Hi Jessica,

Sorry for the delay. I don't know of a pre-built version of the LZO
libraries that has the fix. I also couldn't quite tell which source
versions might have it. The easiest thing to do would be to pull the
source from github, make any changes, and build it locally:

https://github.com/kevinweil/hadoop-lzo

-Joey
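
For reference, the change discussed further down the thread (ignoring the .lzo.index files) boils down to a path filter. What follows is only a minimal, standalone sketch of that idea and not the actual hadoop-lzo patch: the class name, the helper names, and the exact suffix are assumptions for illustration, and plain strings stand in for Hadoop's path objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not code from hadoop-lzo.
class LzoIndexFilterSketch {
    // Suffix taken from the thread's wording (".lzo.index files"); an assumption.
    static final String INDEX_SUFFIX = ".lzo.index";

    // Hypothetical helper: true when the path names an LZO index side file.
    static boolean isIndexFile(String path) {
        return path.endsWith(INDEX_SUFFIX);
    }

    // Hypothetical helper: keep only the paths that should be read as data.
    static List<String> dropIndexFiles(List<String> paths) {
        List<String> kept = new ArrayList<String>();
        for (String p : paths) {
            if (!isIndexFile(p)) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList(
            "/user/hive/warehouse/foo/bar.lzo",
            "/user/hive/warehouse/foo/bar.lzo.index");
        // Only the data file survives the filter.
        System.out.println(dropIndexFiles(inputs));
    }
}
```

In a real patch the same check would sit wherever the input format or record reader enumerates the files under the table directory, so that an index file is never handed to the LZO codec lookup.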

On Mon, Oct 10, 2011 at 7:54 PM, Jessica Owensby
<[EMAIL PROTECTED]> wrote:
> I understood the comments in the JIRA ticket to say that
> hadoop-lzo-0.4.8.jar from gerrit had the fix for HIVE-2395
> (https://issues.apache.org/jira/browse/HIVE-2395).
> I wasn't able to find a good pre-built version of 0.4.8 (I found
> this, but there appear to be some issues with it:
> http://hadoop-gpl-packing.googlecode.com/svn-history/r18/trunk/src/main/resources/lib/hadoop-lzo-0.4.8.jar).
> And hadoop-lzo-0.4.13.jar (
> http://hadoop-gpl-packing.googlecode.com/svn-history/r39/trunk/hadoop/src/main/resources/lib/hadoop-lzo-0.4.13.jar)
> doesn't contain the fix.  Is there a version of the jar built with the
> HIVE-2395 fix?  I thought I would ask before I build it myself.
>
> Lastly, I didn't mention before that this issue appears in only one of our 2
> environments - both running cdh3u1.  I've done a number of comparisons
> between the environments and am still unable to find a dissimilarity that
> might be resulting in the 'No LZO codec found' error.  So, it
> would surprise me if we required the fix in one environment and did not in
> another -- but that may just show my lack of understanding about hadoop. :-)
>
> Jessica
>
> On Wed, Oct 5, 2011 at 4:27 PM, Jessica Owensby
> <[EMAIL PROTECTED]> wrote:
>
>> Great.  Thanks!  Will give that a try.
>> Jessica
>>
>>
>> On Wed, Oct 5, 2011 at 4:22 PM, Joey Echeverria <[EMAIL PROTECTED]> wrote:
>>
>>> It sounds like you're hitting this:
>>>
>>> https://issues.apache.org/jira/browse/HIVE-2395
>>>
>>> You might need to patch your version of DeprecatedLzoLineRecordReader
>>> to ignore the .lzo.index files.
>>>
>>> -Joey
>>>
>>> On Wed, Oct 5, 2011 at 4:13 PM, Jessica Owensby
>>> <[EMAIL PROTECTED]> wrote:
>>> > Alex,
>>> > The task trackers have been restarted many times across the cluster
>>> > since this issue was first seen.
>>> >
>>> > Hmmm, I hadn't tried to explicitly add the lzo jar to my classpath in
>>> > the hive shell, but I just tried it and got the same errors.
>>> >
>>> > Do you see /usr/lib/hadoop-0.20/lib/hadoop-lzo-20110217.jar in the
>>> > child classpath when the task is executed (use 'ps aux' on the node)?
>>> >
>>> > While the job wasn't running, I did this and I got back the tasktracker
>>> > process:  ps aux | grep java | grep lzo.
>>> > Do I have to run this while the task is running on that node?
>>> >
>>> > Joey,
>>> > Yes, the lzo files are indexed.  They are indexed using the following
>>> > command:
>>> >
>>> > hadoop jar /usr/lib/hadoop/lib/hadoop-lzo-20110217.jar
>>> > com.hadoop.compression.lzo.LzoIndexer /user/hive/warehouse/foo/bar.lzo
>>> >
>>> > Jessica
>>> >
>>> > On Wed, Oct 5, 2011 at 3:52 PM, Joey Echeverria <[EMAIL PROTECTED]> wrote:
>>> >> Are your LZO files indexed?
>>> >>
>>> >> -Joey
>>> >>
>>> >> On Wed, Oct 5, 2011 at 3:35 PM, Jessica Owensby
>>> >> <[EMAIL PROTECTED]> wrote:
>>> >>> Hi Joey,
>>> >>> Thanks. I forgot to say that; yes, the lzocodec class is listed in
>>> >>> core-site.xml under the io.compression.codecs property:
>>> >>>
>>> >>> <property>
>>> >>>  <name>io.compression.codecs</name>
>>> >>>  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
>>> >>> </property>
>>> >>>
>>> >>> I also added the mapred.child.env property to mapred-site.xml:
>>> >>>
>>> >>>  <property>
>>> >>>    <name>mapred.child.env</name>
>>> >>>    <value>JAVA_LIBRARY_PATH=/usr/lib/hadoop-0.20/lib</value>
>
Joseph Echeverria
Cloudera, Inc.
443.305.9434