Hadoop >> mail # user >> Errors reading lzo-compressed files from Hadoop


Re: Errors reading lzo-compressed files from Hadoop
Both Kevin's and Todd's branches now pass my tests. Thanks again Todd.

-D

On Thu, Apr 8, 2010 at 10:46 AM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
> OK, fixed, unit tests passing again. If anyone sees any more problems let
> one of us know!
>
> Thanks
> -Todd
>
> On Thu, Apr 8, 2010 at 10:39 AM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
>
>> Doh, a couple more silly bugs in there. Don't use that version quite yet -
>> I'll put up a better patch later today. (Thanks to Kevin and Ted Yu for
>> pointing out the additional problems)
>>
>> -Todd
>>
>>
>> On Wed, Apr 7, 2010 at 5:24 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
>>
>>> For Dmitriy and anyone else who has seen this error, I just committed a
>>> fix to my github repository:
>>>
>>>
>>> http://github.com/toddlipcon/hadoop-lzo/commit/f3bc3f8d003bb8e24f254b25bca2053f731cdd58
>>>
>>> The problem turned out to be an assumption that InputStream.read() would
>>> return all the bytes that were asked for. That's almost always true on
>>> local filesystems, but on HDFS it's not when the read crosses a block
>>> boundary. So one might see this error only once every couple of TB of
>>> lzo-compressed data.
>>>
>>> Big thanks to Alex Roetter who was able to provide a file that exhibited
>>> the bug!
>>>
>>> Thanks
>>> -Todd
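
[A minimal, self-contained illustration of the short-read issue Todd describes above. This is a sketch of the general fix pattern (loop until the requested byte count arrives), not the actual hadoop-lzo patch; the class and helper names here are made up for the example.]

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyDemo {
    // InputStream.read(buf, off, len) may legally return fewer than len
    // bytes. Code that assumes a single read() fills the buffer works on
    // local filesystems most of the time, but breaks on HDFS when the
    // read crosses a block boundary. The fix is to loop:
    static void readFully(InputStream in, byte[] buf, int off, int len)
            throws IOException {
        while (len > 0) {
            int n = in.read(buf, off, len);
            if (n < 0) {
                throw new EOFException("stream ended before " + len
                        + " more bytes could be read");
            }
            off += n;
            len -= n;
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate a stream that returns at most 3 bytes per call,
        // the way an HDFS read can come up short at a block boundary.
        InputStream chunky = new ByteArrayInputStream("hello world".getBytes()) {
            @Override
            public int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 3));
            }
        };
        byte[] buf = new byte[11];
        readFully(chunky, buf, 0, buf.length);
        System.out.println(new String(buf)); // prints "hello world"
    }
}
```

[A single `chunky.read(buf, 0, 11)` here would return only 3 bytes, leaving the rest of the buffer zeroed, which is exactly the kind of silent corruption that made the decompressor fail.]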
>>>
>>>
>>> On Tue, Apr 6, 2010 at 10:35 AM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi Alex,
>>>> Unfortunately I wasn't able to reproduce, and the data Dmitriy is
>>>> working with is sensitive.
>>>> Do you have some data you could upload (or send me off list) that
>>>> exhibits the issue?
>>>> -Todd
>>>>
>>>> On Tue, Apr 6, 2010 at 9:50 AM, Alex Roetter <[EMAIL PROTECTED]> wrote:
>>>> >
>>>> > Todd Lipcon <todd@...> writes:
>>>> >
>>>> > >
>>>> > > Hey Dmitriy,
>>>> > >
>>>> > > This is very interesting (and worrisome in a way!) I'll try to
>>>> > > take a look this afternoon.
>>>> > >
>>>> > > -Todd
>>>> > >
>>>> >
>>>> > Hi Todd,
>>>> >
>>>> > I wanted to see if you made any progress on this front. I'm seeing a
>>>> > very similar error, trying to run a MR (Hadoop 0.20.1) over a bunch
>>>> > of LZOP compressed / indexed files (using Kevin Weil's package), and
>>>> > I have one map task that always fails in what looks like the same
>>>> > place as described in the previous post. I haven't yet done the
>>>> > experimentation mentioned above (isolating the input file
>>>> > corresponding to the failed map task, decompressing it /
>>>> > recompressing it, testing it out operating directly on local disk
>>>> > instead of HDFS, etc).
>>>> >
>>>> > However, since I am crashing in exactly the same place it seems
>>>> > likely this is related, and thought I'd check on your work in the
>>>> > meantime.
>>>> >
>>>> > FYI, my stack trace is below:
>>>> >
>>>> > 2010-04-05 18:15:16,895 FATAL org.apache.hadoop.mapred.TaskTracker:
>>>> > Error running child : java.lang.InternalError: lzo1x_decompress_safe returned:
>>>> >        at com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native Method)
>>>> >        at com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:303)
>>>> >        at com.hadoop.compression.lzo.LzopDecompressor.decompress(LzopDecompressor.java:104)
>>>> >        at com.hadoop.compression.lzo.LzopInputStream.decompress(LzopInputStream.java:223)
>>>> >        at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:74)
>>>> >        at java.io.InputStream.read(InputStream.java:85)
>>>> >        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
>>>> >        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:187)
>>>> >        at com.hadoop.mapreduce.LzoLineRecordReader.nextKeyValue(LzoLineRecordReader.java:126)
>>>> >        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>>>>