Avro >> mail # user >> Re: Sync Marker Issue while reading AVRO files writen with FLUME with PIG


Re: Sync Marker Issue while reading AVRO files writen with FLUME with PIG
I have not seen this issue before across 100 TB of Avro files, but I am
not using Flume to write them.  We have moved on to Avro 1.6.x, but were on
the 1.5.x line for quite some time.  Perhaps an exception of some sort
occurred while writing and was not handled correctly in Avro or Flume.

Looking at the DataFileWriter code, I can see how a file could get
truncated without a sync marker if the writing process crashes, but not
how it could successfully write two blocks in a row without a sync marker
between them.

You should be able to modify the file reader to recover and re-write the
data if it is only a missing sync marker, or skip over the block if it is
corrupt.
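
As a rough illustration of the check such a modified reader would make, here
is a minimal Python sketch (standard library only, not the Avro API). It relies
on the container file layout from the Avro spec: each block is a record count
and a byte size (both zigzag varint longs), then the data, then the 16-byte
sync marker. The names `read_long`, `write_long` and `scan_blocks` are
hypothetical, and the sketch assumes you have already parsed the header to get
the sync marker and the offset of the first block. It also assumes that when a
marker is missing, the next block starts directly after the data.

```python
SYNC_SIZE = 16  # Avro container files use a 16-byte sync marker


def write_long(n):
    """Encode an Avro long as a zigzag varint (only used to build the demo)."""
    raw = (n << 1) ^ (n >> 63)
    out = bytearray()
    while raw & ~0x7F:
        out.append((raw & 0x7F) | 0x80)
        raw >>= 7
    out.append(raw)
    return bytes(out)


def read_long(buf, pos):
    """Decode one zigzag varint long at buf[pos]; return (value, next_pos)."""
    raw = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        raw |= (b & 0x7F) << shift
        if not (b & 0x80):
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1), pos


def scan_blocks(buf, start, sync):
    """Walk blocks from `start`; return (offset, count, size, sync_ok) tuples."""
    blocks = []
    pos = start
    while pos < len(buf):
        offset = pos
        count, pos = read_long(buf, pos)
        size, pos = read_long(buf, pos)
        end = pos + size
        if end > len(buf):
            # Truncated block, e.g. the writer crashed mid-write.
            blocks.append((offset, count, size, False))
            break
        sync_ok = buf[end:end + SYNC_SIZE] == sync
        blocks.append((offset, count, size, sync_ok))
        # ASSUMPTION: if the marker is absent, the next block follows the data
        # directly, so continue from `end` instead of skipping 16 bytes.
        pos = end + SYNC_SIZE if sync_ok else end
    return blocks


# Demo: three synthetic blocks, the second written without its sync marker.
demo_sync = bytes(range(16))


def demo_block(payload, count, with_sync=True):
    framed = write_long(count) + write_long(len(payload)) + payload
    return framed + (demo_sync if with_sync else b"")


data = (demo_block(b"aaaa", 2)
        + demo_block(b"bbbbbb", 3, with_sync=False)
        + demo_block(b"cc", 1))
result = scan_blocks(data, 0, demo_sync)
for entry in result:
    print(entry)
```

Every tuple with `sync_ok` set to False marks a block whose data could be
copied out and re-written with a proper marker. For a block that is actually
corrupt, the size field cannot be trusted, so one option would be to search
forward for the next occurrence of the marker (for example with
`buf.find(sync, pos)`) and resume from there.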

On 4/3/12 1:28 AM, "Markus Resch" <[EMAIL PROTECTED]> wrote:

>Hey everyone,
>
>we're facing a problem while reading Avro files written with Flume using
>the Avro Java API 1.5.4 into a Hadoop cluster. The Avro Data Store
>complains about a missing sync marker. Investigating the problem shows
>that the complaint is perfectly right: the sync marker is missing, so we
>end up with a block of double the size.
>
>Our software packages:
> rpm -qa | grep hadoop
>hadoop-0.20-namenode-0.20.2+923.142-1
>hadoop-0.20-0.20.2+923.142-1
>hadoop-0.20-native-0.20.2+923.142-1
>hadoop-hive-0.7.1+42.27-2
>hadoop-pig-0.8.1+28.18-1
>
>This is pretty much a basic Cloudera CDH3 Update 2 package installation,
>with a patched Pig version taken from CDH3 Update 3.
>
>Has anyone had a similar issue? Does this ring a bell?
>
>Thanks
>
>Markus
>
>