Ravindra 2011-09-25, 14:10
Re: Avro C issue with multi-threading
Douglas Creager 2011-09-26, 20:56
> I'm using the Avro C library to serialize and de-serialize data. I'm seeing
> issues (segfaults) when I use it with multi-threading; the same code works
> fine when I run it in single-threaded mode. To give some details about my
> code, on init I read the schema from a file and create a global
> avro_schema_t object. Multiple threads then use this global variable (schema)
> to serialize and de-serialize data. The schema itself is never modified
> during run-time in my code. From what I understood by going through the Avro
> code, I don't think the schema is modified by the Avro code either during
> serializing/de-serializing. If this is in fact the case, the schema is
> essentially a read-only global and should be fine with multiple threads
> accessing it. I haven't found any documentation that claims
> that Avro C is thread-safe. It would be really helpful if someone who has
> used Avro C in a multi-threaded environment could share their experience.
> Also, let me know if what I am trying is in fact possible.
Which library version are you using? Anything in the 1.5 branch or earlier doesn't make any guarantees about thread safety. A while back I checked in a patch for AVRO-746 that made the various incref and decref functions thread-safe, but this was only applied to the Subversion HEAD, and not back-ported to 1.5. You're right that the contents of the schema objects aren't modified during serialization or deserialization, but some of the helper objects that are created do update the reference counts of any schema pointers that they hold. Without the AVRO-746 patch, you could easily have race conditions that would cause the schema objects to be freed while there were still references to them.
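To illustrate why the reference counts matter even for a "read-only" schema, here is a minimal, generic sketch of an atomically updated reference count using C11 atomics and pthreads. It is not the Avro implementation and not the AVRO-746 patch itself; shared_obj, obj_incref, obj_decref and run_refcount_demo are hypothetical names made up for this example. If the counter were a plain int updated with ++ and --, concurrent updates could be lost and the count could reach zero while references were still held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted object (not Avro's struct). */
typedef struct {
    atomic_int refcount;
} shared_obj;

/* Atomic read-modify-write: concurrent increments are never lost. */
static void obj_incref(shared_obj *o)
{
    atomic_fetch_add(&o->refcount, 1);
}

/* Returns 1 if this call released the last reference, else 0. */
static int obj_decref(shared_obj *o)
{
    return atomic_fetch_sub(&o->refcount, 1) == 1;
}

struct worker_arg {
    shared_obj *obj;
    int iters;
    atomic_int early_releases; /* times a worker saw the count hit zero */
};

/* Each worker repeatedly takes and drops a reference, the way short-lived
 * helper objects would while holding a pointer to a shared schema. */
static void *worker(void *p)
{
    struct worker_arg *a = p;
    for (int i = 0; i < a->iters; i++) {
        obj_incref(a->obj);
        if (obj_decref(a->obj))
            atomic_fetch_add(&a->early_releases, 1);
    }
    return NULL;
}

/* Runs nthreads workers against one shared object and returns the final
 * reference count (1 when every incref was matched by a decref), or -1
 * on setup failure or if the count ever dropped to zero too early. */
int run_refcount_demo(int nthreads, int iters)
{
    assert(nthreads > 0 && iters >= 0);

    shared_obj obj;
    struct worker_arg arg;
    pthread_t *tids = malloc((size_t)nthreads * sizeof *tids);
    int started = 0;

    if (tids == NULL)
        return -1;

    atomic_init(&obj.refcount, 1); /* the caller's own reference */
    atomic_init(&arg.early_releases, 0);
    arg.obj = &obj;
    arg.iters = iters;

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, worker, &arg) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    free(tids);

    if (started != nthreads || atomic_load(&arg.early_releases) != 0)
        return -1;
    return atomic_load(&obj.refcount);
}
```

Calling run_refcount_demo(4, 100000) should leave the count at 1, because the caller's reference is the only one left after all workers finish. Compile with something like -std=c11 -pthread.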
Can you try the latest Subversion HEAD and see if that fixes the segfaults? Also note that even with HEAD, it's only the incref and decref functions that are thread-safe. If you're doing any updates or modifications to an Avro object, that object should only be used within a single thread. And if an object is shared across multiple threads, every thread must treat it as read-only.
An alternative, if you can't use HEAD, is to create a separate copy of the schema for each thread, using avro_schema_copy.