|
felix gao
2010-12-23, 00:59
Jeff Hammerbacher
2010-12-23, 03:13
Harsh J
2010-12-23, 04:16
Harsh J
2010-12-23, 04:17
felix gao
2010-12-23, 17:30
|
-
Avro Python appending datafelix gao 2010-12-23, 00:59
Hi all,
I am having trouble adding more data into a file. Environment: Python 2.6.5, avro-1.3.3-py2.6 Program looks like this from avro import schema, datafile, io OUTFILE_NAME = 'sample.avro' SCHEMA_STR = """{ "type": "record", "name": "bkSampleAvro", "namespace": "bk_avro_example", "fields": [ { "name": "name" , "type": "string" }, { "name": "age" , "type": "int" }, { "name": "address", "type": "string" }, { "name": "value" , "type": "long" } ] }""" SCHEMA = schema.parse(SCHEMA_STR) def write_avro_file(): # Lets generate our data data = {} data['name'] = 'Foo' data['age'] = 19 data['address'] = '10, Bar Eggs Spam' data['value'] = 800 rec_writer = io.DatumWriter(SCHEMA) df_writer = datafile.DataFileWriter( open(OUTFILE_NAME, 'ab'), rec_writer, writers_schema = SCHEMA, codec = 'deflate' ) df_writer.append(data) df_writer.close() def read_avro_file(): rec_reader = io.DatumReader() df_reader = datafile.DataFileReader( open(OUTFILE_NAME, "rb"), rec_reader ) for record in df_reader: print record['name'], record['age'] print record['address'], record['value'] if __name__ == '__main__': # Write an AVRO file first write_avro_file() write_avro_file() # Now, read it read_avro_file() The result looks like Foo 19 10, Bar Eggs Spam 800 Traceback (most recent call last): File "/Users/felixgao/Desktop/workspace/Python/avro/avroExample1.py", line 124, in <module> read_avro_file() File "/Users/felixgao/Desktop/workspace/Python/avro/avroExample1.py", line 112, in read_avro_file for record in df_reader: File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-1.3.3-py2.6.egg/avro/datafile.py", line 318, in next datum = self.datum_reader.read(self.datum_decoder) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-1.3.3-py2.6.egg/avro/io.py", line 411, in read return self.read_data(self.writers_schema, self.readers_schema, decoder) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-1.3.3-py2.6.egg/avro/io.py", line 456, in read_data return self.read_record(writers_schema, readers_schema, decoder) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-1.3.3-py2.6.egg/avro/io.py", line 648, in read_record field_val = self.read_data(field.type, readers_field.type, decoder) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-1.3.3-py2.6.egg/avro/io.py", line 434, in read_data return decoder.read_utf8() File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-1.3.3-py2.6.egg/avro/io.py", line 210, in read_utf8 return unicode(self.read_bytes(), "utf-8") UnicodeDecodeError: 'utf8' codec can't decode bytes in position 14-15: invalid data if I remove the second write_avro_file() call then everything is fine. How to properly append more data into the file? Thanks, Felix
-
Re: Avro Python appending dataJeff Hammerbacher 2010-12-23, 03:13
Hey Felix,
See the test_append() function at http://svn.apache.org/viewvc/avro/trunk/lang/py/test/test_datafile.py?view=markup . Regards, Jeff On Wed, Dec 22, 2010 at 4:59 PM, felix gao <[EMAIL PROTECTED]> wrote: > Hi all, > > I am having trouble adding more data into a file. > > Environment: Python 2.6.5, avro-1.3.3-py2.6 > > Program looks like this > > from avro import schema, datafile, io > > OUTFILE_NAME = 'sample.avro' > > SCHEMA_STR = """{ > "type": "record", > "name": "bkSampleAvro", > "namespace": "bk_avro_example", > "fields": [ > { "name": "name" , "type": "string" }, > { "name": "age" , "type": "int" }, > { "name": "address", "type": "string" }, > { "name": "value" , "type": "long" } > ] > }""" > > SCHEMA = schema.parse(SCHEMA_STR) > def write_avro_file(): > # Lets generate our data > data = {} > data['name'] = 'Foo' > data['age'] = 19 > data['address'] = '10, Bar Eggs Spam' > data['value'] = 800 > > rec_writer = io.DatumWriter(SCHEMA) > > df_writer = datafile.DataFileWriter( > open(OUTFILE_NAME, 'ab'), > rec_writer, > writers_schema = SCHEMA, > codec = 'deflate' > ) > > df_writer.append(data) > > df_writer.close() > > def read_avro_file(): > rec_reader = io.DatumReader() > > df_reader = datafile.DataFileReader( > open(OUTFILE_NAME, "rb"), > rec_reader > ) > > for record in df_reader: > print record['name'], record['age'] > print record['address'], record['value'] > > > if __name__ == '__main__': > # Write an AVRO file first > write_avro_file() > write_avro_file() > > # Now, read it > read_avro_file() > > > The result looks like > > Foo 19 > 10, Bar Eggs Spam 800 > Traceback (most recent call last): > File "/Users/felixgao/Desktop/workspace/Python/avro/avroExample1.py", > line 124, in <module> > read_avro_file() > File "/Users/felixgao/Desktop/workspace/Python/avro/avroExample1.py", > line 112, in read_avro_file > for record in df_reader: > File > "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-1.3.3-py2.6.egg/avro/datafile.py", > line 318, in next > datum = self.datum_reader.read(self.datum_decoder) > File > "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-1.3.3-py2.6.egg/avro/io.py", > line 411, in read > return self.read_data(self.writers_schema, self.readers_schema, > decoder) > File > "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-1.3.3-py2.6.egg/avro/io.py", > line 456, in read_data > return self.read_record(writers_schema, readers_schema, decoder) > File > "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-1.3.3-py2.6.egg/avro/io.py", > line 648, in read_record > field_val = self.read_data(field.type, readers_field.type, decoder) > File > "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-1.3.3-py2.6.egg/avro/io.py", > line 434, in read_data > return decoder.read_utf8() > File > "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-1.3.3-py2.6.egg/avro/io.py", > line 210, in read_utf8 > return unicode(self.read_bytes(), "utf-8") > UnicodeDecodeError: 'utf8' codec can't decode bytes in position 14-15: > invalid data > > > > if I remove the second write_avro_file() call then everything is fine. How > to properly append more data into the file? > > Thanks, > > Felix >
-
Re: Avro Python appending dataHarsh J 2010-12-23, 04:16
Hi,
On Thu, Dec 23, 2010 at 6:29 AM, felix gao <[EMAIL PROTECTED]> wrote: > Hi all, > > I am having trouble adding more data into a file. > > Environment: Python 2.6.5, avro-1.3.3-py2.6 > > Program looks like this I see you've read my blog post on Avro+Python :P http://www.harshj.com/2010/04/25/writing-and-reading-avro-data-files-using-python/ > if I remove the second write_avro_file() call then everything is fine. How > to properly append more data into the file? To append to an existing datafile, do not initialize the writer object with a writers_schema again. Just create it using: df_writer = datafile.DataFileWriter( open(OUTFILE_NAME, 'wb'), io.DatumWriter(), ) -- Harsh J www.harshj.com
-
Re: Avro Python appending dataHarsh J 2010-12-23, 04:17
Sorry, minor error, not 'wb', but 'ab+'
> > df_writer = datafile.DataFileWriter( > open(OUTFILE_NAME, 'ab+'), > io.DatumWriter(), > ) -- Harsh J www.harshj.com
-
Re: Avro Python appending datafelix gao 2010-12-23, 17:30
thanks guys, I will test it out.
On Wed, Dec 22, 2010 at 8:17 PM, Harsh J <[EMAIL PROTECTED]> wrote: > Sorry, minor error, not 'wb', but 'ab+' > > > > df_writer = datafile.DataFileWriter( > > open(OUTFILE_NAME, 'ab+'), > > io.DatumWriter(), > > ) > > -- > Harsh J > www.harshj.com > |