Hive user mailing list: OrcFile writing failing with multiple threads


Andrew Psaltis 2013-05-24, 18:39
Owen O'Malley 2013-05-24, 20:15
Andrew Psaltis 2013-05-24, 20:28
Re: OrcFile writing failing with multiple threads
Sorry for dropping this thread for so long. Are you using the orc.Writer
interface directly or going through the OrcOutputFormat? I forgot that I
had updated the Writer to synchronize things. It looks like all of the
methods in Writer are synchronized. I haven't had a chance to investigate
this further yet.

-- Owen
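
As a point of reference, here is a minimal sketch of the external-locking pattern discussed in this thread: worker threads share one ORC Writer and take its monitor around each addRow call. The writer construction and the row objects are assumed to be set up elsewhere; only the locking is shown.

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.hive.ql.io.orc.Writer;

public class SharedOrcWriterSketch {

  // Assumes 'writer' was created elsewhere (e.g. via OrcFile.createWriter)
  // and that each inner list holds the rows one worker thread should write.
  static void writeConcurrently(final Writer writer, List<List<Object>> partitions)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
    for (final List<Object> rows : partitions) {
      pool.submit(new Runnable() {
        @Override
        public void run() {
          try {
            for (Object row : rows) {
              // External lock: only one thread at a time touches the shared writer.
              synchronized (writer) {
                writer.addRow(row);
              }
            }
          } catch (Exception e) {
            throw new RuntimeException(e);
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
  }
}

If addRow is already synchronized internally (as in the WriterImpl snippet quoted further down), the extra block is redundant but not harmful.
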
On Fri, May 24, 2013 at 1:28 PM, Andrew Psaltis <
[EMAIL PROTECTED]> wrote:

>  Here is a snippet from the file header comment of the WriterImpl for ORC:
>
>  /**
>  …………
>  * This class is synchronized so that multi-threaded access is ok. In
>  * particular, because the MemoryManager is shared between writers, this
> class
>  * assumes that checkMemory may be called from a separate thread.
>  */
>
>  And then the addRow method looks like this:
>
>   public void addRow(Object row) throws IOException {
>     synchronized (this) {
>       treeWriter.write(row);
>       rowsInStripe += 1;
>       if (buildIndex) {
>         rowsInIndex += 1;
>
>         if (rowsInIndex >= rowIndexStride) {
>           createRowIndexEntry();
>         }
>       }
>     }
>     memoryManager.addedRow();
>   }
>
>  Am I missing something here about the synchronized(this)? Perhaps I am
> looking in the wrong place.
>
>  Thanks,
> agp
>
>
>   From: Owen O'Malley <[EMAIL PROTECTED]>
> Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Date: Friday, May 24, 2013 2:15 PM
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Subject: Re: OrcFile writing failing with multiple threads
>
>   Currently, ORC writers, like the Java collections API, don't lock
> themselves. You should synchronize on the writer before adding a row. I'm
> open to making the writers synchronized.
>
>  -- Owen
>
>
> On Fri, May 24, 2013 at 11:39 AM, Andrew Psaltis <
> [EMAIL PROTECTED]> wrote:
>
>>  All,
>> I have a test application that is attempting to add rows to an OrcFile
>> from multiple threads, however, every time I do I get exceptions with stack
>> traces like the following:
>>
>>  java.lang.IndexOutOfBoundsException: Index 4 is outside of 0..5
>> at
>> org.apache.hadoop.hive.ql.io.orc.DynamicIntArray.get(DynamicIntArray.java:73)
>> at
>> org.apache.hadoop.hive.ql.io.orc.StringRedBlackTree.compareValue(StringRedBlackTree.java:55)
>> at
>> org.apache.hadoop.hive.ql.io.orc.RedBlackTree.add(RedBlackTree.java:192)
>> at
>> org.apache.hadoop.hive.ql.io.orc.RedBlackTree.add(RedBlackTree.java:199)
>> at
>> org.apache.hadoop.hive.ql.io.orc.RedBlackTree.add(RedBlackTree.java:300)
>> at
>> org.apache.hadoop.hive.ql.io.orc.StringRedBlackTree.add(StringRedBlackTree.java:45)
>> at
>> org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.write(WriterImpl.java:723)
>> at
>> org.apache.hadoop.hive.ql.io.orc.WriterImpl$MapTreeWriter.write(WriterImpl.java:1093)
>> at
>> org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.write(WriterImpl.java:996)
>> at
>> org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:1450)
>> at OrcFileTester$BigRowWriter.run(OrcFileTester.java:129)
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:722)
>>
>>
>>  Below is the source code for my sample app that is heavily based on the
>> TestOrcFile test case using BigRow. Is there something I am doing wrong
>> here, or is this a legitimate bug in the Orc writing?
>>
>>  Thanks in advance,
>> Andrew
>>
>>
>>  ------------------------- Java app code follows
>> ---------------------------------
>>  import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.fs.FileSystem;
>> import org.apache.hadoop.fs.Path;
>> import org.apache.hadoop.hive.ql.io.orc.CompressionKind;
>
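For completeness, a hedged sketch of the writer setup a TestOrcFile-style test would pair with the locking sketch above. The eight-argument OrcFile.createWriter form and the placeholder Row class are assumptions based on the Hive 0.11-era API, not taken from the sample itself.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.CompressionKind;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;
import org.apache.hadoop.hive.ql.io.orc.Writer;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;

public class OrcWriterSetupSketch {

  // Placeholder row type; the real test uses a BigRow-style class with more fields.
  public static class Row {
    public int id;
    public String text;
    public Row(int id, String text) { this.id = id; this.text = text; }
  }

  // Assumed Hive 0.11-era signature: (fs, path, conf, inspector,
  // stripeSize, compression, bufferSize, rowIndexStride).
  public static Writer openWriter(Path path) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.getLocal(conf);
    ObjectInspector inspector = ObjectInspectorFactory.getReflectionObjectInspector(
        Row.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA);
    return OrcFile.createWriter(fs, path, conf, inspector,
        100000, CompressionKind.ZLIB, 10000, 10000);
  }
}
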
Andrew Psaltis 2013-06-11, 17:10