Hive, mail # user - OrcFile writing failing with multiple threads


Re: OrcFile writing failing with multiple threads
Owen O'Malley 2013-05-24, 20:15
Currently, ORC writers, like the Java collections API, don't lock
themselves. You should synchronize on the writer before adding a row. I'm
open to making the writers synchronized.

-- Owen
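
The external-locking pattern Owen describes can be sketched without the Hive jars. In this illustrative stand-in (not the Hive API), a plain `ArrayList` plays the role of the ORC writer — like the real writer, it is not internally synchronized — and each worker thread takes a lock on the shared object before adding a row:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SyncWriterDemo {
    // Stand-in for the shared, non-thread-safe writer.
    static final List<Integer> writer = new ArrayList<>();

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(10);
        for (int t = 0; t < 10; t++) {
            pool.submit(() -> {
                for (int i = 0; i < 1000; i++) {
                    // External lock on the writer object, as the reply
                    // advises; only one thread mutates it at a time.
                    synchronized (writer) {
                        writer.add(i);
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        System.out.println(writer.size()); // prints 10000
    }
}
```

With the `synchronized` block removed, the unsynchronized list would be corrupted under contention, much like the `IndexOutOfBoundsException` in the stack trace below; with it, all 10,000 adds land safely. For the real writer, the same shape applies: wrap each `writer.addRow(row)` call in `synchronized (writer) { ... }`.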
On Fri, May 24, 2013 at 11:39 AM, Andrew Psaltis <
[EMAIL PROTECTED]> wrote:

>  All,
> I have a test application that is attempting to add rows to an OrcFile
> from multiple threads, however, every time I do I get exceptions with stack
> traces like the following:
>
>  java.lang.IndexOutOfBoundsException: Index 4 is outside of 0..5
> at org.apache.hadoop.hive.ql.io.orc.DynamicIntArray.get(DynamicIntArray.java:73)
> at org.apache.hadoop.hive.ql.io.orc.StringRedBlackTree.compareValue(StringRedBlackTree.java:55)
> at org.apache.hadoop.hive.ql.io.orc.RedBlackTree.add(RedBlackTree.java:192)
> at org.apache.hadoop.hive.ql.io.orc.RedBlackTree.add(RedBlackTree.java:199)
> at org.apache.hadoop.hive.ql.io.orc.RedBlackTree.add(RedBlackTree.java:300)
> at org.apache.hadoop.hive.ql.io.orc.StringRedBlackTree.add(StringRedBlackTree.java:45)
> at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.write(WriterImpl.java:723)
> at org.apache.hadoop.hive.ql.io.orc.WriterImpl$MapTreeWriter.write(WriterImpl.java:1093)
> at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.write(WriterImpl.java:996)
> at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:1450)
> at OrcFileTester$BigRowWriter.run(OrcFileTester.java:129)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:722)
>
>
>  Below is the source code for my sample app that is heavily based on the
> TestOrcFile test case using BigRow. Is there something I am doing wrong
> here, or is this a legitimate bug in the Orc writing?
>
>  Thanks in advance,
> Andrew
>
>
>  ------------------------- Java app code follows
> ---------------------------------
>  import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.hive.ql.io.orc.CompressionKind;
> import org.apache.hadoop.hive.ql.io.orc.OrcFile;
> import org.apache.hadoop.hive.ql.io.orc.Writer;
> import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
> import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
> import org.apache.hadoop.io.BytesWritable;
> import org.apache.hadoop.io.Text;
>
>  import java.io.File;
> import java.io.IOException;
> import java.util.HashMap;
> import java.util.Map;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import java.util.concurrent.LinkedBlockingQueue;
>
>  public class OrcFileTester {
>
>      private Writer writer;
>     private LinkedBlockingQueue<BigRow> bigRowQueue = new LinkedBlockingQueue<BigRow>();
>     public OrcFileTester(){
>
>        try{
>         Path workDir = new Path(System.getProperty("test.tmp.dir",
>                 "target" + File.separator + "test" + File.separator +
> "tmp"));
>
>          Configuration conf;
>         FileSystem fs;
>         Path testFilePath;
>
>          conf = new Configuration();
>         fs = FileSystem.getLocal(conf);
>         testFilePath = new Path(workDir, "TestOrcFile.OrcFileTester.orc");
>         fs.delete(testFilePath, false);
>
>
>          ObjectInspector inspector = ObjectInspectorFactory.getReflectionObjectInspector(
>                 BigRow.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA);
>         writer = OrcFile.createWriter(fs, testFilePath, conf, inspector,
>                 100000, CompressionKind.ZLIB, 10000, 10000);
>
>          final ExecutorService bigRowWorkerPool = Executors.newFixedThreadPool(10);