Pig >> mail # user >> Re: Loader for small files


Something Something 2013-02-12, 19:32
Loader for small files
Hello,

We are running into performance issues with Pig/Hadoop because our input
files are small.  All of the input goes to a single mapper.  To get around
this, we are trying to use our own loader, as follows:

1)  Extend PigStorage:

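import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.pig.builtin.PigStorage;

public class SmallFileStorage extends PigStorage {

    public SmallFileStorage(String delimiter) {
        super(delimiter);
    }

    // Return NLineInputFormat instead of PigStorage's default input format,
    // so that a split is made every N lines.
    @Override
    public InputFormat getInputFormat() {
        return new NLineInputFormat();
    }
}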

2)  Add a command-line argument to the Pig command as follows:

-Dmapreduce.input.lineinputformat.linespermap=500000

3)  Use SmallFileStorage in the Pig script as follows:

USING com.xxx.yyy.SmallFileStorage ('\t')

But this doesn't seem to work.  We still see everything going to one
mapper.  Before we spend any more time on this, I am wondering whether
this is a good approach, or whether there is a better one.  Please let me
know.  Thanks.
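
For reference, here is a rough sketch of how many splits the linespermap setting in step 2 would be expected to produce, assuming NLineInputFormat splits each file separately into chunks of at most N lines. It is plain Java with no Hadoop dependency. The class name, method name, and line counts are hypothetical and only for illustration, and it does not model anything Pig may do to the splits afterwards.

```java
public class SplitEstimate {

    // Estimated number of splits when every file is cut, on its own, into
    // chunks of at most linesPerSplit lines (assumption: files are never
    // merged into a shared split at this stage).
    static long expectedSplits(long[] lineCountsPerFile, long linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        long splits = 0;
        for (long lines : lineCountsPerFile) {
            // Ceiling division; in this sketch an empty file adds no split.
            splits += (lines + linesPerSplit - 1) / linesPerSplit;
        }
        return splits;
    }

    public static void main(String[] args) {
        // Made-up line counts for three input files.
        long[] lineCounts = {1_200_000L, 40_000L, 500_000L};
        System.out.println(expectedSplits(lineCounts, 500_000L));
    }
}
```

Under that assumption, any file with fewer lines than the linespermap value contributes a single split, so a handful of small files can only ever yield a handful of splits at this setting.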