Pig >> mail # user >> Re: Loader for small files


Something Something 2013-02-12, 19:32
Something Something 2013-02-11, 18:22
Re: Loader for small files
Sorry, moving the 'hbase' mailing list to BCC because this is not related to
HBase.  Adding the 'hadoop' user group.

On Mon, Feb 11, 2013 at 10:22 AM, Something Something <[EMAIL PROTECTED]> wrote:

> Hello,
>
> We are running into performance issues with Pig/Hadoop because our input
> files are small.  Everything goes to only 1 Mapper.  To get around this, we
> are trying to use our own Loader like this:
>
> 1)  Extend PigStorage:
>
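> import org.apache.hadoop.mapreduce.InputFormat;
> import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
> import org.apache.pig.builtin.PigStorage;
>
> // Loader that reuses PigStorage's parsing but swaps in a different InputFormat.
> public class SmallFileStorage extends PigStorage {
>
>     public SmallFileStorage(String delimiter) {
>         super(delimiter);
>     }
>
>     // Create input splits by line count (N lines per split) instead of by block.
>     @Override
>     public InputFormat getInputFormat() {
>         return new NLineInputFormat();
>     }
> }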
>
>
>
> 2)  Add a command-line argument to the Pig command as follows:
>
> -Dmapreduce.input.lineinputformat.linespermap=500000
>
>
>
> 3)  Use SmallFileStorage in the Pig script as follows:
>
> USING com.xxx.yyy.SmallFileStorage ('\t')
>
>
> But this doesn't seem to work.  We still see that everything is going to
> one mapper.  Before we spend any more time on this, I am wondering if this
> is a good approach – OR – if there's a better approach?  Please let me
> know.  Thanks.
>
>
>
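One thing that may be worth checking in the setup quoted above is the arithmetic behind the lines-per-map setting. The sketch below is only a back-of-the-envelope estimate, assuming each file is cut into one split per N lines with one extra split for any remainder; the class name, method names and line counts are hypothetical and are not part of Pig or Hadoop.

```java
import java.util.Arrays;
import java.util.List;

/** Rough estimate of split counts for a line-based input format (illustrative only). */
final class SplitEstimate {

    /** Splits for one file when each split holds at most linesPerMap lines (ceiling division). */
    static long splitsForFile(long lineCount, int linesPerMap) {
        if (lineCount < 0 || linesPerMap <= 0) {
            throw new IllegalArgumentException("lineCount must be >= 0 and linesPerMap > 0");
        }
        return (lineCount + linesPerMap - 1) / linesPerMap;
    }

    /** Total splits over several files, assuming files are never merged into one split. */
    static long totalSplits(List<Long> lineCounts, int linesPerMap) {
        long total = 0;
        for (long lineCount : lineCounts) {
            total += splitsForFile(lineCount, linesPerMap);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical line counts for three small input files.
        List<Long> smallFiles = Arrays.asList(20_000L, 35_000L, 8_000L);

        // With 500000 lines per map, each small file still fits in a single split.
        System.out.println(totalSplits(smallFiles, 500_000)); // prints 3

        // A smaller value yields more splits: 2 + 4 + 1.
        System.out.println(totalSplits(smallFiles, 10_000));  // prints 7
    }
}
```

Under this assumption, a file with fewer than 500000 lines never produces more than one split, so the setting only adds mappers when it is lower than the per-file line count. Pig may also combine small splits on its own (the `pig.splitCombination` property), which would be worth checking separately.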
David LaBarbera 2013-02-11, 18:29
Something Something 2013-02-11, 19:10
David LaBarbera 2013-02-11, 20:38