Pig, mail # user - Loader for small files


Re: Loader for small files
Something Something 2013-02-11, 18:24
Sorry, moving the 'hbase' mailing list to BCC because this is not related to
HBase.  Adding the 'hadoop' user group.

On Mon, Feb 11, 2013 at 10:22 AM, Something Something <
[EMAIL PROTECTED]> wrote:

> Hello,
>
> We are running into performance issues with Pig/Hadoop because our input
> files are small.  Everything goes to only 1 Mapper.  To get around this, we
> are trying to use our own Loader like this:
>
> 1)  Extend PigStorage:
>
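> import org.apache.hadoop.mapreduce.InputFormat;
> import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
> import org.apache.pig.builtin.PigStorage;
>
> public class SmallFileStorage extends PigStorage {
>
>     public SmallFileStorage(String delimiter) {
>         super(delimiter);
>     }
>
>     // Swap PigStorage's default input format for one that splits by line count.
>     @Override
>     public InputFormat getInputFormat() {
>         return new NLineInputFormat();
>     }
> }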
>
>
>
> 2)  Add a command-line argument to the Pig command as follows:
>
> -Dmapreduce.input.lineinputformat.linespermap=500000
>
>
>
> 3)  Use SmallFileStorage in the Pig script as follows:
>
> USING com.xxx.yyy.SmallFileStorage ('\t')
>
>
> But this doesn't seem to work.  We still see that everything is going to
> one mapper.  Before we spend any more time on this, I am wondering if this
> is a good approach – OR – if there's a better approach?  Please let me
> know.  Thanks.
>
>
>
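One thing that may be worth checking with the setting in step 2: NLineInputFormat cuts each input file into chunks of N lines, so a file with fewer than N lines still produces a single split. The sketch below is only back-of-the-envelope arithmetic under that assumption. SplitEstimate and the line counts are made up for illustration and are not part of Hadoop or Pig, and the sketch does not model anything Pig itself may do afterwards, such as combining small splits (see the pig.splitCombination setting).

```java
// Illustrative only: SplitEstimate is a hypothetical helper, not a Hadoop or Pig class.
public class SplitEstimate {

    // Splits for one file when each split holds at most linesPerSplit lines:
    // ceil(lines / linesPerSplit), and 0 for an empty file.
    static long splitsForFile(long lines, long linesPerSplit) {
        if (lines < 0 || linesPerSplit <= 0) {
            throw new IllegalArgumentException("lines must be >= 0 and linesPerSplit > 0");
        }
        return (lines + linesPerSplit - 1) / linesPerSplit;
    }

    // Splits are computed per file, so the total is the sum over all files.
    static long totalSplits(long[] lineCounts, long linesPerSplit) {
        long total = 0;
        for (long lines : lineCounts) {
            total += splitsForFile(lines, linesPerSplit);
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up line counts for three small input files.
        long[] files = {120_000L, 80_000L, 300_000L};

        // With 500000 lines per split, every file fits in one split.
        System.out.println(totalSplits(files, 500_000L)); // prints 3

        // A smaller value yields more splits for the same files.
        System.out.println(totalSplits(files, 50_000L));  // prints 11
    }
}
```

If the real files each hold fewer than 500000 lines, lowering linespermap would be the first thing to try under this model. Whether each resulting split then gets its own mapper still depends on the Pig and Hadoop versions and settings in use.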