
Pig, mail # user - Displaying source log file names in pig logs


Re: Displaying source log file names in pig logs
Guy Bayes 2010-10-25, 16:09
I'm pretty sure they are supposed to appear in the input-split lines of the tasktracker
logs, aren't they?

For some reason, all the input splits are null:

Input-split file: null
Input-split start-offset: -1
Input-split length: -1

thanks
Guy

On Mon, Oct 25, 2010 at 9:02 AM, Romain Rigaux <[EMAIL PROTECTED]> wrote:

> Hi,
>
>
> I don't think that file names are directly available, but I do something like
> this to get them (I have not tried it with Pig 0.7+ yet):
>
> Create a new loader inheriting from PigStorage and get the "location" path
> of the data. Then either:
>
>   - print it, if everything happens in the same task
>   - append it to each record
>
> Hope this helps,
>
> Romain
>
> On Thu, Oct 21, 2010 at 9:57 AM, Guy Bayes <[EMAIL PROTECTED]> wrote:
>
> > We have a job that processes several hundred files in a directory.
> >
> > We generally glob the directory in a single load statement.
> >
> > Sometimes the job chokes on a bad row in a single file.
> >
> > I could have sworn that Pig printed the file names of the chunks it is
> > processing in the task log, but I cannot see them.
> >
> > Does anyone know under what conditions file names are printed, or how to
> > find the file that is causing the issues?
> >
> > Thanks
> > Guy
> >
>
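Romain's "append it to each record" idea can be illustrated with a small standalone sketch. This is plain Java, not Pig's LoadFunc API: the class `SourceTaggingLoader` and its methods `tagRecord`, `loadAll`, and `findBadRow` are hypothetical names, and files are modeled as an in-memory map from file name to lines. It splits tab-delimited lines (PigStorage's default delimiter), appends the source file name as an extra trailing field, and reports the file and line of the first row with an unexpected field count. In a real job, this logic would live in the PigStorage subclass Romain describes; how that subclass obtains the location path may differ between Pig versions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, standalone illustration (NOT Pig's LoadFunc API): mimics a
// PigStorage-style loader that tags every record with its source file name,
// so a bad row can be traced back to the file it came from.
public class SourceTaggingLoader {

    // Split one tab-delimited line and append the source file name as an
    // extra trailing field.
    public static String[] tagRecord(String sourceFile, String line) {
        String[] fields = line.split("\t", -1); // -1 keeps trailing empty fields
        String[] tagged = Arrays.copyOf(fields, fields.length + 1);
        tagged[fields.length] = sourceFile;
        return tagged;
    }

    // Load every line of every "file" (file name -> lines), tagging each record.
    public static List<String[]> loadAll(Map<String, List<String>> files) {
        List<String[]> records = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : files.entrySet()) {
            for (String line : e.getValue()) {
                records.add(tagRecord(e.getKey(), line));
            }
        }
        return records;
    }

    // Return "file:lineNumber" (1-based) of the first row whose field count
    // differs from expectedFields, or null if every row is well formed.
    public static String findBadRow(Map<String, List<String>> files, int expectedFields) {
        for (Map.Entry<String, List<String>> e : files.entrySet()) {
            List<String> lines = e.getValue();
            for (int i = 0; i < lines.size(); i++) {
                if (lines.get(i).split("\t", -1).length != expectedFields) {
                    return e.getKey() + ":" + (i + 1);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: two "files" from a globbed directory.
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("part-0001.log", List.of("a\t1", "b\t2"));
        files.put("part-0002.log", List.of("c\t3", "garbage"));
        for (String[] r : loadAll(files)) {
            System.out.println(String.join(" | ", r));
        }
        System.out.println("First bad row: " + findBadRow(files, 2));
    }
}
```

With the sample data, the last printed line points at `part-0002.log:2`. Tagging every record costs one extra field per row but works even when rows from many files are mixed in one task, which is why it is the more general of the two options Romain lists.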

--
you may be acquainted with the night
but i have seen the darkness in the day
and you must know it is a terrifying sight...