Pig >> mail # user >> Displaying source log file names in pig logs


Guy Bayes 2010-10-21, 16:57
Romain Rigaux 2010-10-25, 16:02
Re: Displaying source log file names in pig logs
I'm pretty sure they are supposed to be in the Input-split lines of the
tasktracker logs, aren't they?

For some reason all the Input-splits are null:

Input-split file: null
Input-split start-offset: -1
Input-split length: -1

thanks
Guy

On Mon, Oct 25, 2010 at 9:02 AM, Romain Rigaux <[EMAIL PROTECTED]> wrote:

> Hi,
>
>
> I don't think that filenames are directly available but I do something like
> this in order to get them (I did not try with Pig 0.7+ yet):
>
> Create a new loader inheriting from PigStorage and get the "location" path
> of the data. Then either:
>
>   - print it if everything happens in the same task
>   - append it to each record
>
> Hope this helps,
>
> Romain
>
> On Thu, Oct 21, 2010 at 9:57 AM, Guy Bayes <[EMAIL PROTECTED]> wrote:
>
> > We have a job that processes several hundred files in a directory
> >
> > We generally glob the directory in a single load statement
> >
> > Sometimes the job chokes on a bad row in a single file
> >
> > I could have sworn that pig printed the file name of the chunks it is
> > processing in the task log but cannot see it
> >
> > Does anyone know under what conditions file names are printed, or how to
> > find the file that is causing the issues?
> >
> > Thanks
> > Guy
> > >
> >
>
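Romain's suggestion in the quoted message (carry the source path along with each record so a bad row can be traced to its file) can be illustrated outside Pig. The following is only a minimal sketch in plain Python, not Pig's loader API: the function names `load_with_source` and `find_bad_rows` are hypothetical, and it assumes that a "bad row" is one whose tab-separated field count differs from the expected count.

```python
import glob


def load_with_source(pattern):
    """Yield (path, line_number, line) for every line of every file matching pattern.

    This mimics appending the source location to each record.
    """
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as handle:
            for line_number, line in enumerate(handle, start=1):
                yield path, line_number, line.rstrip("\n")


def find_bad_rows(pattern, expected_fields, delimiter="\t"):
    """Return the tagged records whose field count is not expected_fields.

    Assumption: a wrong field count is what makes a row "bad".
    """
    return [
        (path, line_number, line)
        for path, line_number, line in load_with_source(pattern)
        if len(line.split(delimiter)) != expected_fields
    ]
```

Calling `find_bad_rows("logs/*.log", expected_fields=2)` on a hypothetical `logs` directory would return the path, line number, and text of each malformed row. In Pig itself the same idea would live in a custom loader, as Romain describes.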

--
you may be acquainted with the night
but i have seen the darkness in the day
and you must know it is a terrifying sight...