Pig >> mail # user >> Way of determining the source of data


Re: Way of determining the source of data
See the Pig FAQ entry on finding out which file the data comes from:
https://cwiki.apache.org/confluence/display/PIG/FAQ#FAQ-Q%3AIloaddatafromadirectorywhichcontainsdifferentfile.HowdoIfindoutwherethedatacomesfrom%3F
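
For data that the built-in PigStorage loader can read, a rough sketch of one way to get the source into the relation is shown here. It assumes Pig 0.12 or later, where PigStorage accepts the '-tagPath' option (full input path) or '-tagFile' (file name only) and prepends that value as the first field of every tuple. The tab delimiter and the short field list are made up for illustration and are not the LogLoader schema from the question.

```pig
-- Sketch only: assumes Pig 0.12+ and tab-delimited input that PigStorage can parse.
-- The fields after 'source' are illustrative, not the LogLoader schema.
logs = LOAD 's3://bucket/directory/*'
       USING PigStorage('\t', '-tagPath')
       AS (source:chararray, remoteAddr:chararray, uri:chararray, status:int);

-- Example use of the extra field: count records per source file.
by_file = GROUP logs BY source;
counts = FOREACH by_file GENERATE group AS source, COUNT(logs) AS records;
DUMP counts;
```

With a custom loader such as LogLoader these options do not apply, as far as I know. In that case the usual approach is for the loader itself to remember the path of the current input split in prepareToRead and append it to each tuple it returns.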

On Thu, Feb 2, 2012 at 5:11 PM, Ranjan Bagchi <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have a bunch of (for example) Apache log files that I'm searching through.  I can process them with:
>
> logs = load 's3://bucket/directory/*' USING LogLoader as (remoteAddr, remoteLogname, user, time :chararray, method, uri :chararray, proto, status, bytes, referer, userAgent);
>
> Is there any way to add to the relation the name of the file from which each row of logs was pulled?
>
> Thanks,
>
> Ranjan