NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Pig >> mail # user >> Reading multiple files of a directory using a Single LOAD Command in PIG


Re: Reading multiple files of a directory using a Single LOAD Command in PIG
Yes, you can do that - it will still apply the filter to the globbed results.
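
For illustration, here is a minimal sketch of the two forms discussed in this thread. The tab delimiter and the single-column schema (line:chararray) are assumptions made for the example, since the original statement left the schema empty; adjust both to match your data.

```pig
-- Glob over the whole directory. Per this thread, entries whose names
-- begin with "_" or "." (such as _SUCCESS) are skipped by the loader.
-- The delimiter and schema below are assumed for illustration only.
data = LOAD '/Output/*' USING PigStorage('\t') AS (line:chararray);

-- The narrower glob that names the part files explicitly.
parts = LOAD '/Output/part-m-*' USING PigStorage('\t') AS (line:chararray);

DUMP data;
```

Both statements should read the same part files in this case; the explicit part-m-* form is mainly useful if the directory could also contain other visible files you do not want loaded.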

On Wed, Jun 12, 2013 at 3:45 AM, Mix Nin <[EMAIL PROTECTED]> wrote:
> Hi,
>
> My mistake: I used backslashes and so was getting an error. With
> forward slashes it works fine.
>
> Good to know that LOAD ignores filenames that begin with "_" or a period
> ".". So, in that case, can I directly give LOAD /Output/* instead of
> LOAD /Output/part-m*?
>
> Thanks
>
>
>
>
> On Tue, Jun 11, 2013 at 2:32 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>
>> What is the error?
>>
>> The LoadFunc should be ignoring any filenames that begin with "_" or a
>> period ".".
>> If you are trying to skip the _SUCCESS file, the loader you are using
>> (PigStorage) already handles that.
>>
>> Also, can you double-check that your path uses forward slashes, as in
>> "/Output/part-m*", as opposed to backslashes?
>>
>>
>> On Tue, Jun 11, 2013 at 2:26 PM, Mix Nin <[EMAIL PROTECTED]> wrote:
>>
>> > I have a directory "Output2". It contains the files below:
>> >
>> > -----------------
>> > _SUCCESS
>> > part-m-00000
>> > part-m-00001
>> > part-m-00002
>> > part-m-00003
>> > .
>> > .
>> > .
>> > .
>> > part-m-00100
>> > -----------------
>> >
>> > The above files were produced by a Pig STORE command.
>> >
>> > I want to read the files starting with "part-m-" using a Pig command.
>> >
>> > When I tried using Data= LOAD '\Output2\part-m-*' AS ( );
>> > it did not work and threw an error.
>> >
>> > How do I read these files in a single LOAD statement?
>> >
>> > Thanks
>> >
>> >
>>

--
Harsh J