Re: Reading multiple files of a directory using a Single LOAD Command in PIG
Yes, you can do that - it will still apply the filter to the globbed results.
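
For reference, here is a minimal sketch of the two forms discussed in this thread. It assumes the directory is /Output2 on the default file system and that the data was stored with the default PigStorage loader; the alias names are illustrative and the schema (AS clause) is left out because the thread does not give one.

```pig
-- Glob form: matches only the part files written by STORE.
part_files = LOAD '/Output2/part-m-*' USING PigStorage();

-- Wildcard form: matches every entry in the directory, but names that
-- begin with "_" or "." (for example _SUCCESS) are still skipped.
all_files = LOAD '/Output2/*' USING PigStorage();
```

Both statements use forward slashes; the backslash path in the original question is what caused the error.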

On Wed, Jun 12, 2013 at 3:45 AM, Mix Nin <[EMAIL PROTECTED]> wrote:
> Hi,
>
> My mistake: I used backslashes, which caused the error. With forward
> slashes it works fine.
>
> Good to know that LOAD ignores filenames that begin with "_" or a period
> ".". So in that case, can I use LOAD '/Output/*' directly instead of
> LOAD '/Output/part-m*'?
>
> Thanks
>
>
>
>
> On Tue, Jun 11, 2013 at 2:32 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>
>> What is the error?
>>
>> The LoadFunc should be ignoring any filenames that begin with "_" or a
>> period "."
>> If you are trying to skip the _SUCCESS file, the loader you are using
>> (PigStorage) already handles that.
>>
>> Also, can you double-check that your path uses forward slashes
>> ("/Output/part-m*") rather than backslashes?
>>
>>
>> On Tue, Jun 11, 2013 at 2:26 PM, Mix Nin <[EMAIL PROTECTED]> wrote:
>>
>> > I have a directory "Output2". It contains the following files:
>> >
>> > -----------------
>> > _SUCCESS
>> > part-m-00000
>> > part-m-00001
>> > part-m-00002
>> > part-m-00003
>> > ...
>> > part-m-00100
>> > -----------------
>> >
>> > The above files were produced by a Pig STORE command.
>> >
>> > I want to read the files starting with "part-m-" using a Pig command.
>> >
>> > When I tried Data = LOAD '\Output2\part-m-*' AS ( );
>> > it did not work and threw an error.
>> >
>> > How do I read these files in a single LOAD statement?
>> >
>> > Thanks
>> >
>> >
>>

--
Harsh J