Pig >> mail # user >> Mixed input formats in LOAD path


Johannes Schwenk 2012-06-15, 12:13
Ruslan Al-Fakikh 2012-06-15, 12:37
Re: Mixed input formats in LOAD path
Hi Ruslan,

thanks for your answer!

I only have the input path and do not know which format each file in
that path has. All files in the path belong to one relation, however, so
I want to load them at once. A union of separately loaded files would be
fine too, if that is possible. The important thing is that the LOAD
automatically takes care of the different formats.

To illustrate further consider the following scenario:

1. Our logging system writes log data to LOG_PATH.
2. The current format is tab separated values.
3. We LOAD '$LOG_PATH'
4. We switch to Avro format and have to migrate.
5. The migration cannot happen instantly, so at some point in time some
files in LOG_PATH may still be in TSV format while others have already
been switched to Avro.
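
One way to tell the two formats apart during such a migration is to look
at the first bytes of each file: Avro object container files start with
the four magic bytes 'O', 'b', 'j', 1. The following is only a rough
sketch of that idea, not a Pig loader. The function names are made up for
illustration, it reads from the local filesystem rather than HDFS, and it
assumes that every file which is not an Avro container is TSV.

```python
import os

# First four bytes of an Avro object container file, per the Avro spec.
AVRO_MAGIC = b"Obj\x01"


def sniff_format(path):
    """Return 'avro' if the file starts with the Avro magic bytes, else 'tsv'.

    Assumption: anything that is not an Avro container (including an
    empty file) is treated as tab separated values.
    """
    with open(path, "rb") as f:
        header = f.read(len(AVRO_MAGIC))
    return "avro" if header == AVRO_MAGIC else "tsv"


def partition_by_format(directory):
    """Group the regular files directly under `directory` by sniffed format."""
    groups = {"avro": [], "tsv": []}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            groups[sniff_format(path)].append(path)
    return groups
```

The two resulting file lists could then be passed to two separate LOAD
statements and combined with a UNION, or the same check could be used
inside a custom loader to pick the right reader per file.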

Thanks,
Johannes

On 15.06.2012 14:37, Ruslan Al-Fakikh wrote:
> Hi Johannes,
>
> I guess you'd have to write a custom Loader for such a situation, but
> why do you need to load everything in one pass? You can load different
> types of files separately (having multiple LOAD statements) and make a
> join or a union afterwards.
>
> Ruslan
>
> On Fri, Jun 15, 2012 at 4:13 PM, Johannes Schwenk
> <[EMAIL PROTECTED]> wrote:
>> Hi all,
>>
>> is it possible to have an input path (as parameter to a LOAD statement)
>> that contains several files in *different formats* - say serialized Avro
>> data and tab separated values - and make Pig read the data into one alias?
>> I guess I have to write a UDF for this? How should I start, can you
>> sketch out a rough plan on how to proceed?
>>
>>
>> Greetings,
>> Johannes Schwenk
>>
>> --
>> Software Developer (Reporting)
>> ________________________________________________________
>>
>> ADITION technologies AG
>> Schwarzwaldstraße 78b
>> 79117 Freiburg
>>
>> http://www.adition.com
>>
>> T +49 / (0)761 / 88147 - 30
>> F +49 / (0)761 / 88147 - 77
>> SUPPORT +49  / (0)1805 - ADITION
>>
>> (Landline rate 14 ct/min; mobile rates at most 42 ct/min)
>>
>> Registered at the Amtsgericht Düsseldorf (district court) under HRB 54076
>> Executive Board: Andreas Kleiser, Jörg Klekamp, Tihomir Perkovic, Marcus Schlüter
>> Chairman of the Supervisory Board: Daniel Raimer, attorney-at-law
>> VAT ID No.: DE 218 858 434
>>
>
>
>


Ruslan Al-Fakikh 2012-06-15, 13:24
Johannes Schwenk 2012-06-15, 13:39
Ruslan Al-Fakikh 2012-06-15, 13:55
Johannes Schwenk 2012-06-15, 16:05