I've evaluated Pig, but it's not suitable for my purpose.
The CSV files I have can use different column names and a different column order in each file.
Also, the row key is not present in the CSV, so we need to compute a row key for each record as well.
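
To make the requirement concrete, here is a minimal Python sketch of the per-file handling described above. The alias table, the canonical column names, and the choice of key fields are all hypothetical placeholders rather than my real schema, and hashing the key fields with MD5 is just one possible way to derive a row key.

```python
import csv
import hashlib
import io

# Hypothetical mapping from header variants (lower-cased) to canonical names.
COLUMN_ALIASES = {
    "id": "customer_id",
    "cust_id": "customer_id",
    "customer_id": "customer_id",
    "ts": "timestamp",
    "timestamp": "timestamp",
    "amt": "amount",
    "amount": "amount",
}

# Hypothetical choice of fields that together identify a record.
KEY_FIELDS = ("customer_id", "timestamp")


def normalize_row(raw_row):
    """Rename columns to canonical names; unknown columns are dropped."""
    row = {}
    for name, value in raw_row.items():
        if name is None:
            # Extra cells beyond the header row have no column name.
            continue
        canonical = COLUMN_ALIASES.get(name.strip().lower())
        if canonical is not None:
            row[canonical] = (value or "").strip()
    return row


def make_row_key(row):
    """Derive a deterministic row key by hashing the key fields."""
    joined = "|".join(row[field] for field in KEY_FIELDS)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def parse_csv(text):
    """Yield (row_key, normalized_row) for each record in the CSV text."""
    # DictReader takes the column names from the header line, so the
    # column order may differ from file to file.
    reader = csv.DictReader(io.StringIO(text))
    for raw_row in reader:
        row = normalize_row(raw_row)
        yield make_row_key(row), row


if __name__ == "__main__":
    file_a = "cust_id,ts,amt\n42,2013-01-24T08:29:00,9.99\n"
    file_b = "Amount,Timestamp,ID\n9.99,2013-01-24T08:29:00,42\n"
    for key, row in list(parse_csv(file_a)) + list(parse_csv(file_b)):
        print(key, row)
```

Both sample files should yield the same row key and the same normalized record, even though their headers and column order differ. Writing the resulting records into HBase is a separate step that this sketch does not cover.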
From: Alexander Alten-Lorenz [[EMAIL PROTECTED]]
Sent: 24 January 2013 1:10 PM
To: [EMAIL PROTECTED]
Subject: Re: Processing data from HDFS
Use Pig; you can find a well-written example here:
On Jan 24, 2013, at 8:29 AM, Nitin Pawar <[EMAIL PROTECTED]> wrote:
> How are the files arriving in HDFS?
> There is a direct HBase sink available for writing data into HBase.
> Also, to move data from HDFS to HBase, you will need to write your own MapReduce job to
> put the data into HBase.
> On Thu, Jan 24, 2013 at 12:50 PM, Abhijeet Pathak <
> [EMAIL PROTECTED]> wrote:
>> I have a folder in HDFS where a bunch of files get created periodically.
>> I know that Flume currently does not support reading from an HDFS folder.
>> What is the best way to transfer this data from HDFS to HBase (with or
>> without using Flume)?
>> Abhijeet Pathak
> Nitin Pawar
German Hadoop LinkedIn Group: http://goo.gl/N8pCF