Flume >> mail # user >> Processing data from HDFS

RE: Processing data from HDFS
I've evaluated Pig, but it isn't suitable for my purpose.

The CSV files I have can use different column names and a different column order in each file. Also, the key is not present in the CSV; we need to calculate a row key for each record as well.
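Since the column mapping must come from each file's own header and a row key has to be derived per record, here is a minimal sketch of that logic. This is Python for illustration only; the field names, delimiter, and the SHA-1-over-key-fields scheme are assumptions, not the actual design discussed in this thread.

```python
import csv
import hashlib
import io

def rows_with_keys(csv_text, key_fields):
    """Parse a CSV whose column names and order may differ per file,
    and derive a row key for each record.

    DictReader uses the file's own header line, so column order does
    not matter. The row key here is a SHA-1 over the concatenated
    key-field values (hypothetical scheme)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for record in reader:
        raw = "|".join(record[f] for f in key_fields)
        row_key = hashlib.sha1(raw.encode("utf-8")).hexdigest()
        yield row_key, record

# Two files with the same columns in a different order both parse
# to the same logical records:
file_a = "id,name,amount\n1,alpha,10\n"
file_b = "name,amount,id\nalpha,10,1\n"
for text in (file_a, file_b):
    for key, rec in rows_with_keys(text, ["id", "name"]):
        print(key[:8], dict(rec))
```

In a real MapReduce job this per-record logic would live in the mapper, emitting the derived key alongside the parsed fields for the HBase put.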

Abhijeet Pathak
From: Alexander Alten-Lorenz [[EMAIL PROTECTED]]
Sent: 24 January 2013 1:10 PM
Subject: Re: Processing data from HDFS

Use Pig; you can find a well-written example here:


On Jan 24, 2013, at 8:29 AM, Nitin Pawar <[EMAIL PROTECTED]> wrote:

> How are the files coming to HDFS?
> There is a direct HBase sink available for writing data into HBase.
> To move data from HDFS to HBase, you will need to write your own
> MapReduce job to put the data into HBase.
> On Thu, Jan 24, 2013 at 12:50 PM, Abhijeet Pathak <
>> Hi,
>> I have a folder in HDFS where a bunch of files get created periodically.
>> I know that currently Flume does not support reading from an HDFS folder.
>> What is the best way to transfer this data from HDFS to HBase (with or
>> without using Flume)?
>> Regards,
>> Abhijeet Pathak
> --
> Nitin Pawar

Alexander Alten-Lorenz
German Hadoop LinkedIn Group: http://goo.gl/N8pCF