Rao, Mallik 2013-02-13, 14:17
Good Morning
I am not sure whether this is the right place to ask a question like this. If not, please let me know whether there is a different forum for this kind of question.
We need to pull files from one or more other servers into HDFS, and we need to preserve the file names. It would be a lot easier for us to pull the files than to install software on the remote server(s).
We have to expect occasional network issues, so transfers may fail and we may need to continue from where we left off. In that scenario, we can add an extension to the file name indicating that the copy took multiple attempts. We cannot move or rename the files on the source server. If we restart for some reason, we should not copy files that have already been copied. Previously, we did this with shell scripts.
We are planning to use Flume. Is there an existing solution in Flume, or do we need to develop custom code?
Thanks, Mallikharjuna Rao
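A minimal sketch of the restart behaviour described above, written in Python rather than shell and using local directories in place of the remote server and HDFS. The manifest file, the `.part` and `.retried` suffixes, and the `pull_files` and `copy_fn` names are assumptions made for illustration, not part of Flume or of our existing scripts.

```python
import json
import os
import shutil

RETRY_SUFFIX = ".retried"  # hypothetical marker for files that needed more than one attempt


def load_manifest(path):
    """Return the saved state, or an empty state on the first run."""
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)
    return {"done": {}, "attempts": {}}


def save_manifest(path, state):
    """Write the state through a temp file so a crash cannot leave it half-written."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)


def pull_files(source_dir, dest_dir, manifest_path, copy_fn=shutil.copyfile):
    """Copy every file not yet recorded as done; the source is never modified."""
    state = load_manifest(manifest_path)
    os.makedirs(dest_dir, exist_ok=True)
    for name in sorted(os.listdir(source_dir)):
        src = os.path.join(source_dir, name)
        if not os.path.isfile(src) or name in state["done"]:
            continue  # skip directories and files copied in an earlier run
        attempts = state["attempts"].get(name, 0) + 1
        state["attempts"][name] = attempts
        save_manifest(manifest_path, state)  # record the attempt before copying
        partial = os.path.join(dest_dir, name + ".part")
        copy_fn(src, partial)  # may raise on a network failure
        final_name = name if attempts == 1 else name + RETRY_SUFFIX
        os.replace(partial, os.path.join(dest_dir, final_name))
        state["done"][name] = final_name
        save_manifest(manifest_path, state)
    return state
```

All bookkeeping lives on the destination side, so nothing has to be moved, renamed, or installed on the source server. Calling `pull_files` again after a failure skips the files already listed under `done` and retries only the rest.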
Denny Ye 2013-02-14, 03:46
Hi Rao,

I ran into the same problem you describe. I don't know of an existing solution either; looking through the existing sources, I can't find one that handles this.

I use a custom 'tail' tool as an independent component that talks to a local Flume server over Avro RPC. The tool can pass along useful metadata such as the file name and creation time. It also records a pointer to the unread lines, so it can recover from a process or server crash. The local Flume server sits next to that component and stores the messages, and it takes responsibility for network issues between the local and the central Flume (collector).

Yes, this is an add-on component, not part of the original Flume. It would be much better if someone could offer a simpler solution.

Thanks a lot!
-Regards
Denny Ye
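The checkpointing idea behind the tail tool described above can be sketched roughly as follows. This is not the actual tool: the `send` callable is a hypothetical stand-in for whatever client delivers events to the local Flume agent, and the JSON checkpoint file and function names are assumptions made for illustration.

```python
import json
import os


def read_offset(checkpoint_path):
    """Return the byte offset of the first unread line (0 on the first run)."""
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            return json.load(fh)["offset"]
    return 0


def write_offset(checkpoint_path, offset):
    """Persist the offset through a temp file so a crash cannot corrupt it."""
    tmp = checkpoint_path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump({"offset": offset}, fh)
    os.replace(tmp, checkpoint_path)


def ship_new_lines(log_path, checkpoint_path, send):
    """Send each complete unread line, advancing the checkpoint after every success."""
    offset = read_offset(checkpoint_path)
    shipped = 0
    with open(log_path, "rb") as fh:
        fh.seek(offset)
        while True:
            line = fh.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or a partial line that is still being written
            headers = {"file": os.path.basename(log_path)}  # carry the file name along
            send(headers, line.rstrip(b"\n"))  # hypothetical delivery call; may raise
            write_offset(checkpoint_path, fh.tell())
            shipped += 1
    return shipped
```

Because the offset is saved only after `send` returns, a crash between the two steps can resend one line on restart but should not skip any. The source file is opened read-only and is never renamed or moved.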