I ran into the same problem you describe. I don't know whether a solution
already exists; looking through the existing sources, I couldn't find anything
that solves it.
I use a custom 'tail' tool as an independent component that talks to a local
Flume server over Avro RPC. The tail tool passes along useful metadata such as
the file name, creation time, and so on. It also records a pointer to the
unread lines, so it can recover after the process or the server crashes.
The local Flume server runs close to that component and stores the messages,
taking responsibility for network issues between the local and central Flume
servers.
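The tail tool itself is custom and its code isn't shown here. A minimal sketch of the offset-checkpointing idea, assuming a line-oriented file: read only the complete lines after the saved byte offset, attach the file name and modification time as headers, hand the batch to a `send` callable, and persist the new offset only after `send` succeeds. The header names `file` and `mtime`, the JSON checkpoint format and all function names are hypothetical; in a real tool `send` would wrap a Flume Avro RPC client (for example `RpcClient.appendBatch` from the Flume client SDK). File rotation and truncation are not handled.

```python
import json
import os
from typing import Callable, Dict, List

# Hypothetical event shape: headers plus a body, mirroring a Flume event.
Event = Dict[str, object]


def load_offset(checkpoint_path: str) -> int:
    """Return the saved byte offset, or 0 if no checkpoint exists yet."""
    try:
        with open(checkpoint_path, "r", encoding="utf-8") as f:
            return int(json.load(f)["offset"])
    except FileNotFoundError:
        return 0


def save_offset(checkpoint_path: str, offset: int) -> None:
    """Persist the offset atomically: write a temp file, then rename it over."""
    tmp = checkpoint_path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump({"offset": offset}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, checkpoint_path)


def tail_once(path: str, checkpoint_path: str,
              send: Callable[[List[Event]], None]) -> int:
    """Send all complete lines after the saved offset; return how many were sent."""
    offset = load_offset(checkpoint_path)
    st = os.stat(path)
    headers = {
        "file": os.path.basename(path),
        # st_mtime is used because file creation time is not portable.
        "mtime": str(int(st.st_mtime)),
    }
    events: List[Event] = []
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line.endswith(b"\n"):
                break  # EOF, or a line still being written: retry next time
            events.append({"headers": dict(headers), "body": line.rstrip(b"\n")})
            offset += len(line)
    if events:
        send(events)  # may raise; the checkpoint is only advanced on success
        save_offset(checkpoint_path, offset)
    return len(events)
```

Because the checkpoint is written after `send` returns, a crash between the two steps re-sends the last batch rather than losing it, so downstream consumers should tolerate occasional duplicates.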
Admittedly, this is an extension component, not part of stock Flume. It would
be much better if someone could suggest a simpler solution.
Thanks a lot!
2013/2/13 Rao, Mallik <[EMAIL PROTECTED]>
> Good Morning
> I am not sure whether this is the right place to ask a question like this.
> If not, please advise me whether there is a different forum for this kind of
> question.
> We need to pull files from one or more different servers to HDFS, and we
> need to preserve the file names. It would be a lot easier for us to pull the
> files than to install software on the remote server(s).
> We have to expect that the network will sometimes have issues and that we
> may have failures, so we may need to continue from where we left off. In that
> scenario, we can add an extension to the file to indicate that the copy took
> multiple attempts. We cannot move or rename the files on the source server.
> If we restart for some reason, we should not copy files that were already
> copied.
> Previously, we have done this using shell scripting.
> We are planning to use Flume. Is there an existing solution in Flume, or do
> we need to develop custom code?
> Mallikharjuna Rao
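The pull-based requirements in the quoted message (resume after a failure, mark an unfinished copy with an extra extension, never move or rename source files, skip files already copied) can be sketched as a simple copy loop. This is a minimal sketch, not a Flume feature: local directories stand in for the remote server and HDFS, and the `.partial` suffix, the manifest file and all function names are hypothetical. Here the extension marks an in-progress copy and is dropped once the copy completes, which is one reading of the requirement. Resuming by byte offset assumes the source file does not change between attempts.

```python
import os
from typing import List, Set

CHUNK = 1 << 20       # copy 1 MiB at a time
PARTIAL = ".partial"  # hypothetical marker for an unfinished copy


def load_manifest(path: str) -> Set[str]:
    """Names of files that were fully copied in earlier runs."""
    if not os.path.exists(path):
        return set()
    with open(path, "r", encoding="utf-8") as f:
        return {line.rstrip("\n") for line in f if line.strip()}


def copy_resumable(src: str, dst_dir: str) -> str:
    """Copy src into dst_dir, resuming any earlier partial copy.

    The source file is only read, never moved or renamed.
    """
    name = os.path.basename(src)
    partial = os.path.join(dst_dir, name + PARTIAL)
    done = os.path.getsize(partial) if os.path.exists(partial) else 0
    with open(src, "rb") as fin, open(partial, "ab") as fout:
        fin.seek(done)  # skip bytes that arrived in a previous attempt
        while chunk := fin.read(CHUNK):
            fout.write(chunk)
    final = os.path.join(dst_dir, name)  # original file name is preserved
    os.replace(partial, final)
    return final


def pull_all(src_dir: str, dst_dir: str, manifest: str) -> List[str]:
    """Copy every regular file in src_dir not yet listed in the manifest."""
    copied = load_manifest(manifest)
    new: List[str] = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if name in copied or not os.path.isfile(src):
            continue
        copy_resumable(src, dst_dir)
        with open(manifest, "a", encoding="utf-8") as m:
            m.write(name + "\n")  # record only after the copy has completed
        new.append(name)
    return new
```

If the process dies after the rename but before the manifest is updated, the next run simply copies that file again and overwrites the same destination, so a restart never leaves a half-copied file under its final name.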