Flume >> mail # user >> HDFS sink leaves .tmp files


Chris Neal 2012-08-29, 14:18
Chris Neal 2012-09-10, 15:02
Eran Kutner 2012-09-10, 15:16
Chris Neal 2012-09-10, 15:21
Kathleen Ting 2012-09-10, 17:37
Chris Neal 2012-09-10, 18:59
Chris Neal 2012-09-10, 19:01
Bhaskar V. Karambelkar 2012-09-10, 20:08
Mike Percy 2012-09-10, 22:04
Kathleen Ting 2012-09-10, 22:09
Eran Kutner 2012-09-10, 23:15
Chris Neal 2012-09-11, 01:38
Re: HDFS sink leaves .tmp files
Just to follow up, the .tmp file problem did go away using 1.3.0-SNAPSHOT
on the HDFS sink agent.

Thanks again Kathleen :)
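
For anyone who wants to confirm the same result on their own cluster, here is a minimal sketch that scans a recursive HDFS listing for leftover .tmp files. It assumes the usual eight-column output of `hadoop fs -ls -R` (permissions, replication, owner, group, size, date, time, path); the helper names and the /flume/events path are made up for illustration. As Kathleen notes further down the thread, a file that is still open may report size 0 even when it holds data, so a zero size here does not by itself mean data loss.

```python
import subprocess


def parse_ls_line(line):
    """Return (size_bytes, path) for a regular-file line, or None otherwise.

    Assumes the column layout: perms, replication, owner, group,
    size, date, time, path.
    """
    parts = line.split(None, 7)
    if len(parts) != 8 or not parts[4].isdigit():
        return None  # e.g. the "Found N items" header line
    if parts[0].startswith("d"):
        return None  # skip directories
    return int(parts[4]), parts[7]


def find_tmp_files(ls_output, suffix=".tmp"):
    """Return a list of (path, size_bytes) for files whose name ends in suffix."""
    found = []
    for line in ls_output.splitlines():
        parsed = parse_ls_line(line)
        if parsed is None:
            continue
        size, path = parsed
        if path.endswith(suffix):
            found.append((path, size))
    return found


def list_hdfs(root):
    """Run a recursive listing; assumes the `hadoop` CLI is on PATH."""
    result = subprocess.run(
        ["hadoop", "fs", "-ls", "-R", root],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


# Example use (hypothetical path):
#   for path, size in find_tmp_files(list_hdfs("/flume/events")):
#       print(size, path)
```

The parsing is kept separate from the CLI call so it can be checked against a saved listing without a running cluster.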

On Mon, Sep 10, 2012 at 8:38 PM, Chris Neal <[EMAIL PROTECTED]> wrote:

> Thanks Kathleen!
> I'll download that build tomorrow morning and give it a whirl.
>
> Chris
>
>
> On Mon, Sep 10, 2012 at 5:09 PM, Kathleen Ting <[EMAIL PROTECTED]> wrote:
>
>> [Moving to [EMAIL PROTECTED] |
>> https://groups.google.com/a/cloudera.org/group/cdh-user/topics since
>> this is getting to be CDH specific]
>> bcc: [EMAIL PROTECTED]
>>
>> Chris,
>>
>> When the file has not been closed by the client, the file size may be
>> shown as 0. The NameNode will not update the metadata about the file
>> until the block is completed or the file handle is closed. Even if it
>> updates at a block boundary, the size won't be accurate until the file
>> is closed.
>>
>> The metadata takes some time to populate even though the files may
>> contain data. The CDH4.1 version of Flume includes FLUME-1238, which
>> auto-rolls files and shortens the period during which these files
>> appear to have size 0.
>>
>> Since the CDH3u5 version of Flume is compatible with CDH3* Hadoop and
>> the CDH4 Flume is compatible with CDH4* Hadoop, you can download the
>> nightly build of flume-ng-1.2.0-cdh4.1.0 from
>> http://nightly.cloudera.com/cdh4/cdh/4/
>>
>> Regards, Kathleen
>>
>> On Mon, Sep 10, 2012 at 1:08 PM, Bhaskar V. Karambelkar <[EMAIL PROTECTED]> wrote:
>> > Don't know about an RPM, but there's a tarball of the 1.2.0 build at
>> > http://archive.cloudera.com/cdh/3/flume-ng-1.2.0-cdh3u5.tar.gz
>> >
>> >
>> > On Mon, Sep 10, 2012 at 3:01 PM, Chris Neal <[EMAIL PROTECTED]> wrote:
>> >>
>> >> Just checked, and from Cloudera, 1.1.0+121-1.cdh4.0.1.p0.1.el6 is still
>> >> the latest from their yum repo.
>> >>
>> >>
>> >> On Mon, Sep 10, 2012 at 1:59 PM, Chris Neal <[EMAIL PROTECTED]> wrote:
>> >>>
>> >>> I'm using a combination :)
>> >>>
>> >>> The application tier is 1.3.0-SNAPSHOT
>> >>> The HDFS tier is CentOS, and I grabbed the latest (at the time) from the
>> >>> CDH repo.  Its version is:  1.1.0+121-1.cdh4.0.1.p0.1.el6
>> >>>
>> >>> If the issue is on the HDFS sink side, then it could definitely be in my
>> >>> version!
>> >>> I'll check if Cloudera has a more recent version to update to.
>> >>>
>> >>> Thanks!
>> >>> Chris
>> >>>
>> >>>
>> >>> On Mon, Sep 10, 2012 at 12:37 PM, Kathleen Ting <[EMAIL PROTECTED]> wrote:
>> >>>>
>> >>>> Chris, Eran, this appears to be FLUME-1238, which was fixed in
>> >>>> Flume-1.2.0. Can you let me know if you are using Flume-1.2.0?
>> >>>>
>> >>>> Thanks, Kathleen
>> >>>>
>> >>>> On Mon, Sep 10, 2012 at 8:21 AM, Chris Neal <[EMAIL PROTECTED]> wrote:
>> >>>> > Glad to know it's not just me :)
>> >>>> >
>> >>>> >
>> >>>> > On Mon, Sep 10, 2012 at 10:16 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
>> >>>> >>
>> >>>> >> I have the same problem. I roll every 1 minute so I have tons of
>> >>>> >> those .tmp files.
>> >>>> >>
>> >>>> >> -eran
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >> On Mon, Sep 10, 2012 at 6:02 PM, Chris Neal <[EMAIL PROTECTED]> wrote:
>> >>>> >>>
>> >>>> >>> I'm still seeing this consistently every 24-hour period.  Does this
>> >>>> >>> sound like a configuration issue, an issue with the Exec source, or
>> >>>> >>> an issue with the HDFS sink?
>> >>>> >>>
>> >>>> >>> Thanks!
>> >>>> >>>
>> >>>> >>>
>> >>>> >>> On Wed, Aug 29, 2012 at 9:18 AM, Chris Neal <[EMAIL PROTECTED]> wrote:
>> >>>> >>>>
>> >>>> >>>> Hi all,
>> >>>> >>>>
>> >>>> >>>> I have an Exec Source running a tail -F on a log4j-generated log file
>> >>>> >>>> that gets rolled once a day.  It seems that when log4j rolls the file
>> >>>> >>>> to the new date, the HDFS sink ends up with a .tmp file.  I haven't
>> >>>> >>>> figured out if there is any data loss yet, but was curious if this is
>> >>>> >>>> expected behavior?
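
For context on the rolling behavior discussed above, the fragment below sketches how roll settings are typically expressed for the Flume HDFS sink. It is an illustration only: the agent, sink, and channel names (agent1, hdfsSink, ch1) and the path are hypothetical, and the thread does not say which of these properties, if any, FLUME-1238 relates to. Property availability depends on the Flume version, so check the user guide for the release you run.

```properties
# Hypothetical names and path; not taken from the thread.
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.channel = ch1
agent1.sinks.hdfsSink.hdfs.path = /flume/events
agent1.sinks.hdfsSink.hdfs.filePrefix = FlumeData

# Roll on time only: close the current file every 60 seconds.
agent1.sinks.hdfsSink.hdfs.rollInterval = 60
# 0 disables size-based and event-count-based rolling.
agent1.sinks.hdfsSink.hdfs.rollSize = 0
agent1.sinks.hdfsSink.hdfs.rollCount = 0

# Close a file that has received no events for 300 seconds (0 = never).
# Assumption: this property exists in your Flume release.
agent1.sinks.hdfsSink.hdfs.idleTimeout = 300
```

A file that is open for writing carries the in-use suffix (.tmp by default) until the sink closes and renames it, which is why files that are never closed remain visible as .tmp.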
Kathleen Ting 2012-09-13, 16:08
Shara Shi 2012-11-06, 02:47
Kathleen Ting 2012-11-06, 20:05