-
2 questions, the log file name and the log messy code
Ying Tang 2010-11-19, 08:37
Hi all,
1. I have installed a two-node Chukwa setup for testing: one agent and one collector. I also have an HDFS cluster. The files that the collector writes to HDFS are named time + log source host + java.rmi.server.UID(), and the time format is yyyyddHHmmssSSS, with no month. This format is hard-coded. I need the month, so do I have to change the code and recompile?
2. In the log files in HDFS, the metadata shows up as garbled characters ("messy code"), while the log content from the agent is fine. My adaptor uses UTF-8. How can I solve this?
-- Best regards,
Ivy Tang
-
Re: 2 questions, the log file name and the log messy code
Eric Yang 2010-11-19, 17:30
1. This looks like a mistake in the temp filename pattern. Please open a JIRA and we will fix it.
2. The data is recorded in SequenceFile format to make it easier to process with MapReduce. If you expect plain text log content, you will need to write a MapReduce job that uses a text output format and routes each log file type accordingly.
Regards, Eric
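
To illustrate the missing-month problem from question 1, here is a minimal sketch using java.text.SimpleDateFormat. It assumes the collector builds the timestamp with SimpleDateFormat-style pattern letters, which the thread does not confirm; the class and method names (TimestampPatternDemo, instant, format) are hypothetical and not part of Chukwa.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class TimestampPatternDemo {
    // Builds 2010-<month>-<day> 08:37:00.123 UTC (month is 1-based here).
    static Date instant(int month, int day) {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();
        cal.set(2010, month - 1, day, 8, 37, 0);
        cal.set(Calendar.MILLISECOND, 123);
        return cal.getTime();
    }

    // Formats in UTC so the result does not depend on the local time zone.
    static String format(String pattern, Date when) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(when);
    }

    public static void main(String[] args) {
        // Pattern reported in the thread: no "MM", so the month is lost.
        System.out.println(format("yyyyddHHmmssSSS", instant(11, 19)));   // 201019083700123
        // The same instant in January yields the identical string.
        System.out.println(format("yyyyddHHmmssSSS", instant(1, 19)));    // 201019083700123
        // Assumed fix: add "MM" between the year and the day.
        System.out.println(format("yyyyMMddHHmmssSSS", instant(11, 19))); // 20101119083700123
    }
}
```

Without the month, timestamps taken on the same day of different months are indistinguishable, so the time part of the file name no longer sorts or identifies files correctly across months.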
-
Re: 2 questions, the log file name and the log messy code
Jerome Boulon 2010-11-19, 18:24
Just a warning: if you use the Text output format, you will have a hard time with "\n" inside your logs, for example in stack traces. Also, a text file will be either uncompressed or non-splittable.
/Jerome.
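
The "\n" problem can be sketched in plain Java. A line-oriented text format treats every newline as a record boundary, so a log record that contains a stack trace is split into several lines. The escaping scheme below is only one possible workaround, chosen for illustration; it is an assumption and not something Chukwa or Hadoop does, and the class and method names (NewlineRecordDemo, escape, unescape, countLines) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NewlineRecordDemo {
    // Escapes backslashes first, then newlines, so each record fits on one line.
    static String escape(String record) {
        return record.replace("\\", "\\\\").replace("\n", "\\n");
    }

    // Reverses escape(): a backslash followed by 'n' becomes a newline,
    // a backslash followed by any other character yields that character.
    static String unescape(String line) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                char next = line.charAt(++i);
                out.append(next == 'n' ? '\n' : next);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Counts how many lines a line-based reader would see for these records.
    static int countLines(List<String> records) {
        return String.join("\n", records).split("\n", -1).length;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
            "INFO started",
            "ERROR failed\njava.lang.RuntimeException: boom\n\tat Foo.bar(Foo.java:1)");

        // Written as-is: 2 records become 4 lines.
        System.out.println(countLines(records));

        List<String> escaped = new ArrayList<>();
        for (String r : records) {
            escaped.add(escape(r));
        }
        // Escaped: one line per record, and the original text is recoverable.
        System.out.println(countLines(escaped));
        System.out.println(unescape(escaped.get(1)).equals(records.get(1)));
    }
}
```

Keeping the data in SequenceFile format avoids the problem entirely, because record boundaries there do not depend on newline characters.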
-
Re: 2 questions, the log file name and the log messy code
Ying Tang 2010-11-23, 07:49
The messy code was my mistake. After switching to SequenceFileInputFormat, the file reads cleanly. However, the metadata in the value is mixed in with my log content. It would be better to add a \n after the metadata.
-- Best regards,
Ivy Tang