Re: encoding issue in kafka
I thought there was another thread or ticket about this but I can't find
it.  Did we ever get to the bottom of this?

On 06/06/2012 05:45 PM, Roman Garcia wrote:
> I guess having to set the platform encoding shouldn't be the fix, right?
> If it is, then somewhere in the code the default (platform) encoding
> is being used. As I understand it, that was in StringEncoder.
> Then again, I believe these classes were removed from trunk, right? (At
> least, I can't seem to find them.)
> If they weren't removed (just moved elsewhere), we should consider
> using the explicit String constructor: String(bytes, charset)
>
> Regards,
> Roman
>
> 2012/6/6 Patricio Echagüe <[EMAIL PROTECTED]>
>
>> I ran into the same issue and setting -Dfile.encoding=UTF-8 in the startup
>> script fixed it.
>>
>> On Wed, Jun 6, 2012 at 12:18 AM, 刘明敏 <[EMAIL PROTECTED]> wrote:
>>
>>> Sorry for the late reply; it was my fault.
>>>
>>> After setting -Dfile.encoding=UTF-8 when starting up the producer, the
>>> problem was solved.
>>>
>>> On Sat, Jun 2, 2012 at 6:07 PM, 刘明敏 <[EMAIL PROTECTED]> wrote:
>>>
>>>> We encountered an encoding issue when dealing with Chinese characters.
>>>>
>>>> The producer sends characters in the right encoding (UTF-8), but when the
>>>> consumer receives them, they all turn into question marks: ????
>>>>
>>>> When starting up the producer, the Kafka broker, and the consumer, we
>>>> tried specifying -Dfile.encoding=UTF-8, but it doesn't work.
>>>>
>>>>
>>>> In the producer we use StringEncoder. Below is the producer snippet:
>>>>
>>>>   val props = new Properties();
>>>>   ...
>>>>   props.put("serializer.class", "kafka.serializer.StringEncoder");
>>>>   props.put("compression.codec", "1") //gzip
>>>>
>>>>   val producerConfig = new ProducerConfig(props);
>>>>   val producer = new Producer[String, String](producerConfig);
>>>>   val data = new ProducerData[String, String](topic, partitionKey,
>>>>     List("string_to_send_to_broker"));
>>>>   producer.send(data);
>>>>
>>>> and consumer:
>>>>
>>>>   val topicMessageStreams =
>>>>     consumerConnector.createMessageStreams(Predef.Map(topic -> consumers),
>>>>       new StringDecoder)
>>>>
>>>>   for ((topic, streamList) <- topicMessageStreams) {
>>>>     for (stream <- streamList) {
>>>>       val processor = new StreamProcessor(stream)
>>>>       new Thread(processor).start();
>>>>     }
>>>>   }
>>>>
>>>> and the StreamProcessor just iterates over each stream:
>>>>
>>>>   // Chinese characters in the message turn into ?????
>>>>   val message = iterator.next.message
>>>>
>>>> Can anyone help?
>>>>
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> ----------------------
>>>> 刘明敏 | mmLiu
>>>>
>>>>
>>>
>>>
>>> --
>>> Best Regards
>>>
>>> ----------------------
>>> 刘明敏 | mmLiu
>>>
>>
>
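
For reference, the "????" symptom described in the quoted messages is what you would expect when a string is converted to bytes with a charset that cannot represent Chinese characters. That can happen when code relies on the platform default (as Roman suggests StringEncoder did) and that default is not UTF-8. Below is a minimal, self-contained Scala sketch of the effect. It is not Kafka's actual code: the names CharsetSketch and roundTrip are made up for illustration, and US-ASCII merely stands in for a non-UTF-8 platform default.

```scala
import java.nio.charset.{Charset, StandardCharsets}

// Hypothetical helper for illustration only; not part of Kafka.
object CharsetSketch {
  // Encode to bytes and decode back with one explicit charset, roughly
  // what an encoder/decoder pair does on either side of the broker.
  def roundTrip(text: String, charset: Charset): String = {
    val bytes = text.getBytes(charset) // "producer" side
    new String(bytes, charset)         // "consumer" side
  }

  def main(args: Array[String]): Unit = {
    // Two Chinese characters, written as escapes so the example does not
    // depend on the encoding of this source file.
    val text = "\u4e2d\u6587"

    // US-ASCII cannot represent them; getBytes substitutes '?' for each
    // unmappable character, so the data is already lost before sending.
    println(roundTrip(text, StandardCharsets.US_ASCII)) // prints "??"

    // With an explicit UTF-8 charset the characters survive the round trip.
    println(roundTrip(text, StandardCharsets.UTF_8) == text) // prints "true"
  }
}
```

Passing the charset explicitly on both sides, as in String(bytes, charset), removes the dependence on -Dfile.encoding. Note that the loss happens at encoding time, which would be consistent with the fix being needed on the producer's JVM.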