Kafka >> mail # user >> encoding issue in kafka


Re: encoding issue in kafka
I thought there was another thread or ticket about this but I can't find
it.  Did we ever get to the bottom of this?

On 06/06/2012 05:45 PM, Roman Garcia wrote:
> I guess having to set the platform encoding shouldn't be the fix, right?
> If that's the case, then somewhere in the code the platform default
> encoding is being used. As I understand it, that was in the StringEncoder.
> Then again, I believe these were removed from trunk, right? (At least, I
> can't seem to find them.)
> If they weren't removed (just moved elsewhere), we should consider
> using the String constructor that takes an explicit charset:
> String(bytes, charset)
>
> Regards,
> Roman
>
> 2012/6/6 Patricio Echagüe <[EMAIL PROTECTED]>
>
>> I ran into the same issue and setting -Dfile.encoding=UTF-8 in the startup
>> script fixed it.
>>
>> On Wed, Jun 6, 2012 at 12:18 AM, 刘明敏 <[EMAIL PROTECTED]> wrote:
>>
>>> Sorry for the late reply; it was my fault.
>>>
>>> After setting -Dfile.encoding=UTF-8 when starting up the producer, the
>>> problem was solved.
>>>
>>> On Sat, Jun 2, 2012 at 6:07 PM, 刘明敏 <[EMAIL PROTECTED]> wrote:
>>>
>>>> We encountered an encoding issue when dealing with Chinese characters.
>>>>
>>>> The producer sends the characters in the right encoding (UTF-8), but by
>>>> the time the consumer gets them, they have all turned into question
>>>> marks: ????
>>>>
>>>> When starting up the producer, the Kafka broker, and the consumer, we
>>>> tried specifying -Dfile.encoding=UTF-8, but it didn't work.
>>>>
>>>>
>>>> In the producer we use StringEncoder. Below is the producer snippet:
>>>>
>>>>   val props = new Properties();
>>>>   ...
>>>>   props.put("serializer.class", "kafka.serializer.StringEncoder");
>>>>   props.put("compression.codec", "1") //gzip
>>>>
>>>>   val producerConfig = new ProducerConfig(props);
>>>>   val producer = new Producer[String, String](producerConfig);
>>>>   val data = new ProducerData[String, String](topic, partitionKey,
>>>>     List("string_to_send_to_broker"));
>>>>   producer.send(data);
>>>>
>>>> and consumer:
>>>>
>>>>   val topicMessageStreams =
>>>>     consumerConnector.createMessageStreams(Predef.Map(topic -> consumers),
>>>>       new StringDecoder)
>>>>
>>>>   for ((topic, streamList) <- topicMessageStreams) {
>>>>     for (stream <- streamList) {
>>>>       val processor = new StreamProcessor(stream)
>>>>       new Thread(processor).start();
>>>>     }
>>>>   }
>>>>
>>>> and the StreamProcessor just iterates over each stream:
>>>>
>>>>   val message = iterator.next.message // Chinese characters in message turn into ?????
>>>>
>>>> Can anyone help?
>>>>
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> ----------------------
>>>> 刘明敏 | mmLiu
>>>>
>>>>
>>>
>>>
>>> --
>>> Best Regards
>>>
>>> ----------------------
>>> 刘明敏 | mmLiu
>>>
>>
>
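
A side note on Roman's point about the platform default encoding. Below is a minimal sketch, using only the JDK's String and Charset APIs (no Kafka classes), of how encoding text with a charset that cannot represent Chinese characters produces question marks. The object name CharsetRoundTrip and the helper roundTrip are made up for illustration. US-ASCII is only an assumed stand-in for the producer's platform default, which the thread never states.

```scala
import java.nio.charset.{Charset, StandardCharsets}

object CharsetRoundTrip {
  // Hypothetical helper: encode on the "producer" side with one charset and
  // decode on the "consumer" side with another.
  def roundTrip(text: String, encodeWith: Charset, decodeWith: Charset): String =
    new String(text.getBytes(encodeWith), decodeWith)

  def main(args: Array[String]): Unit = {
    // Two Chinese characters, written as Unicode escapes so the example does
    // not depend on the encoding of this source file.
    val original = "\u4e2d\u6587"

    // Assumption: the platform default cannot represent Chinese (US-ASCII here).
    // getBytes replaces each unmappable character with the byte for '?'.
    val lossy = roundTrip(original, StandardCharsets.US_ASCII, StandardCharsets.UTF_8)

    // An explicit UTF-8 charset on both sides preserves the text.
    val safe = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8)

    println(s"platform default charset: ${Charset.defaultCharset()}")
    println(s"encoded as US-ASCII:      $lossy") // "??"
    println(s"encoded as UTF-8:         $safe")
  }
}
```

Once the bytes have been written as '?', nothing on the broker or consumer side can recover the original characters, which would be consistent with the fix having to be applied when starting the producer. Passing an explicit charset on both sides makes the result independent of -Dfile.encoding.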