201102141159

python character encoding with mediawiki api

BackgroundI was creating a python module modifying double (chain) redirect pages in mediawiki. While testing, it failed when writing non-american alphabet text. An UnicodeEncodeError was raised like below:UnicodeEncodeError: ‘ascii’ codec can’t encode charactes in position 19-22: ordinal not in range(128)At first glance, I guessed unicode will solve this problem, but the same error happened even though I encoded the whole text paramter with unicode() function. Simply using unicode didn’t help me.CauseIn mediawiki tool’s Page.edit(), text string is md5-hashed unless “skipmd5” parameter is specified. However, md5 hash fails with non-ascii characters. I can’t use ascii because most of string parameters have to include non-alphabetical characters.SolutionInstead of using ascii or unicode, utf-8 helps in this case. Since utf-8 is compatible with ascii and still handles non-ascii characters, it doesn’t cause an error in md5 hash. Once I considered disable md5 hash by setting “skipmd5” parameter to True, but it doesn’t seem to be a right solution.

  • 2011/02/14 11:59 에 작성

results matching ""

    No results matching ""