Topicalizer now detects and displays custom document character sets, whereas before it simply assumed everything to be ISO-8859-1 or UTF-8. If no custom character set is detected, or if plain text is submitted, UTF-8 is used instead.
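As a rough illustration of this kind of detection with a UTF-8 fallback, here is a minimal Python sketch. It is not Topicalizer's actual code: the function names are hypothetical, and it assumes the character set is declared either in an HTTP Content-Type header or in an HTML meta tag, which the text above does not specify.

```python
import codecs
import re

# Hypothetical helpers for illustration only; not Topicalizer's implementation.
DEFAULT_CHARSET = "utf-8"

# Matches charset=... in a Content-Type header value.
_HEADER_CHARSET = re.compile(r'charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)', re.IGNORECASE)
# Matches both <meta charset="..."> and the older http-equiv form.
_META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def _normalize(name):
    """Return Python's canonical codec name, or None if the codec is unknown."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


def detect_charset(body, content_type=None):
    """Return the declared charset of a document, or UTF-8 if none is found."""
    if content_type:
        match = _HEADER_CHARSET.search(content_type)
        if match:
            found = _normalize(match.group(1))
            if found:
                return found
    # Assumption: a declaration, if present, sits near the start of the document.
    match = _META_CHARSET.search(body[:4096])
    if match:
        found = _normalize(match.group(1).decode("ascii"))
        if found:
            return found
    # No declaration, an unknown one, or plain text: fall back to UTF-8.
    return DEFAULT_CHARSET


def decode_document(body, content_type=None):
    """Decode raw bytes and report which charset was used."""
    charset = detect_charset(body, content_type)
    try:
        return body.decode(charset, errors="replace"), charset
    except LookupError:
        # The name resolved to a non-text codec; treat it as undeclared.
        return body.decode(DEFAULT_CHARSET, errors="replace"), DEFAULT_CHARSET
```

Calling `decode_document(raw_bytes, "text/html; charset=windows-1251")` would decode the bytes as Windows-1251, while plain text submitted without any declaration is decoded as UTF-8.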
The improved character set handling also benefits automatic language recognition and the overall analysis, especially for languages that use many special characters, such as the Nordic languages or Russian.
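To see why the character set matters for language recognition, consider the small Python sketch below. It is only an illustration under simple assumptions, not Topicalizer's recognizer: the hypothetical `script_counts` function counts letters by the script named in their Unicode character name, and the sample text is made up.

```python
import unicodedata


def script_counts(text):
    """Count letters per script, using the first word of each Unicode name."""
    counts = {}
    for ch in text:
        if ch.isalpha():
            script = unicodedata.name(ch, "UNKNOWN").split(" ")[0]
            counts[script] = counts.get(script, 0) + 1
    return counts


# Russian text as it might arrive in a Windows-1251 encoded document.
raw = "Привет, мир".encode("cp1251")

# Decoded with the correct character set, every letter is Cyrillic.
right = script_counts(raw.decode("cp1251"))

# Decoded as ISO-8859-1, the same bytes turn into accented Latin letters,
# so a recognizer would no longer see any Cyrillic at all.
wrong = script_counts(raw.decode("iso-8859-1"))

print(right)
print(wrong)
```

With the wrong character set the text still decodes without an error, which is why the problem is easy to miss: the letters are simply the wrong ones.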