|
Q:
|
What does the name Topicalizer mean?
|
|
A:
|
In linguistics, a topicalizer is a constituent that marks another constituent as the current topic. English, for example, has topicalizers such as 'regarding', 'given' and 'as for'. Among other things, this software tries to find the topic (or rather the topic framework) of a text, so it can be seen as a kind of 'topic marker' as well.
|
|
Q:
|
Why did you create Topicalizer?
|
|
A:
|
This software was created primarily as a feasibility study in computational linguistics. Besides, in my opinion a tool like this was still missing on the web.
|
|
Q:
|
Which programming language do you use for Topicalizer?
|
|
A:
|
Topicalizer was (and is being) built with Python and the TurboGears web framework.
|
|
Q:
|
Besides being interesting, what could be the purpose of this tool?
|
|
A:
|
The main purpose of this tool is to give webmasters, bloggers and any other kind of web author a way to optimize their websites with regard to content structure, readability, topic coherence and, last but not least, search engine listings (for those into buzzword bingo: search engine optimization, SEO), since the latter is all about well-structured content and the topic structure of a text.
Moreover, this software can be used to automatically acquire useful semantic information about a document. To give you an idea: the keyword method of the Topicalizer API could be used to tag a blog entry automatically.
|
|
Q:
|
When using Topicalizer with a specific URL as argument I receive a strange error message. Is there a way to avoid this?
|
|
A:
|
The reason might be that the HTML parser Topicalizer uses only understands well-formed HTML. If a document contains invalid HTML, chances are that you will receive an Expat parser error. One way to address this problem (apart from correcting the HTML code in question) is to use Topicalizer's plain text analysis option.
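If you want to find out in advance whether a page is likely to trigger this error, you can run its markup through an Expat parser yourself. The following is only a minimal sketch using Python's standard `xml.parsers.expat` module; the function name `is_well_formed` is made up for this example, and Topicalizer's own parsing code may well behave differently.

```python
import xml.parsers.expat


def is_well_formed(markup: str) -> bool:
    """Return True if a strict Expat parser accepts the markup without error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(markup, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


# Every element is closed properly.
good = "<html><body><p>Hello</p></body></html>"
# The <br> element is never closed, which a strict parser rejects.
bad = "<html><body><p>Hello<br></body></html>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

If the check fails for your page, the plain text option is the safer choice.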
|
|
Q:
|
When I analyze a specific URL and then analyze the plain text found at that URL, I receive different results. How come?
|
|
A:
|
The explanation is much the same as for the previous problem: the HTML parser Topicalizer uses only understands well-formed HTML. If a document contains invalid HTML, chances are that you will either receive an Expat parser error or that some of the HTML code cannot be filtered correctly (which its creator sometimes does on purpose, for instance in ad server code), which in turn can lead to incorrect results. One way to address this problem (apart from correcting the HTML code in question) is to use Topicalizer's plain text analysis option.
Furthermore, you will get the best results with the plain text option anyway, because this way you can be sure that there is no additional (and undesired) text such as HTML headers, titles and copyright information.
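If you only have the HTML at hand, you can produce the plain text yourself before pasting it into Topicalizer. The sketch below uses the lenient `html.parser` module from Python's standard library; the class `TextExtractor` and the function `html_to_text` are names invented for this example and are not part of Topicalizer.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring titles, scripts and style sheets."""

    SKIPPED = {"script", "style", "title"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than zero while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(markup: str) -> str:
    extractor = TextExtractor()
    extractor.feed(markup)
    extractor.close()
    return " ".join(extractor.chunks)


# Note the unclosed <br> and <p>: this parser tolerates invalid HTML.
page = "<html><head><title>My page</title></head><body><p>Hello<br>world</body></html>"
print(html_to_text(page))
```

Copyright notices and navigation text are ordinary page content, so you would still have to remove those by hand.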
|
|
Q:
|
Why is the document language important for the analysis?
|
|
A:
|
Topicalizer uses certain language-specific parameters like stop words and syllable structure, so choosing the appropriate language will significantly improve results.
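To see why the language matters, consider a simple keyword count with a stop word filter. This is a rough sketch, not Topicalizer's actual algorithm: the tiny `STOP_WORDS` lists and the function `top_keywords` are assumptions made up for illustration.

```python
import re
from collections import Counter

# Deliberately tiny, illustrative stop word lists (real lists are much longer).
STOP_WORDS = {
    "en": {"the", "a", "of", "and", "is", "in", "to"},
    "de": {"der", "die", "das", "und", "ist", "in", "zu"},
}


def top_keywords(text: str, language: str, n: int = 3) -> list:
    """Return the n most frequent words that are not stop words."""
    words = re.findall(r"\w+", text.lower())
    stop = STOP_WORDS[language]
    counts = Counter(word for word in words if word not in stop)
    return [word for word, _ in counts.most_common(n)]


text = "The parser reads the text and the parser counts the words in the text."

# With the matching language, function words are filtered out.
print(top_keywords(text, "en", 2))
# With the wrong language, 'the' is not recognized as a stop word.
print(top_keywords(text, "de", 1))
```

With the wrong stop word list, a function word such as 'the' ends up as the top keyword, which is exactly the kind of result the language setting is meant to prevent.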
|
|
Q:
|
Why does Topicalizer still have the option for manually selecting a language, if there is a working automatic language recognition?
|
|
A:
|
The automatic language recognition works well for texts that are long enough and written in a single language. However, you might run into trouble if the text is too short or if it contains several languages in roughly equal shares. That is why you can still select a language manually if you do not trust Topicalizer's guess.
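Both failure cases are easy to reproduce with a toy language guesser. The sketch below counts stop word hits per language and gives up on a tie; this is an assumed technique chosen for illustration, not necessarily the method Topicalizer uses, and the `STOP_WORDS` lists and the function `guess_language` are hypothetical.

```python
import re

# Deliberately tiny, illustrative stop word lists (real lists are much longer).
STOP_WORDS = {
    "en": {"the", "a", "of", "and", "is", "to"},
    "de": {"der", "die", "das", "und", "ist", "zu"},
}


def guess_language(text: str):
    """Return the language with the most stop word hits, or None if unclear."""
    words = re.findall(r"\w+", text.lower())
    scores = {
        language: sum(word in stop for word in words)
        for language, stop in STOP_WORDS.items()
    }
    ranked = sorted(scores.values(), reverse=True)
    # No hits at all, or a tie between the two best languages: no reliable guess.
    if ranked[0] == 0 or ranked[0] == ranked[1]:
        return None
    return max(scores, key=scores.get)


print(guess_language("The cat is on the roof and the dog is asleep"))
print(guess_language("Der Hund ist alt und die Katze ist jung"))
print(guess_language("Topicalizer"))                # too short to decide
print(guess_language("the dog and der Hund und"))   # two languages in equal shares
```

In the last two cases the function returns `None`, which corresponds to the situations where selecting the language manually is the better option.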
|
|
Q:
|
How did you create this nifty fading effect on your logo?
|
|
A:
|
For this effect I used the 'Fade Anything Technique' developed by Adam Michela of www.axentric.com. Check out this page for further details.
|