Japanese-to-English Translator

About

A few years ago I started learning Japanese, and along the way I've created some tools to help me do so. One of those is a Japanese-to-English translator.

One of the first things I noticed about Japanese is that, unlike languages written in the Latin alphabet, it's not trivial to determine where one word ends and the next begins (words aren't separated by spaces). I found this makes looking up whole sentences in a dictionary incredibly frustrating, so I wrote a script that looks up combinations of adjacent characters in an online dictionary until a translation for every character sequence in a given sentence has been found. This worked well enough for my needs, so I kept adding more features, like audio recordings and a visual aid for drawing the characters.

Technical Details

Server-Side
PHP receives the user input as a JSON object, transcribed into Latin, Hiragana, and Katakana. This is done because the search results of online dictionaries may vary depending on which of those three scripts is searched for.
The user input is looked up in the excellent dictionary jisho.org (there's no special API for this; I just parse the HTML via the usual DOM magic). If no result is found, the last character of the user input is dropped (to be searched separately later), and the process is repeated.
When all characters have been looked up, every word found is looked up in the dictionary WWWJDIC to get a link to an audio file.
All requests made by the server are cached in a MySQL database, both to improve response time and to reduce the strain I put on the other dictionaries.
The translations and audio links are then sent to the client as an XML document.
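The lookup loop and the cache described above can be sketched roughly as follows. This is a Python illustration under simplifying assumptions, not the PHP that runs on the server: `segment`, `make_cached_lookup`, and `TOY_DICTIONARY` are hypothetical names, the in-memory dict stands in for the MySQL cache, and the toy dictionary stands in for the jisho.org lookup.

```python
from typing import Callable, Dict, List, Optional, Tuple

Lookup = Callable[[str], Optional[str]]

# Hypothetical stand-in for the online dictionary: query -> translation.
TOY_DICTIONARY: Dict[str, str] = {
    "日本語": "Japanese (language)",
    "日本": "Japan",
    "を": "(object particle)",
    "勉強": "study",
    "する": "to do",
}


def make_cached_lookup(lookup: Lookup) -> Lookup:
    """Wrap a lookup so each distinct query reaches the dictionary only once."""
    cache: Dict[str, Optional[str]] = {}

    def cached(query: str) -> Optional[str]:
        if query not in cache:
            cache[query] = lookup(query)  # misses (None) are cached as well
        return cache[query]

    return cached


def segment(text: str, lookup: Lookup) -> List[Tuple[str, Optional[str]]]:
    """Split text into words by repeatedly dropping the last character
    of the candidate until the lookup returns a result."""
    words: List[Tuple[str, Optional[str]]] = []
    start = 0
    while start < len(text):
        end = len(text)
        # Shrink the candidate from the right until it is found
        # or only a single character is left.
        while end > start + 1 and lookup(text[start:end]) is None:
            end -= 1
        candidate = text[start:end]
        # A single character with no entry is kept with a None translation.
        words.append((candidate, lookup(candidate)))
        start = end  # the dropped characters are searched next
    return words


lookup = make_cached_lookup(TOY_DICTIONARY.get)
for word, meaning in segment("日本語を勉強する", lookup):
    print(word, "=", meaning)
```

With the toy dictionary, the sentence is split into 日本語, を, 勉強, and する. Because the longest match always wins, 日本語 is preferred over the shorter 日本; a real dictionary can make this greedy choice pick the wrong word boundaries.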
Client-Side
The page itself is XHTML, served by my CMS.
User input into the translator is transcribed in every direction between Latin, Hiragana, and Katakana using an XML transcription table. This happens in real time, so the translator can function as a substitute Japanese keyboard, too.
When user inaction is detected, the input is sent to the server to be processed as described above.
The server response is converted to HTML via XSLT.
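To give an idea of what the transcription step involves, here is a minimal Python sketch. It is not the actual client code: the tiny `ROMAJI_TO_HIRAGANA` dict is a hypothetical stand-in for the XML transcription table, and the function names are made up for illustration. It relies on one fact about Unicode: the Hiragana and Katakana blocks are laid out in parallel, 0x60 code points apart.

```python
# Hypothetical, deliberately incomplete transcription table.
ROMAJI_TO_HIRAGANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "sa": "さ", "shi": "し", "su": "す", "se": "せ", "so": "そ",
    "ta": "た", "chi": "ち", "tsu": "つ", "te": "て", "to": "と",
    "na": "な", "ni": "に", "nu": "ぬ", "ne": "ね", "no": "の",
    "n": "ん",
}
MAX_KEY_LENGTH = max(len(key) for key in ROMAJI_TO_HIRAGANA)
KANA_OFFSET = 0x60  # distance between the Hiragana and Katakana blocks


def romaji_to_hiragana(text: str) -> str:
    """Transcribe Latin input, always trying the longest table entry first."""
    out = []
    i = 0
    while i < len(text):
        for size in range(MAX_KEY_LENGTH, 0, -1):
            chunk = text[i:i + size].lower()
            if chunk in ROMAJI_TO_HIRAGANA:
                out.append(ROMAJI_TO_HIRAGANA[chunk])
                i += len(chunk)
                break
        else:
            # No table entry starts here: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


def hiragana_to_katakana(text: str) -> str:
    """Shift every Hiragana character (U+3041..U+3096) into the Katakana block."""
    return "".join(
        chr(ord(c) + KANA_OFFSET) if "\u3041" <= c <= "\u3096" else c
        for c in text
    )


def katakana_to_hiragana(text: str) -> str:
    """Shift every Katakana character (U+30A1..U+30F6) into the Hiragana block."""
    return "".join(
        chr(ord(c) - KANA_OFFSET) if "\u30a1" <= c <= "\u30f6" else c
        for c in text
    )


hiragana = romaji_to_hiragana("katakana")
print(hiragana, hiragana_to_katakana(hiragana))
```

The sketch ignores everything a full table has to handle, such as doubled consonants (small っ), long vowels, and the ambiguity of "n" before a vowel.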

Screenshots

Japanese-English Translator
You search for a sequence of characters; the translator transcribes them, breaks them up into words, translates them into English, reads them out to you, and tells you how to draw them.