Ice Mocha Credits Slash Evolution

Work on Ice Mocha began with the creation of the supporting files necessary to build a Japanese text imaging proxy server, now fully integrated into every function of the application. Proxy server development slowly crawled along in 2002, and then Jim Rose moved on toward the full-time restructuring of EDICT's format in the fall of 2003. (EDICT is a huge electronic English-Japanese dictionary project which was started in 1991 by Professor Jim Breen at Monash University in Australia.) It was decided early on to build a file which would contain the position of the yomigana for each of EDICT's 106,000+ words. Euphonic variations in kanji readings were then theorized from KANJIDIC (a huge collection of information on over 6,353 kanji also assembled by Professor Breen), and EDICT's errors were exhaustively rooted out and repaired using tools built to find words with mismatched readings. Most of these errors were passed on to Professor Breen in Australia - to give future generations of tool builders a cleaner EDICT to work with. Two attempts were made to derive a file containing EDICT's yomigana with some help from KANJIDIC. The first one was an extensive tour-de-force of predictive and retroactive logic. It failed miserably, only correctly parsing the reading of about 85% of EDICT. Then Jim (Rose), in a flash of caffeine-induced genius, developed a kind of "extending string theory" to derive the yomigana from the reading, which, when combined with a euphony hunting algorithm, correctly parsed about 99.5% of the mammoth file. An unconfirmed rumor has it that Jim (Rose) received this deep revelation while drinking an Ice Mocha from Brügger's Bagels.
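To make "the position of the yomigana" concrete, here is a minimal sketch in Python (not the Perl that Ice Mocha is written in). It is not Jim's "extending string theory", which is not described in detail here. It is a much simpler, commonly used stand-in that treats the kana already present in a word as anchors and splits the reading around them. The function names are made up for illustration, it does not consult KANJIDIC, and it assigns a reading to each run of kanji rather than to each individual kanji.

```python
import re

def is_kanji(ch):
    # Rough test: the main CJK ideograph block plus the repetition mark.
    return "\u4e00" <= ch <= "\u9fff" or ch == "々"

def align_yomigana(word, reading):
    """Split `reading` across the kanji runs of `word`.

    Kana inside the word (okurigana) must appear literally in the reading,
    so they serve as anchors. Returns a list of (kanji_run, reading_part)
    pairs, or None if the word and reading cannot be reconciled.
    """
    pattern = ""
    runs = []
    i = 0
    while i < len(word):
        if is_kanji(word[i]):
            j = i
            while j < len(word) and is_kanji(word[j]):
                j += 1
            runs.append(word[i:j])
            pattern += "(.+?)"          # some non-empty slice of the reading
            i = j
        else:
            pattern += re.escape(word[i])  # kana must match exactly
            i += 1
    match = re.fullmatch(pattern, reading)
    if match is None:
        return None                     # a mismatch worth reporting as an error
    return list(zip(runs, match.groups()))

pairs = align_yomigana("食べ物", "たべもの")
```

A word with no kana anchors, or one where the kana could match in more than one place, is exactly where a simple approach like this runs out of steam, which is presumably why euphony handling and KANJIDIC readings were needed to get from 85% to 99.5%.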

A set of tools was then made to integrate error discovery with on-the-fly manual correction. This still didn't make creating a file with EDICT's yomigana easy, as the computation time for some words was sometimes as long as a minute. It still took several weeks of computer time to derive the results once the stronger algorithm was deduced. Then extensive work was done with a team of 50 beta testers across the Internet who volunteered to help verify Ice's handling of dynamic objects - necessary to make the yomigana toggle work on most browser platforms. At that point, it just came down to figuring out what features Ice should have, and thinking them out within the limitations of the Perl and JavaScript languages. The application was released to the public at the end of 2003.

Post-Release Updates:

July 24, 2007 The 3rd public release of Stroke Order Diagrams (SODs) and Stroke Order Diagram Animations (SODAs) hits the Internet today. 1,500 kanji depicted. The archive is now the 2nd largest on the Internet, and the largest that you can download to your own website or home computer. The next public release will be the largest set on the Internet.

May 3, 2007 This may not actually be worthy of mention, but there is now a tiny 10 X 5 pixel image which says "SOD" if the Stroke Order Diagram for a given kanji in the word being examined exists. Most of Jim Rose's current efforts at Ice Mocha development are devoted to an overhaul of the radical decomposition system, and editing the backlog of SOD submissions. 1,445 SODs and SODAs are edited and available for use.

October 14, 2006 Jim successfully installs modern GD.pm on a Mac Mini running OS X 10.4.8 and the SODER project resumes. Ice Mocha has well over 1,100 SODs and SODAs at this point.

June 1, 2006 Jim & Arlene Rose are nearly killed in a freak accident when their Jeep flips back over front 7.5 times at 80 mph. The laptop containing the software which converted SODER project diagrams into animations was ejected from the vehicle. The SODER project, as well as all Ice Mocha development, comes to a complete halt.

March 26, 2006 Fixed a bug first noticed by Kim Desmond where dictionary searches ignored exact matches if the end of the match was also the end of the dictionary entry. Exact searches for words like "cat" gathered lots of cat entries, but ignored "cat" itself. This bug doesn't seem to have existed before the March 17 update, so I must have inadvertently removed the EOL regex condition thinking it wasn't necessary. So much for programming late at night with your eyes half shut.
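For the curious, this kind of bug is easy to reproduce. The sketch below is in Python rather than Ice Mocha's Perl, and the semicolon-separated entry format and both function names are assumptions made up for illustration, not Ice's actual code. The only difference between the two patterns is whether "end of entry" is accepted as a valid way for an exact match to end.

```python
import re

def exact_match_without_eol(word, entry):
    # Buggy pattern: the word must be followed by a separator, so a word
    # sitting at the very end of the entry is never matched.
    pattern = r"(?:^|;)" + re.escape(word) + r";"
    return re.search(pattern, entry) is not None

def exact_match_with_eol(word, entry):
    # Fixed pattern: the word may be followed by a separator OR the end of
    # the entry (the "$" alternative is the EOL condition).
    pattern = r"(?:^|;)" + re.escape(word) + r"(?:;|$)"
    return re.search(pattern, entry) is not None

# Hypothetical entries: glosses separated by semicolons.
entries = ["cat;feline", "feline;cat", "cat", "catfish", "tomcat"]

found_buggy = [e for e in entries if exact_match_without_eol("cat", e)]
found_fixed = [e for e in entries if exact_match_with_eol("cat", e)]
```

With these sample entries the buggy version finds only "cat;feline", while the fixed version also finds "feline;cat" and the bare "cat". Neither version matches "catfish" or "tomcat", which is what an exact search should do.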

March 17, 2006 Do you want the bad news first, or the good news first? The bad news is that I had to reset everyone's account to day 1. This will naturally anger a few and frustrate some, but hopefully most of you will simply roll with the punches like an aikido master and use this as an opportunity to be more picky about which words to add to your list this time around. And why did this happen? Why, Jim, have you inconvenienced us? I didn't want to, it just worked out that I had to. The good news is the reason: I've completely re-engineered a good bit of the file system under the hood. After incorporating Ice with the Tanaka Corpus (TC), it struck me and several other people that Ice Mocha had fundamentally changed. What was in its infancy a glorified flash card system par excellence had suddenly matriculated into the realm of the supernatural. It has actually become an almost magical teaching tool that mesmerizes the user, who without much effort at all finds himself actually learning to read Japanese. Killing time with Ice Mocha is like taking hyper-linked adventures into the Matrix - a universe of interconnectedness. So Jim Breen's monumental task of cleaning up Professor Tanaka's corpus deserved my full attention, and whatever cooperation I could muster from my aging life force.

With error reports flooding the Internet, and SLJ regulars like Paul Blay streaming in fixes by the thousands, the TC and its glossary are evolving at a record pace, and in order to keep up, EDICT itself is making evolutionary jumps. Ice Mocha was originally built on a 2003 version of EDICT. The next upgrade didn't happen until 2006. For Ice Mocha users to assist in making a better Corpus, Ice needed to reflect the corrections in quasi-real time.

But alas, Ice Mocha was quite the high-maintenance spirit. The time consumed to process EDICT and the TC, in keeping with her demanding architecture, was measured in days, and the resulting file system weighed in at nearly 400 Mbyte. To contribute meaningful error reports to Jim in Australia, Ice's underlying file system would have to be rebuilt with freshly corrected files on a nightly basis. Achieving that goal required about a month of programming to streamline and automate the downloading, processing, and uploading of the file system. Extensive use of bifurcated binary sorting algorithms intermingled with Schwartzian Transforms and fuzzy logic had to be applied to normalize processing speeds. Unix installations were made. A twin Macintosh system took shape so that two processors could share duties. Tools like wget, and Perl modules like Time::HiRes and Encode, were pressed into service. Ice had become a "system" which can in theory now upgrade itself in 6 hours with very little human interference. So in the midst of all this massive engineering, I decided I might as well upgrade how Ice accesses the dictionary files too. Instead of searching files by index, I've shifted to using byte offsets. She's become less of a hack. Instead of starting the file pointer at the beginning of a file and slowly moving through it, counting how many records are passed until reaching the word you're looking for, this technique places the file pointer directly at the very byte on the hard drive that your record starts at. That meant that not only did Ice and its dozen or so function libraries have to be rebuilt, but indices of the files had to be created and sorted, and the type of indices used to store words on your study lists had to change as well. The payoff is that byte offset indexing is much faster. Hopefully you will notice!
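The difference between the two access methods can be sketched in a few lines. This is Python rather than Ice's Perl, the function names are invented for illustration, and an in-memory byte buffer stands in for a dictionary file on disk. Nothing here is Ice Mocha's real code or file layout. The file must be handled as bytes, because a multi-byte Japanese character makes "character number" and "byte number" two different things.

```python
import io

def build_offset_index(f):
    """One pass over the file, recording the byte position where each
    record (line) starts. This is the index that has to be built and stored."""
    f.seek(0)
    offsets = []
    position = 0
    for line in iter(f.readline, b""):
        offsets.append(position)
        position += len(line)
    return offsets

def fetch_by_record_number(f, n):
    """Old way: start at the top and count records until the n-th one."""
    f.seek(0)
    for i, line in enumerate(f):
        if i == n:
            return line.rstrip(b"\n").decode("utf-8")
    raise IndexError(n)

def fetch_by_byte_offset(f, offset):
    """New way: jump the file pointer straight to the record's first byte."""
    f.seek(offset)
    return f.readline().rstrip(b"\n").decode("utf-8")

# A tiny EDICT-style sample standing in for the real dictionary file.
sample = "猫 [ねこ] /cat/\n犬 [いぬ] /dog/\n鳥 [とり] /bird/\n".encode("utf-8")
dictionary = io.BytesIO(sample)

offsets = build_offset_index(dictionary)
third = fetch_by_byte_offset(dictionary, offsets[2])
```

Both functions return the same record, but the first one reads everything in front of it, while the second does a single seek. The price is the one mentioned above: a study list now has to store byte offsets instead of record numbers, and those offsets go stale whenever the dictionary file is rebuilt.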

So now that Ice is on the verge of being able to update itself to versions of EDICT and the TC scarcely hours old, I'm striving to incorporate an error slash suggestion feature into Ice Mocha to help clean up the Tanaka Corpus even further (with YOUR help): the missing glosses, the premature parses, etc. Given how enormous the TC is, errors are bound to be plentiful for some time. Hopefully you will see new buttons for suggesting errors in the near future (hopefully before Jim returns from Europe in 6 weeks).

March 16, 2006 While frustrated over the complexity of trying to overhaul Ice's entire infrastructure to byte offsets, I took a break and started adding both Japanese names and English nicknames to the radical decompositions on the kanji pages. Did anyone notice?

March 10, 2006 Added a toggle to turn a list of keyboard commands on and off. Some improvement of the interface.

On February 17, 2006 2:30 AM Ice Mocha began keeping track of which example sentence was last viewed for any and all words on your study list. This feature will help keep you from seeing the same sentences over and over.

On February 15, 2006 Jim Rose added keyboard commands for viewing kanji information with HTML4 compliant browsers. If the kanji is the 1st one listed after the word being studied, simply press '1' on your keyboard. If it is the 2nd kanji, press '2', etc. The key, as always, to truly enjoy Ice Mocha, is to make sure you are NOT using Microsoft Internet Explorer. It gives the worst interpretation of the application, and cannot process key commands. The 'radicals' function, or as it is often called 'kanji lookup-by-multi-radical', has been overhauled to allow you to return to the same 'state' after leaving the radical selection table for any reason. This way you can look up one kanji, get lost exploring links, and simply return to the same batch of similar kanji without replugging in the radical choices. Some other minor details you will not notice involve how the stroke order diagrams and animations are stored on the server.

On February 14, 2006 Jim Rose completes a 1st attempt at integrating Ice Mocha with the Tanaka Corpus. This huge body of sentence pairs was compiled by the late Professor Yasuhito Tanaka at Hyogo University, who had released the corpus into the public domain several years ago. Over the years, each of Tanaka sensei's students was tasked with collecting 300 Japanese-English sentence pairs. Many appear to be derived from textbooks, the Bible, children's stories, and songs. Some are a bit long. Jim Breen later standardized the entries, added some gender-specific clues, removed duplicates, corrected many of the errors (with a little help from the sci.lang.japan gang), and processed the Japanese sentences with the Nara Institute of Science and Technology's ChaSen morphological analysis program. The end result was a pretty handy, although often incomplete, glossary of Japanese words for each sentence. There is still much work to do on the Corpus, and it has many flaws, but it nevertheless constitutes a fundamental alteration to Ice Mocha's capabilities. You can now spend hours learning to read, all the while keeping on track with an emphasis on your vocabulary words. Happy Valentine's Day!

Also on February 14, 2006, Jim Rose restored some sanity to Ice Mocha by implementing some common sense features like the 'OK' button, which allows you to turn off lists of suggested words, JLPT 1-4 words, lists of words in your 'a', 'b', and 'c' lists, words discovered by searching the dictionary, and the radical search table and/or results. On an HTML4 compliant browser you can also toggle the 'OK' feature by depressing the letter 'o'. You can also conjure up the radical table on said HTML4 compliant browser now by depressing the 'b' (think bushu) button on your keyboard. And most importantly, you can resume working on the ABC study list word after selecting other words from one of the above lists by pressing the 'return' button, or by depressing the 'r' key in an HTML4 compliant manner. The absence of these tiny features created a real headache for those who sought temporary adventures in the midst of pounding through their study list regimes.

January 30, 2006 Ice Mocha's dictionary expanded. Ice Mocha's dictionary was originally based on a 2003 version of EDICT. The January 19, 2006 version of EDICT was proofed and error reports sent to Australia. The yomigana parsing engine was tweaked and over 116,000 words were parsed in just over 104 minutes. The new basis files add some 5,000 new priority words and relegate 2,000 formerly priority words to the non-priority category, and the new dictionary adds about 10,000 words to Ice Mocha in all. A brute force algorithm mapped all old dictionary indices stored on users' a, b, and c lists to the new dictionary. The mapping required 2 days of computation time, but future upgrades hope to exploit similarities between EDICT versions to dramatically shorten mapping time, with 20 minutes as the target. A starting point codebase has been created to make updating Ice Mocha to new versions of EDICT less painful. In the future it is hoped such upgrades will only require a few hours of computation with further enhancements and optimizations of the code base. New account creation has been cgi wrapped to move Ice Mocha closer to this goal. This means nothing to you unless you've built a similar application, so don't sweat it. Ice's dictionary now stores its yomigana data in compressed format. JLPT 4 through 1, suggested words, etc., have all been updated to the new indices.
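For anyone who has built a similar application, here is one way such a remapping can be made cheap. This is a sketch in Python, not the brute force code that took 2 days, and not necessarily the approach future upgrades will take. It swaps in a plain hash lookup: index the new dictionary once by headword and reading, then look up each old entry in it. The function names are invented, and the key is an assumption (everything before the first "/" of an EDICT-style line).

```python
def entry_key(line):
    # Headword and reading: the part of an EDICT-style line before the
    # first "/". Glosses are ignored because they change between versions.
    return line.split("/", 1)[0].strip()

def remap_indices(old_entries, new_entries, old_indices):
    """Map record numbers in the old dictionary to record numbers in the
    new one. Entries that no longer exist map to None."""
    new_position = {}
    for i, line in enumerate(new_entries):
        # Keep the first occurrence if a key appears twice.
        new_position.setdefault(entry_key(line), i)
    return {i: new_position.get(entry_key(old_entries[i])) for i in old_indices}

# Tiny made-up samples standing in for two EDICT versions.
old_dictionary = ["猫 [ねこ] /cat/", "犬 [いぬ] /dog/", "鳥 [とり] /bird/"]
new_dictionary = ["犬 [いぬ] /dog/hound/", "魚 [さかな] /fish/", "猫 [ねこ] /cat/"]

mapping = remap_indices(old_dictionary, new_dictionary, [0, 1, 2])
```

Each old index costs one dictionary lookup instead of a scan of the whole new file. What this sketch does not handle is an entry whose headword or reading was itself corrected between versions. Those come back as None and would still need the slower, fuzzier matching.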

As of December 21, 2005, kanji look-up will now display the character's associated radicals and meanings used for lookup-by-multi-radical. These can often provide an etymological insight into the kanji's meaning. Some of the 250 radicals currently used by Ice Mocha are in fact kanji, and often kanji which are not in the JIS 208 standard, but perhaps in the JIS 212 set. Testing has shown that Firefox, Netscape, and Safari are capable of JIS 212 display, and in non-proxy mode, Ice sends some of these radicals as JIS 212 kanji if it detects any of these three browsers.

On December 16, 2005, Jim (Rose) released SODER v.III, an all new point-n-click version of the web-based graphics editing application used by volunteers over the Internet to create Stroke Order Diagrams (SODs) for Ice Mocha. There are 652 SODs at this point, from 1 to 12 strokes, with the goal of creating at least 6,353 in total. So just as the best tool in existence for the project is released, we're more than 10% through the project. Everyone is working hard to try and finish 1,000 diagrams of up to 20 strokes each by early 2006.

You can now have a SODA with your Ice Mocha. At some God-awful hour in the very early morning of September 6, 2005, Jim (Rose) uploaded the first 500 "light" version Stroke Order Diagram Animations (SODAs). With the flick of the return button on his Macintosh Terminal, SODs from the SODER project were whisked down from the web-server, turned into animations, and then promptly and automatically shuffled back up to the server from the Mac. (Actual conversion only took 12 seconds.) Each time new SODs are created by the volunteer corps of the SODER project, the diagrams are downloaded, the animations are created and uploaded. Too tired to appreciate the magnitude of his accomplishment, Jim promptly went to bed. He also found time to completely rewrite the way Ice handles the login registration file and believes that some people have permanently lost their account because of hasty programming decisions made in December 2003 to hurry the release of the multi-user version of the Ice Mocha application - never thinking there would be 3,000 accounts in just a year-and-a-half pushing Jim's stop-gap, band-aid code to its limits. This problem should now be a thing of Ice Mocha's primordial past - if you have lost access to your account for no apparent reason, and the ERROR messages DO NOT say that you have already registered an account under that email, you are urged to reregister a new account. Work now continues on a "heavy" version of Animated Stroke Order Diagrams (ASODs). Whereas "light" SODAs show one stroke per frame, "heavy" ASOD animations will depict the ink of the brush pen being laid in smooth brush motions. If you don't have the bandwidth for the coming ASODs, stick to consuming SODAs. All of these visual media are derived from the same SODER group: SOD => SODA => ASOD.

On August 23, 2005, Jim (Rose) adds the "radicals" feature to Ice, allowing the user to perform kanji lookup-by-multi-radical. This feature is made possible by the incorporation of an improved version of "RADKFILE" (Copyright 2001 Michael Raine, J. W. Breen) into the Ice Mocha brain. According to Professor Breen, Ice is now one of only two websites to support this feature. The first was WWWJDIC. RADKFILE is based on work performed in 1994/1995 by Michael Raine in which he analyzed all the JIS1/2 kanji and identified the constituent radicals and other common elements, with the intention of facilitating the selection of kanji within a dictionary program by identifying multiple elements. The file was revised by Jim Breen in September 1995. Further revisions were carried out in 1998/1999 at the suggestion of Wolfgang Conrath, then a revision was carried out in 2001 using suggestions from Yutaka Ohno based on a similar decomposition made by Kobayashi. Further amendments were made in July 2001 after suggestions from Hendrik. Jim Rose made several corrections to the file, and organized its kanji by stroke count before wrapping a library of Mocha code around it.
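At its heart, lookup-by-multi-radical is a set intersection: each radical maps to the set of kanji that contain it, and choosing several radicals keeps only the kanji common to all of those sets. The sketch below shows the idea in Python rather than Ice's Perl. The function name is invented, and the small table is a hand-made sample for illustration, not an excerpt from RADKFILE.

```python
def lookup_by_radicals(radical_index, chosen):
    """Return the kanji that contain every radical in `chosen`."""
    if not chosen:
        return set()
    candidate_sets = [radical_index.get(radical, set()) for radical in chosen]
    return set.intersection(*candidate_sets)

# Hand-made sample: radical -> kanji containing it.
radical_index = {
    "日": {"明", "時", "早", "朝"},
    "月": {"明", "朝", "期"},
    "十": {"早", "朝"},
}

sun_and_moon = lookup_by_radicals(radical_index, ["日", "月"])
narrowed = lookup_by_radicals(radical_index, ["日", "月", "十"])
```

Picking 日 and 月 leaves 明 and 朝, and adding 十 narrows the batch to 朝 alone. Sorting each batch by stroke count, as was done with the file, is what makes the results easy to scan.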

On July 23, 2005, there are about 3,000 registered users. Jim is thinking out the details of a kanji "look-up by radical" function, and will begin programming shortly.

On November 30, 2004, the very first Stroke Order Diagrams (SODs) created by the SOD Editor-Retrographer (SODER v.I) were uploaded into the Ice Mocha application. The SODs display whenever more information is requested for a given word's kanji. SODER is one of the first Internet tools Jim (Rose) ever developed after doing actual research and experimentation - in fact several years' worth. That either means it's really cool, or he's a really slow programmer. It has many little subsystems which make it tick, and even though it is certainly neither optimized nor pretty, we're all quite proud of it. A list of the 5 top contributing volunteers is now on the KanjiCafe.com homepage. So far, several hundred SODs of 10 strokes or less are available to Ice users, with more added every day. Improved versions of SODER are required to complete the entire 6,000+ kanji used by Ice, and are on the way.

On October 15, 2004, Jim Rose completed the ability to add vocabulary words from the JLPT 4, 3, 2, or 1 (Japanese Language Proficiency Test) that were not already on one of your study lists (a, b, or c) - idea suggested by Alanna. He also cleaned up the controls a little, placed a letter by the word being studied to tell you which study list it is on, and put in a safeguard against changing a word's emphasis if it's not the current word being examined. Alas, he wanted to do more, but grew tired from lack of sleep.

On August 11, 2004, Jim Rose rewrote the way Ice handles the kanji information file so as to end its former practice of speedy, but RAM wasting "file slurping". Part of the reason is that Jim found converting Ice from Perl to mod_perl unfruitful - primarily because he doesn't really understand the whole multi-threaded processes concept in Unix machines and why they just have to screw up all the variable values. So Jim thought he had better conserve some RAM inside of regular ole Perl in preparation for swarms of users making hundreds of page loads on the server. His Perl-only modification will allow for the eventual expansion of the file to cover kanji etymology etc., as well as enhance and preserve the application's performance a little better when there are multiple people using the application over the web at the same time (despite not running Ice more directly as an Apache module as it would in mod_perl). Only the most minor of files are now slurped into memory - and at the moment, the mod_perl conversion has been abandoned. The application's speed does not seem to be diminished at all from this upgrade, and mod_perl conversion can probably be avoided as long as there are fewer than 10,000 users, right? Right? As of this date, there are nearly 1,400 Ice Mocha users... this despite virtual invisibility on the web.
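The trade-off between slurping and not slurping looks roughly like this. The sketch is in Python rather than Perl, the function names are invented, and the "kanji, space, information" line layout is an assumption made for the example rather than the real format of Ice's kanji information file.

```python
import io

def find_by_slurping(f, kanji):
    """Slurp: read the whole file into memory, then search the copy.
    Fast and simple, but every request holds the entire file in RAM."""
    lines = f.read().splitlines()
    for line in lines:
        if line.startswith(kanji + " "):
            return line
    return None

def find_by_streaming(f, kanji):
    """Stream: read one line at a time and stop at the first hit.
    Only the current line is ever held in memory."""
    for line in f:
        if line.startswith(kanji + " "):
            return line.rstrip("\n")
    return None

# A made-up three-line stand-in for the kanji information file.
sample = "猫 cat\n犬 dog\n鳥 bird\n"

slurped = find_by_slurping(io.StringIO(sample), "犬")
streamed = find_by_streaming(io.StringIO(sample), "犬")
```

Both return the same line. With one user the difference is invisible, but when many page loads run at once as separate processes, each slurping its own copy of a large file, the memory adds up quickly, which is the RAM being conserved here.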

In the first week of May, 2004 Jim Rose included the ability to view and edit the three study lists, and to shuffle the order of words on the lists.

Jim Breen compiled both EDICT and KANJIDIC over the course of more than a decade. They are now the basis of countless Japanese language learning tools all over the world - none of which could have existed without Professor Breen's devotion to the project. If it wasn't for EDICT and KANJIDIC, Ice Mocha would basically suck. Without the venerable professor from Oz, it probably would have been a tool requiring you to manually add your own words, if it existed at all. Yuck!

It also goes without saying, though it should be said much more often, and on many more web sites the world over, that Ice Mocha owes its existence to the development of HTML by Sir Timothy Berners-Lee at CERN. Jim Rose met Tim at an NEC award ceremony at MIT's World Wide Web Consortium. It struck Jim that the one man who has done more to explode the flow of knowledge than anyone else in human history was also one of the nicest guys around. Here's to you Tim!

Bill Gates (Scottish) finally found the energy to copy yet another feature of the Macintosh (several years later), and now gives people fairly simple access to Japanese fonts on their PC. Hopefully he'll make entering Japanese easy someday too (Bill if you're reading - buy a Mac and study it). At one time, not too long ago, the proxy server feature on Ice was more important than now. Much of this technology created by Jim Rose requires the Perl module GD.pm written by Lincoln Stein at the Cold Spring Harbor Laboratory. GD.pm is a Perl 5.x interface to Thomas Boutell's gd library that allows you to generate PNG and JPEG images on the fly.

Thanks are also in order to Larry Wall for inventing Perl, and Netscape for creating JavaScript. Without the independent work of Netscape, Larry, Tim and Professor Breen, no Ice Mocha. Scary, isn't it?


© 2003-2006 The Kanji Cafe