May 14, 2008

In the recently released PicLens build, we’ve added support for displaying non-English, non-Western characters in the image titles on the PicLens 3D wall.  You can see the results on mediaRSS photo web sites in Chinese, Korean, Japanese, and other languages.  You can also try it out with carefully chosen queries to major image search engines, such as searching Flickr for a term written in one of those languages.

It’s funny how difficult it can be to find something in a different language.  If you search Flickr for “Chinese”, you’ll get lots of photos of Chinese people or landscapes, but none of those search results will actually contain Chinese language characters. To find images that are tagged in a particular language, it usually helps to look for terms in that language.

This makes sense for the general use case:  If you enter a search query in a particular language, there’s an implied contract that you want the answer in that language as well.

But what if you really do want results in a different language from your query? To test PicLens’ new multilingual text rendering, I needed to find web sites with interesting photo content *and* non-English, non-Latin text labels.  After much trial and error, I stumbled onto an easy trick to get search engines to return what I was looking for:  find a web site with text in the language I’m looking for, copy some of that text, paste it into the search engine’s search box, and see what comes back.  This technique is a little hit or miss with Google Image Search, but quite effective with Flickr search.

So where do you find a web site with characters in a particular language? Simple:  use the broadest multilingual site I’ve ever encountered, Wikipedia!  Just scan down the left nav column on the Wikipedia home page and pick out the language you’re looking for.  Highlight the text (the name of the language in that language), copy it to the clipboard, then paste that text into the Flickr search box.  (Make sure you’re copying the text, not the URL.)  Presto! If you picked Korean, for example, you get a nearly endless supply of photos by Koreans with titles in Korean text!
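
For the curious, here is roughly what happens to that pasted text once it ends up in a search URL. This is only a sketch in Python: the Flickr URL pattern and the `text` parameter name are assumptions for illustration, and the helper names are made up. The point is that the non-Latin characters travel as percent-encoded UTF-8 bytes, which is why it matters that you copy the text itself rather than a link.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# "Korean" written in Korean, as it appears in Wikipedia's language list.
# Written with escapes so the example survives any source-file encoding.
KOREAN = "\ud55c\uad6d\uc5b4"

# Assumed search endpoint and parameter name (illustrative only).
SEARCH_BASE = "https://www.flickr.com/search/"


def build_search_url(query: str) -> str:
    """Return a search URL with the query percent-encoded as UTF-8."""
    return SEARCH_BASE + "?" + urlencode({"text": query}, encoding="utf-8")


def query_from_url(url: str) -> str:
    """Recover the original query text from a URL built by build_search_url."""
    return parse_qs(urlsplit(url).query, encoding="utf-8")["text"][0]


url = build_search_url(KOREAN)
print(url)                  # the query part is plain ASCII percent-escapes
print(query_from_url(url))  # decodes back to the original Korean text
```

Each Korean syllable here becomes three percent-escaped bytes in the URL, and decoding the URL gives back exactly the text that was pasted in.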

PicLens currently only supports UTF-8 encoded mediaRSS feeds.  We’ll be adding support for additional character encodings in the near future.
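
If you publish a feed and want a quick sanity check, one option is to see whether its raw bytes decode cleanly as UTF-8. The Python sketch below does just that; the function name and the sample data are invented for illustration and are not part of PicLens. A clean decode is a necessary condition rather than proof, since some byte sequences are valid in more than one encoding.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8, False otherwise."""
    try:
        data.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical feed fragment with a Korean title, encoded two different ways.
title = "\ud55c\uad6d\uc5b4"
utf8_feed = ("<title>" + title + "</title>").encode("utf-8")
euc_kr_feed = ("<title>" + title + "</title>").encode("euc_kr")

print(is_valid_utf8(utf8_feed))    # True
print(is_valid_utf8(euc_kr_feed))  # False: EUC-KR bytes are not valid UTF-8 here
```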

PicLens does not yet support entering CJK text in the PicLens search box. We’re studying the system services on Windows and Mac to see how we might support Input Method Editors (IMEs) in our full-screen graphics environment without having to reinvent the world.

PicLens relies on TrueType fonts already installed on your system.  Whether we can display a particular character depends primarily on whether that character is supported by the TrueType fonts we’ve found on your system. 

On Windows, our primary font is Lucida Sans Unicode.  If PicLens needs to display a character that isn’t supported by Lucida Sans Unicode, we look for the Arial Unicode font on your system and try to load the character from that font.  Arial Unicode is not installed by default with Windows XP or Vista, but it is installed if you select any of the Asian language packs for XP or Vista.  Arial Unicode may also be installed by Microsoft Office or other applications.  If we run out of places to look for character glyphs, we’ll fall back to drawing a lovely rectangle on the screen.

On the Mac, our primary font is Lucida Grande. For characters not supported by Lucida Grande, we look for Arial Unicode or Apple Gothic fonts on your system.

We selected these secondary fonts based on their breadth of glyph support. Arial Unicode in particular has very broad support across Chinese, Japanese, Korean, and several other languages. If there are languages that are not well represented by these fonts, we can certainly add another font to the substitution list to handle characters not supported by the current set of fonts.
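
The substitution order described above boils down to a first-match search through a list of fonts. Here is a minimal Python sketch of that idea, not our actual code: the coverage table is hypothetical (a real renderer would read glyph coverage from each installed font), and only the font names and their order come from the description above.

```python
from typing import Dict, List, Optional, Set

# Hypothetical coverage table: font name -> code points it has glyphs for.
# A font that is not installed simply has no entry.
Coverage = Dict[str, Set[int]]

# Font order as described in this post.
WINDOWS_CHAIN = ["Lucida Sans Unicode", "Arial Unicode"]
MAC_CHAIN = ["Lucida Grande", "Arial Unicode", "Apple Gothic"]

MISSING_GLYPH = "<rectangle>"  # stand-in for the fallback box


def pick_font(ch: str, chain: List[str], coverage: Coverage) -> Optional[str]:
    """Return the first font in the chain that covers ch, or None."""
    code_point = ord(ch)
    for name in chain:
        if code_point in coverage.get(name, set()):
            return name
    return None


def plan_text(text: str, chain: List[str], coverage: Coverage) -> List[str]:
    """For each character, name the font to draw it with (or the fallback box)."""
    return [pick_font(ch, chain, coverage) or MISSING_GLYPH for ch in text]


# Toy data: the primary font covers ASCII, the secondary covers one Hangul syllable.
coverage = {
    "Lucida Sans Unicode": set(range(0x20, 0x7F)),
    "Arial Unicode": {0xD55C},
}
print(plan_text("A\ud55c\u0e01", WINDOWS_CHAIN, coverage))
```

Adding support for another language then amounts to appending one more font name to the relevant chain.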

Multilingual text in PicLens is definitely a beta feature.  We will continue to fine-tune it as we learn of issues. If you see something that’s not displayed correctly in PicLens that is displayed correctly by the browser, please drop us a note at the [email protected] address so we can check it out!