Sunday, January 29, 2006

Google in China

I am a big Google fan, and I have dismissed a lot of the criticism that they have received recently, but I have to admit I am a little disappointed with the news that their new China portal will filter its search results to the CCP's liking. Some have canceled their AdWords accounts in protest. Since Google works just fine in Chinese as it is, I have been wondering what the point of the whole thing was. Most savvy Internet users in China who would want to find "illicit" information have ways around the Great Firewall, and the majority, sadly, don't care about such stuff anyway. Look at how widely used the censored MSN Spaces is in China.
I just found this post on Google's official blog that provides some answers:
Google users in China today struggle with a service that, to be blunt, isn't very good. [...] appears to be down around 10% of the time. Even when users can reach it, the website is slow, and sometimes produces results that when clicked on, stall out the user's browser. Our Google News service is never available; Google Images is accessible only half the time. At Google we work hard to create a great experience for our users, and the level of service we've been able to provide in China is not something we're proud of.
Fair enough, but I don't think that people using Google via a proxy would have such problems anyway.
Launching a Google domain that restricts information in any way isn't a step we took lightly. For several years, we've debated whether entering the Chinese market at this point in history could be consistent with our mission and values. Our executives have spent a lot of time in recent months talking with many people, ranging from those who applaud the Chinese government for its embrace of a market economy and its lifting of 400 million people out of poverty to those who disagree with many of the Chinese government's policies, but who wish the best for China and its people.
I love the self-serving first stance mentioned there--applauding the Chinese government "for its embrace of a market economy and its lifting of 400 million people out of poverty." That's a bit like applauding someone for beating his wife less severely than he used to (and taking him at his word as to the numbers).
No, we're not going to offer some Google products, such as Gmail or Blogger, on [...] until we're comfortable that we can do so in a manner that respects our users' interests in the privacy of their personal communications.
So they won't be exposing dissidents like some have recently.

And yes, Chinese regulations will require us to remove some sensitive information from our search results. When we do so, we'll disclose this to users, just as we already do in those rare instances where we alter results in order to comply with local laws in France, Germany and the U.S.
Google does deserve a bit of credit for this. Users see a little notice in Chinese that says the results have been filtered. Maybe Chinese users will start to wonder why. What would be great is if it said how many results have been filtered out--"Your government's Ministry of Truth is keeping 1,203,089 sites full of information about Taiwan from you. Now go enjoy ''"
We're in this for the long haul. In the years to come, we'll be making significant and growing investments in China. Our launch of [...], though filtered, is a necessary first step toward achieving a productive presence in a rapidly changing country that will be one of the world's most important and dynamic for decades to come.
They say it's a necessary first step. Let's hope they don't mean it's their first compromise of many to come.

Thursday, January 26, 2006

Words which mean their own opposites, Part 1

According to A.C. Muller's extremely useful Dictionary of Chinese, Japanese, Korean, and Vietnamese, the character 汨 can mean:

  • A river in Hunan.
  • To manage, control.
  • To be in disorder.
  • The flowing through, or passing through (of water).
  • To float.
  • Floating and sinking.

I love Chinese.

    Monday, January 23, 2006

    Mandarin... or Math?

    The Chinese-learning craze continues, if we are to believe the flurry of news articles I'm getting every day. There's always something about some school somewhere in the States (or Thailand, or the UK) starting to offer Mandarin classes. There was a NY Times article a couple of weeks ago about a Chinese government project to open a chain of Chinese-language schools called "Confucius Institutes" around the world (the original is behind their Select Archive wall, but here's a copy from the SF Chronicle).

    One poster to a C-E translation-oriented mailing list recently said Chinese is just too hard for it to be a worthwhile venture for most students. He was attacked as having bought into the myth that "foreigners just can't learn Chinese," but I think he had a point. Well, it's not so difficult, but it's time-consuming. According to the Foreign Service Institute, it takes a native English speaker more than three times as many instruction hours to become fluent in Chinese, Japanese, Korean, or Arabic as in German, French, or Spanish. (Not that becoming fluent is the only goal in learning a language--if it were, Japan's juku industry would be bankrupt!)

    In this opinion piece, Bloomberg columnist Andy Mukherjee reacts to reports that "the U.S. Senate Foreign Relations Committee is considering a proposal to allocate $1.3 billion to public schools":

    In 2004, Alan Greenspan talked about math education's being a threat to U.S. competitiveness in a Senate Banking Committee hearing. The Federal Reserve chairman's concerns were validated in a Bloomberg News article last week about the Chartered Financial Analyst exams.

    Chinese students, the article said, had the highest pass rate in the world in last month's CFA Level I test, followed by Germany and India. The U.S. was fourth.

    Kindergarten students in Portland, Oregon, are learning that a triangle is "San-Jiao" in Mandarin, according to the Associated Press. They might learn something more useful by playing with an abacus.

    See the original AP story he's reacting to here.

    Saturday, January 21, 2006

    Wait Until He Reads Their Newspapers....

    Errors? Xinhua?

    A HOTEL manager has filed a suit against a local bookstore for selling him a copy of one of the country's most definitive dictionaries, which he claims is riddled with errors.

    Chen Dingxiang said he has found more than 3,000 mistakes in the latest version of the Xinhua Dictionary published by the Commercial Press.
    From Shanghai Daily.

    Saturday, January 14, 2006

    Congressional Kabuki

    In coverage of the Alito confirmation hearings, references to "Kabuki," "Kabuki dance," or "Kabuki theater" keep coming up. Even participants in the hearings like Senator Joe Biden made them.

    MSNBC commentator Flavia Monteiro Colgan writes: "Most of the public that could have been interested in weighing these issues had tuned out because of the air of inevitability that Democrats had fostered--or they were turned off by the Kabuki theater of the previous days." Commentator Carol Platt Liebau writes, "we’re treated not to a hearing, where issues and concerns will be thoroughly but impartially aired, but instead to a stylized kabuki ritual, where Judge Alito’s adversaries will attempt to draw blood...." ("Draw blood"? That's some ritual!)

    The Jurist has an article entitled "Of Kabuki Dances and Subtle Minuets."

    The cliched phrase is usually used to refer to highly regulated yet empty movements, though Salon bungles that somewhat. They've got: "The moribund hearings have been as predictable as a Kabuki drama." Predictable? Can you really imagine the Salon writer in a Kabuki theater slapping his forehead and saying, "Not this again!"?

    While in Japan, I never caught a Kabuki performance, but I did see a Noh play. I thought Kabuki was the wilder offspring of the highly stylized and ritualistic Noh. Maybe Noh is what these commentators really mean.

    Anyway, "Kabuki" or "political Kabuki" seems to be a meme that's gathering steam, as Language Hat predicted a while back. Here is an old blog entry on the topic from Semantic Composition which traces its usage.

    Friday, January 13, 2006

    Translator's Blues

    An article ran the other day entitled "Translator's Blues," in which the author frets that someday his translating job will be taken over by a computer. He starts by citing the scepticism most of his human colleagues hold toward machine translation ("MT"):

    Anybody who's played around with translation software knows how bad the technology can be. Everyone in my office knows the hoary classic in which "The spirit is willing, but the flesh is weak," translated into Russian and back, comes out "The vodka is good, but the steak is lousy."
    He then goes on to express his alarm at learning MT is a multibillion-dollar industry, and cites some software he tested out which at least gave better results than Babelfish. But we're not quite there yet:
    The holy grail of MT is FAHQT: Fully Automatic, High-Quality Translation. For now, professional and amateur users content themselves with "gisting"—the practice of accepting 80 percent accuracy so as to get a general sense of a text's meaning. (Ninety percent accuracy leaves one error on every line.) Professionals who work with MT always do so in conjunction with human judgment, either by "pre-editing" to limit vocabulary or by "post-editing" to correct errors.
    For now, he concludes, his (and my) job is safe.

    Recently, someone on a translation-related mailing list pointed out the meaninglessness of expressing accuracy as a percentage--if a plural noun is translated as singular, is that the same percent wrong as it would have been had it been completely misrendered? What does "one mistake per line" mean?

    I am not worried about machine translation. On the contrary, I think it opens up a new job possibility--MT software operator. Software capable of providing the "FAHQT" mentioned in the article would have to be pretty close to artificial intelligence, or use some pretty clever semantic indexing. Take another example from the article: "The con is in the pen." Software could figure out that this "pen" is the one which is the short version of "penitentiary" rather than the writing implement by its proximity to other, related words in the document.
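As a toy illustration of that proximity idea, here is a minimal Python sketch of my own (nothing from the article, and nowhere near real MT): it counts how many hand-picked cue words for each sense of "pen" turn up in the surrounding text and picks the sense with the most hits. The cue lists and the function name `guess_sense` are made up for the example.

```python
import re

# Hypothetical, hand-picked cue words for each sense of "pen" (illustration only).
SENSE_CUES = {
    "penitentiary": {"con", "convict", "prison", "jail", "warden", "parole", "inmate"},
    "writing implement": {"ink", "paper", "write", "writing", "desk", "notebook", "signature"},
}


def guess_sense(text, cues=SENSE_CUES):
    """Return the sense whose cue words overlap most with the words in `text`.

    Returns None when no cue word appears or when two senses tie,
    rather than guessing.
    """
    # Lowercase the text and split it into a set of distinct words.
    words = set(re.findall(r"[a-z']+", text.lower()))
    # Score each sense by how many of its cue words appear nearby.
    scores = {sense: len(words & cue_words) for sense, cue_words in cues.items()}
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]


if __name__ == "__main__":
    print(guess_sense("The con is in the pen."))
    print(guess_sense("She filled the pen with ink and began to write."))
```

With these cue lists, "con" alone is enough to tip the article's sentence toward the penitentiary reading, while a sentence with no cue words at all gets no answer. Real software would need far larger word lists learned from data, which is exactly where the hard part begins.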

    That might lead to some pretty useful applications, but translation is still an art and not a science to me. I just came across a tough sentence in a job I am working on. The speaker grows coffee in Taiwan, and had a long period of hardship before becoming successful. He says:


    Literally, he says, "What was the eight-year war of resistance? I have gone through Wang Baochuan's difficult 18 years of keeping a cold oven." What would a machine do with that? A human has many options. If it were fiction, I might do something like "World War II was nothing--my struggle was more like the Hundred Years' War!" But these are the words of a real person and as such deserve a more faithful rendition that carries his way of expressing himself. But the reader needs the background behind the words. You could go with something like, "The eight-year War of Resistance [against Japan] was nothing--I went through 18 years of hardship like Wang Baochuan [the wife of a captured Tang Dynasty general]." and explain it in brackets, but that's not really elegant. You could even paraphrase it in third person, saying something like "He endured nearly two decades of struggle before finally tasting success."

    It's a matter of judging your audience and deciding the style of the piece. No MT software is going to be doing that any time soon.

    Tuesday, January 10, 2006

    Confucian Analects

    I have been reading through the Analects of Confucius recently, looking for good "commentary fights" where traditional commentators had widely differing interpretations of a passage. Though it's not readily apparent in the translations, you can sometimes figure out which commentators a given translator was relying on to crack this text. I just discovered a useful site for this, which has translations by Legge, Lau, and Couvreur (French), and the original Chinese to the side. Even better, when you roll the cursor over a Chinese character, the definition and pronunciation appear.

    Monday, January 09, 2006

    Namecards and Chinese Lessons

    I have been stuck in the Han Dynasty recently, reading Analects commentaries, but I popped out to take a look at a blog run by a friend of No-Sword Matt. There is a beautiful post about Japanese namecards of all things. The author collects them--take a look before you cry "otaku!"

    There's also a post about ChinesePod, an interesting podcasting idea. You can download free Chinese-lesson podcasts, and for a subscription fee you can access some extra services. I am not sure exactly what you get other than PDFs of the script, but if anybody knows, please enlighten me.

    I listened to a couple of the podcasts--they are very professionally put together and entertaining. They feature a foreigner and a native speaker covering a point or two and bantering back and forth. The foreigner has a bit of an accent to his Chinese, but that's just my gaijin griping.

    Please drop a comment if you've listened to the ChinesePod podcasts and let me know what you think! I wish there were such a podcast for Japanese (preferably not made by anime freaks).

    Back to the Han Dynasty for me....

    Thursday, January 05, 2006


    As someone who travels around a lot (or at least used to), I always wanted a web-based word-processor. Now there is one called Writely, and it seems to work OK with Asian fonts. You can save and share your documents online--pretty nice.

    Wednesday, January 04, 2006

    Language Myths

    The LA Times has an article about something I've noticed recently--Mandarin is becoming more prevalent among Chinese in the USA. It contains some unfortunate myths and downright errors. The first is that Cantonese is "a sharp, cackling dialect full of slang and exaggerated expressions" which "can make words of love sound like a fight."

    To me, these sorts of statements are just cultural stereotypes--languages don't have sound, people do. A French truck driver cursing out somebody who just cut him off is not going to sound more melodious than a Cantonese speaker reading poetry. But that's just me. Some Cantonese speakers buy into this Cantonese=cacophony idea, too:

    "You might be saying, 'I love you' to your girlfriend in Cantonese, but it will still sound like you're fighting," said Howard Lee, a talk show host on Cantonese language KMRB-AM (1430). "It's just our tone. We always sound like we're in a shouting match. Mandarin is so mellow. Cantonese is strong and edgy."

    I have heard this from speakers of Cantonese and Taiwanese. Perhaps it is that Chinese bilinguals ("bidialecticals"?) often use Mandarin for school and "official" business but their own dialect for more earthy matters. I can say from experience that most Taiwanese will switch from Mandarin to Taiwanese when they want to curse you out, and most of the few Taiwanese phrases I know are "edgy" expressions for that very reason.

    The article also strangely conflates written Chinese and spoken Mandarin:
    To stress a point or to twist a sentence into a question, Cantonese speakers need only add a dramatic ahhhhhhh or laaaaaaa at the end.

    Something simple like, "Let's go" becomes "C'mon, let's get a move on!" when it's capped with laaaaa.

    By comparison, with Mandarin from China, what you see is what you get. The written form has been simplified by the Chinese government so that characters require fewer strokes. It is considered calmer and more melodic.
    Writing with fewer strokes makes the language more melodic?

    I would like to know why Mandarin is becoming more prevalent in the States. The article only says this:
    But over the last three decades, waves of Mandarin-speaking mainland Chinese and Taiwanese immigrants have diluted the influence of both the Cantonese language and the pioneering Cantonese families who ran Chinatowns for years.

    The surging Chinese economy today has challenged Cantonese further. Because Mandarin is China's official language, entrepreneurs [...] have been forced to adapt, often learning the hard way that business can't be done with Cantonese alone.
    The author says waves have been coming for three decades, but I think there's been a noticeable increase in Mandarin just over the last few years.

    Sunday, January 01, 2006

    Live from the Renegade Province

    It used to be that New Year's wasn't really a big deal in Taiwan, but it seems to be getting to be more and more of one all the time. Maybe it's the festivities and fireworks at the Taipei 101 building that are drawing more interest. But still, for Taiwanese, the real "new year" is the one on the lunar calendar. I was really surprised when I went to Japan and found that not only do they not celebrate the lunar new year, but they have moved its celebrations over to January 1st. Going to a Buddhist temple on Jan. 1st just seemed so bizarre to me at first. But I got over it--I mean, apparently Jesus was actually born in April, but that doesn't stop us from celebrating Christmas on Dec. 25th (with pagan trees for decorations).

    I am indulging in a bit of Taiwan nostalgia today by checking out the webcams on this page put up by the government.

    I discovered this right after who else popped up in my New York Times RSS feed but President Chen Shui-bian. In the article, the following appears:

    The speech was Mr. Chen's first major policy address since his Democratic Progressive Party fared badly in islandwide municipal elections on Dec. 3. His party favors greater political independence from the mainland.

    Until today, Mr. Chen had said fairly little in the weeks since the Dec. 3 islandwide municipal elections in which his Democratic Progressive Party, which seeks greater political independence from the mainland, fared badly.

    Is the NYT trying to hypnotize me? And what does it mean for Taiwan to seek "greater" political independence from "the mainland"? Seems pretty damn independent already, just no one can call it as it is because of the big C. "The Independence that Dare Not Speak its Name!"

    Cheers to Michael Turton for the link to the webcams.