CEDICT

The CEDICT project was started by Paul Denisowski in 1997 and is now maintained under the name CC-CEDICT by a team on mdbg.net (a website registered for the purpose by Dutchman Dennis Vierkant). Its aim is to provide a complete Chinese-to-English dictionary with pinyin pronunciation for the Chinese characters.

Content

CEDICT is a plain text file, so another program (a text editor such as Notepad, a search tool such as egrep, or an equivalent) is needed to search and display it. The project is considered a standard Chinese-English reference on the Internet and is used by several other Chinese-English projects. The Unihan Database uses CEDICT data for most of its information about character compounds, but this data is auxiliary and is explicitly not part of the main Unicode database [1].

Features:

Traditional Chinese and Simplified Chinese
Pinyin (several pronunciations)
American English (several equivalents)
As of 14 February 2016, it had 114,087 entries, encoded in UTF-8 [2].

The basic format of a CEDICT entry is:

Traditional Simplified [pin1 yin1] /American English equivalent 1/equivalent 2/
漢字 汉字 [han4 zi4] /Chinese character/CL:個|个/
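
The entry format above is regular enough to split with a single regular expression. The following Python sketch is one possible way to do it, not an official parser: the names `ENTRY_RE`, `Entry` and `parse_entry` are hypothetical, and skipping lines that begin with `#` assumes that comment and header lines in the file start with that character.

```python
import re
from typing import NamedTuple

# Assumed shape of one entry line:
#   Traditional Simplified [pin1 yin1] /equivalent 1/equivalent 2/
ENTRY_RE = re.compile(r"^(\S+)\s+(\S+)\s+\[([^\]]*)\]\s+/(.*)/\s*$")


class Entry(NamedTuple):
    traditional: str
    simplified: str
    pinyin: str
    definitions: list


def parse_entry(line):
    """Return an Entry, or None for comment lines and lines that do not match."""
    line = line.strip()
    if not line or line.startswith("#"):  # assumption: '#' marks a comment line
        return None
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    traditional, simplified, pinyin, body = match.groups()
    # Equivalents are separated by '/', so split the text between the outer slashes.
    return Entry(traditional, simplified, pinyin, body.split("/"))


entry = parse_entry("漢字 汉字 [han4 zi4] /Chinese character/CL:個|个/")
```

Here `entry.definitions` holds the two slash-separated fields of the sample line as separate strings. The sketch keeps the classifier field (`CL:個|个`) as an ordinary equivalent rather than interpreting it.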

Example of a simple egrep search:

$ egrep -i 有勇無謀 cedict.txt
有勇無謀 有勇无谋 [you3 yong3 wu2 mou2] /bold but not very astute/
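
For readers without egrep, roughly the same case-insensitive search can be sketched in Python. The function name `search_lines` and the in-memory `sample` list are illustrative assumptions, and Python's `re` syntax is similar to, but not identical with, the extended regular expressions that egrep accepts.

```python
import re


def search_lines(lines, pattern):
    """Return the lines that match `pattern`, ignoring case (similar to egrep -i)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if regex.search(line)]


# Two sample lines standing in for the contents of the dictionary file.
sample = [
    "有勇無謀 有勇无谋 [you3 yong3 wu2 mou2] /bold but not very astute/\n",
    "漢字 汉字 [han4 zi4] /Chinese character/CL:個|个/\n",
]

matches = search_lines(sample, "有勇無謀")
```

To search the real file, an open file object can be passed in place of `sample`, for example `open("cedict.txt", encoding="utf-8")`, since the function only iterates over lines.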

History

Year	Event
1991	EDICT Japanese dictionary project was started by Jim Breen.
1997	CEDICT project started by Paul Denisowski, on the model of EDICT. Continued by Erik Peterson.
2007	MDBG started a new project called CC-CEDICT, which continues the CEDICT project under a new license, the Creative Commons Attribution-Share Alike 3.0 License, allowing more projects to use it. A workflow was also set up to streamline submitting, reviewing and processing new entries.

Related projects

CEDICT has served as a model for several other projects:

HanDeDict (~156,000 entries) for German
CFDICT (~44,000 entries) for French
Some older CEDICT data is also found in the Adsotrans dictionary.
ChE-DICC, a free Spanish-Chinese dictionary, started in February 2012 (in beta at the time of writing)
CHDICT (11,000 entries) for Hungarian, started in May 2017
CC-Canto is Pleco Software's addition of Cantonese readings in Jyutping transcription to CC-CEDICT [3].
Cantonese CEDICT features Cantonese readings in Yale transcription and includes Cantonese-specific words, many of which were taken from "A Dictionary of Cantonese Slang" [4], possibly infringing copyright [5].

References

[1] "Unihan Database Lookup". unicode.org.
[2] "MDBG English to Chinese dictionary". www.mdbg.net.
[3] "CC-Canto - A Cantonese dictionary for everyone". cantonese.org.
[4] http://writecantonese8.wordpress.com/2012/02/04/cantonese-cedict-project/ "Later, I was guided to merge data from Cantonese Stardict, which is an electronic version of “A Dictionary of Cantonese Slang”, into Cantonese CEDICT"
[5] "StarDict". Stardict.sourceforge.net. Retrieved 18 November 2011.

External links

CC-CEDICT Editor Project home page
More information on the formatting of CC-CEDICT
MDBG free online Chinese–English dictionary uses CC-CEDICT, supports adding / editing entries and offers recent CC-CEDICT downloads.
Flashonary is a Chinese-English Dictionary with integrated flashcards that uses CC-CEDICT.
Example of CEDICT data for the Han character "中", used by Unihan (section "Chinese Compounds")
Chinese Dictionaries Discussion group about Chinese->"foreign language" dictionaries
The homepage of Paul Denisowski, the founder of CEDICT
www.clearchinese.com uses CEDICT
Mandarin Text Project uses CEDICT
HanDeDict @ Zydeo: Open-source Chinese-German dictionary
CHDICT kínai-magyar szótár: Open-source Chinese-Hungarian dictionary
Zhonga Chinese-English dictionary with handwriting recognition and pronunciation, uses CEDICT.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
