Vietnamese word list

As a service to the developer community, I have compiled lists of the most common Vietnamese words. There are four lists of different sizes: a small list of about 11,000 words (Viet11K.txt), a medium-sized list of about 22,000 words (Viet22K.txt), a larger list of more than 39,000 words and phrases (Viet39K.txt), and a large list of about 74,000 words and phrases (Viet74K.txt). The word lists are stored in Unicode and TCVN3 (also known as ABC) encoding and sorted in standard dictionary order.

To download the word list, please click here. The files Viet11K.txt, Viet22K.txt and Viet39K.txt are in Unicode (UTF-8). The file words.txt has the same content as Viet39K.txt but in TCVN3 encoding. The word list is distributed under the GNU General Public License. If you modify it, please send me your modifications.

If you are interested in building open source freeware for Vietnamese text processing (e.g., a free spell checker for Vietnamese), please contact me.

How to sort Vietnamese words

An entry unit is a word, a fixed combination equivalent to a word, certain idioms, a morpheme, an abbreviation, or a letter used as a symbol.

Entries are sorted by the order of the letters of the alphabet:

a ă â b c d đ e ê f g h i j k l m n o ô ơ p q r s t u ư v w x y z

and by the order of the tone marks:

The unit of comparison is each block written without spaces, whether monosyllabic or polysyllabic. Therefore ác ý sorts before ách (because ác sorts before ách), but apatit sorts before apxe (because apa- sorts before apx-).
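The ordering rules above can be sketched as a Python sort key. This is a minimal illustration and not the code used to produce the word lists. It assumes the entries are Unicode strings (as in the UTF-8 files, not VIQR or TCVN3). It also makes two assumptions that the text does not state: the tone order is taken to be unmarked, huyền, hỏi, ngã, sắc, nặng, and within a block the letters are compared first, with tones used only to break ties. The names block_key and viet_sort_key are hypothetical.

```python
import unicodedata

# Alphabet order as given in the text.
ALPHABET = ("a ă â b c d đ e ê f g h i j k l m n "
            "o ô ơ p q r s t u ư v w x y z").split()
LETTER_RANK = {letter: rank for rank, letter in enumerate(ALPHABET)}

# ASSUMPTION: tone order unmarked < huyền < hỏi < ngã < sắc < nặng.
# Keys are the Unicode combining marks for each tone.
TONE_RANK = {
    "\u0300": 1,  # huyền (grave)
    "\u0309": 2,  # hỏi (hook above)
    "\u0303": 3,  # ngã (tilde)
    "\u0301": 4,  # sắc (acute)
    "\u0323": 5,  # nặng (dot below)
}


def block_key(block):
    """Key for one space-free block: letter ranks first, tone ranks second."""
    letters, tones = [], []
    for ch in unicodedata.normalize("NFC", block.lower()):
        base, tone = "", 0
        # Decompose one character into its base letter and combining marks.
        for part in unicodedata.normalize("NFD", ch):
            if part in TONE_RANK:
                tone = TONE_RANK[part]
            else:
                base += part  # keeps breve, circumflex and horn with the letter
        letter = unicodedata.normalize("NFC", base)
        # Characters outside the alphabet all sort after z in this sketch.
        letters.append(LETTER_RANK.get(letter, len(ALPHABET)))
        tones.append(tone)
    return (letters, tones)


def viet_sort_key(entry):
    """Key for a whole entry: compare block by block, left to right."""
    return [block_key(block) for block in entry.split()]


if __name__ == "__main__":
    words = ["apxe", "ách", "apatit", "ác ý"]
    print(sorted(words, key=viet_sort_key))
```

Because the key is built block by block, ác ý is compared with ách through its first block ác only, which reproduces the example in the text, while apatit and apxe are each compared as a single block.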