Where are the Keyboard Dictionaries in #Android?


I love the Froyo multiple-languages keyboard feature! It's AWESOME: sliding a finger over the keyboard to change language. Awesome!
That is, if your language's dictionary is in there, which it probably isn't, since the default keyboard ships with only 6 dictionaries. And Dutch isn't one of them. Dutch is in several other keyboards, like the HTC one, but sadly there is no shared location where these dictionaries live; on the contrary, each one resides in the .apk that contains its keyboard.

By opening /system/app/LatinIME.apk (as found in CyanogenMod), we find that the dictionaries are stored in the .apk under the res directory. While we're there: someone suggested that the lack of more dictionaries might be a size issue, but I don't think so, since they are all fairly moderate in size:

  • raw-de: 739K
  • raw-en: 822K
  • raw-es: 768K
  • raw-fr: 775K
  • raw-it: 688K
  • raw-sv: 911K
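An .apk is an ordinary zip archive, so the listing above can be reproduced without unpacking anything. Below is a minimal Python sketch, assuming the dictionaries sit at res/raw-<lang>/*.dict inside the archive, as in the CyanogenMod build described here (the script name list_dicts.py is just an example). It reports uncompressed sizes in KiB, so the numbers may not match a file manager exactly.

```python
import re
import sys
import zipfile

# Matches entries such as "res/raw-de/main.dict"; the qualifier may contain
# hyphens (e.g. raw-pt-rBR), so anything up to the next "/" is accepted.
DICT_ENTRY = re.compile(r"^res/raw-([^/]+)/([^/]+\.dict)$")


def list_dictionaries(apk):
    """Return sorted (language, entry name, uncompressed KiB) for each .dict.

    `apk` may be a path or an open binary file object.
    """
    results = []
    with zipfile.ZipFile(apk) as archive:
        for info in archive.infolist():
            match = DICT_ENTRY.match(info.filename)
            if match:
                results.append((match.group(1), info.filename, info.file_size // 1024))
    return sorted(results)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example usage: python list_dicts.py LatinIME.apk
    entries = list_dictionaries(sys.argv[1])
    for lang, name, size_kib in entries:
        print(f"raw-{lang}: {size_kib}K  ({name})")
    print(f"total: {sum(size for _, _, size in entries)}K")
```

Summing the six sizes listed above gives 4703K, i.e. under 5 MB for all of them together.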

Also, the custom words are saved in a database at /data/data/com.android.inputmethod.latin/databases/auto_dict.db
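To peek inside auto_dict.db without guessing at its layout, a sketch like the following lists each table and its columns straight from SQLite's own metadata. It assumes you have copied the file off the device first (for example with adb pull, which for /data/data usually requires root); no particular schema is assumed, and the function name is illustrative.

```python
import sqlite3
import sys


def describe_database(db_path):
    """Return {table name: [column names]} read from SQLite's own metadata."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        schema = {}
        for table in tables:
            # PRAGMA table_info yields one row per column; index 1 is the name.
            columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
            schema[table] = [col[1] for col in columns]
        return schema
    finally:
        conn.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example, on a copy pulled from the device:
    #   adb pull /data/data/com.android.inputmethod.latin/databases/auto_dict.db
    for table, columns in describe_database(sys.argv[1]).items():
        print(table, columns)
```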

Now, wanting to add a Dutch dictionary, I went looking in the AOSP. The LatinIME source lives in ./packages/inputmethods/LatinIME and contains a ./packages/inputmethods/LatinIME/dictionaries directory. I expected to find the dictionary files there, but it contains only a sample.xml file, so no .xml dictionaries. The aforementioned res directory is at ./packages/inputmethods/LatinIME/java/res/, but it contains none of the raw-<lang> directories.
The dictionaries do not appear to be part of the AOSP at all. I guess Google is not able to open-source them?

While searching (the interwebs and IRC), I also discovered that a lot of other people were looking to add their language to the code tree (see Issue 1827: "add dictionaries for other locales (or make it easier for users to do so)"; I'm d.gen… in that thread), and that some people had solved the problem by simply rolling a custom LatinIME (Softkeyboard). I don't like that option, however, since I'd rather strengthen the default tree than split from it and have to update the code after each AOSP update.

CyanogenMod, however, does have the dictionaries, so I checked out the CyanogenMod tree. It took me a while to find them, as I was expecting them to be in the paths I mentioned before, but no such luck. Apparently (and this makes sense), the CM-specific files are in the ./vendor/cyanogen directory, with the binary dictionaries in:

./vendor/cyanogen/overlay/common/packages/inputmethods/LatinIME/java/res/raw-sv/main.dict
./vendor/cyanogen/overlay/common/packages/inputmethods/LatinIME/java/res/raw-de/main.dict
./vendor/cyanogen/overlay/common/packages/inputmethods/LatinIME/java/res/raw-fr/main.dict
./vendor/cyanogen/overlay/common/packages/inputmethods/LatinIME/java/res/raw-en/main.dict
./vendor/cyanogen/overlay/common/packages/inputmethods/LatinIME/java/res/raw-it/main.dict
./vendor/cyanogen/overlay/common/packages/inputmethods/LatinIME/java/res/raw-es/main.dict
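To check which languages a given CM checkout provides before adding your own, a short sketch can walk the overlay path listed above. The path constant is copied from those lines; the function name is hypothetical, and other trees or later CM versions may use a different layout.

```python
import sys
from pathlib import Path

# Location of the LatinIME resources inside the CyanogenMod overlay,
# relative to the root of the checkout (taken from the paths above).
OVERLAY_RES = Path("vendor/cyanogen/overlay/common/packages/inputmethods/LatinIME/java/res")


def find_overlay_dicts(tree_root):
    """Map each raw-<lang> directory name to its main.dict path, sorted by path."""
    res_dir = Path(tree_root) / OVERLAY_RES
    return {p.parent.name: p for p in sorted(res_dir.glob("raw-*/main.dict"))}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example usage: python find_dicts.py ~/cm-tree
    for lang_dir, path in find_overlay_dicts(sys.argv[1]).items():
        print(lang_dir, path)
```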

So, how do we add new dictionaries to Android?

  • Either in AOSP under packages/inputmethods/LatinIME/dictionaries/ as .xml files (preferable),
  • or under packages/inputmethods/LatinIME/java/res/ as binary files,
  • or, if that turns out to be impossible for some reason, in CM, in the respective vendor/cyanogen/overlay/common/packages/inputmethods/LatinIME/ directories (so that at least CM has the extra languages).
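For the first (preferred) option, the word list has to come from somewhere. Below is a minimal sketch that turns a plain-text corpus into an XML word list, assuming the `<wordlist>` / `<w f="NNN">word</w>` layout of the sample.xml mentioned earlier and frequencies scaled to at most 255; check both assumptions against sample.xml in your own tree. Tokenizing on whitespace is deliberately naive, and converting the XML into a binary main.dict is left to the tree's build tooling and not shown.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def build_wordlist(word_counts, max_freq=255):
    """Build a <wordlist> element, scaling raw counts to 1..max_freq (assumed range)."""
    root = ET.Element("wordlist")
    if not word_counts:
        return root
    top = max(word_counts.values())
    # Most frequent words first; ties broken alphabetically for stable output.
    for word, count in sorted(word_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        freq = max(1, round(count * max_freq / top))
        ET.SubElement(root, "w", f=str(freq)).text = word
    return root


def corpus_to_xml(text):
    """Count lower-cased, whitespace-separated words and serialize them as XML."""
    counts = Counter(word.lower() for word in text.split())
    return ET.tostring(build_wordlist(counts), encoding="unicode")


if __name__ == "__main__":
    print(corpus_to_xml("de kat en de hond en de muis"))
```

For that tiny Dutch sample, "de" (3 occurrences) gets f="255", "en" (2) gets f="170", and the single-occurrence words get f="85".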

The ideal situation, however, would be to split the dictionaries from the keyboards and put them somewhere any application can use them, making it possible to install new dictionary packs (e.g. DutchDictionary.apk) from the Market and thus solving the whole dictionary problem. Maybe by putting them in /data/data/com.android.inputmethod.latin/databases/? Although there is probably a performance reason for them not being there in the first place.
Anyhow, in the meantime, we do need the dictionaries! So let's not wait for this and add the data to AOSP now 🙂

I'm wondering, though, why the Softkeyboard people don't add their dictionaries to AOSP. I do see the benefit of keeping it available in the Market as a separate app, i.e. making it available to every Android user instead of just the "few" tinkerers running the latest AOSP (or mods based on it).

Published by Gert

I'm just this guy, you know..


17 Comments

  1. Ouch.
    Recompiling LatinIME with raw-nl is apparently not enough: the thing has the dict but doesn't realize it. I'll have to look further into that, I guess. Nice try though 🙂 And at least I'm now using my home-rolled, slightly larger (in file size) soft keyboard
    lol

  2. Then again, naming the dictionary properly might do the trick. Must be getting tired.
    Recompiling..
    Pushing out
    Booting the phone
    aaand .. It works 🙂

  3. In your Android travels, did you find where the AutoText dictionaries are stored? That's what allows the keyboard to convert youre -> you're, or brb -> be right back. Even better: did you find out how to add your own conversion definitions to that dictionary?

  4. Hi Fin,

    I didn't notice those, but that's merely because I really wasn't looking for them. I didn't check any code, but I'm sure that if you read through it, it'll become clear quickly.
    And please post back; I'm curious about it myself. Don't have the time to look for it now, though.

  5. Just saw that you linked my XDA post here 🙂
    Your post is really informative. I was just coming back to check where to upload dict files in AOSP. I've put them, as .xml, where you suggested:
    https://review.source.android.com/#change,22384
    so, we’ll see.
    Also uploaded to CM:
    http://review.cyanogenmod.com/#change,4682
    A few days late for 7.0 final, though :/
    It took a few days to install VirtualBox with Ubuntu x64 under Win 7 x64, revive my Linux skills and figure out how to set up git, repo, etc.

  6. Great! Do keep us posted!
    I did submit the Dutch dict to Cyanogen, and they did not include it due to the limited space available on the device. They did talk about a pluggable dictionary format that would let you install a dict from the Market. There was an early alpha back then, but as far as I know it's not being developed anymore :/ (I posted a little about it on http://blog.cone.be/2010/08/20/creating-dutch-dictionary-for-android/ )
    Never got around to committing it to AOSP.

    I did consider picking up development on the external dict code. I'm still actively considering it, but I don't have the time, and I don't think I will anytime soon 🙁 Though the hardest work should already be done with what is in the CM repos.

  7. External dict support would be nice. It would keep the main app small while letting everyone customize it to their needs.
    Regarding the submission process… I'm not sure how the review process is supposed to go. Should I propose reviewers? Is there a way to speed it up?

  8. Hi Gert,
    I'm trying to add a Polish dictionary to the CM7 keyboard and I'm currently stuck at the same point you were. When I added raw-pl and recompiled the apk, the keyboard works, but it still doesn't find the dictionary. Could you tell me where I should look?

  9. OK, never mind that post, I've figured it out 😉
    I was using APK Manager from XDA Developers, and I had to use the keep option BUT delete resources.arsc from the keep folder. After that, the built apk finally recognized the added dictionary.

  10. Hi,

    I want to configure the Hungarian dict, but it doesn't work.
    The Hungarian option doesn't show up in the keyboard configuration.
    Can you help me figure out why it isn't showing?

  11. Hey Gert,

    You said you added the Dutch dict to Android, I presume CM7? Would you mind sharing that apk with me? I'm not that great at all things Android *yet*, and would like to add the Dutch dict myself.

  12. Hi there,

    I want to use a Burmese dictionary in my Burmese IME for Android.
    I googled and found a lot of stuff, but everything is scattered. I looked at the LatinIME code and got lost in it. I don't understand exactly how they are using the dictionary. It makes me feel that LatinIME uses some JNI to access the dictionary. Please guide me: how should I proceed?

    Thanks.

  13. Hi Bhanushali,

    I'm afraid I don't quite get what exactly you're asking.

    A try, though: the keyboard looks at what has been typed so far and looks for words that resemble it, to offer corrections for typos, and it looks at partial matches to guess what you might want to type; for example, after typing "exa", the dictionary might suggest "example". These suggestions are influenced by the frequency of the words (it's a weighted list), because you're more likely to type a common word than a rare one.
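The idea in that last reply can be sketched in a few lines. This is a toy illustration, not LatinIME's actual algorithm (which, as far as I can tell, does its lookups in native code, likely the JNI part mentioned above): completions of the typed prefix are ranked by frequency, and words whose beginning is one edit away from what was typed are offered as typo corrections with a penalty. The halving penalty, the function names and the sample frequencies are all made up for the example.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def suggest(typed, dictionary, limit=3):
    """Rank completions and likely typo corrections of `typed` by frequency."""
    typed = typed.lower()
    scored = []
    for word, freq in dictionary.items():
        if word.startswith(typed):
            scored.append((freq, word))          # partial match: a completion
        elif edit_distance(typed, word[:len(typed)]) <= 1:
            scored.append((freq // 2, word))     # probable typo: penalized (arbitrary)
    scored.sort(key=lambda fw: (-fw[0], fw[1]))
    return [word for _, word in scored[:limit]]


# Hypothetical weighted word list (higher = more common).
SAMPLE = {"example": 200, "exam": 120, "exact": 150, "extra": 180, "the": 255}

if __name__ == "__main__":
    print(suggest("exa", SAMPLE))
    print(suggest("wxa", SAMPLE))
```

Here "exa" yields example, exact and exam in that order, while the typo "wxa" still reaches the same words through the one-edit rule.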
