GSoC Update: Week #7 and #8

Balasankar C

Geek. Freedom. Privacy.

Last two weeks were seeing less coding and more polishing. I was fixing the LibIndic modules to utilize the concept of namespace packages (PEP 420) to obtain the libindic.module structure. In the stemmer module, I introduced the namespace package concept and it worked well. I also made the inflector a part of stemmer itself. Since inflector's functionality was heavily dependent on the output of the stemmer, it made more sense to make inflector a part of stemmer itself, rather than an individual package. Also, I made the inflector language-agnostic so that it will accept a language parameters as input during initialization and select the appropriate rules file.

In spellchecker also, I implemented the namespace concept and removed the bundled packages of stemmer and inflector. Other modifications were needed to make the tests run with this namespace concept, fixing coverage to pick the change etc. In the coding side, I added weights to the three metrics so as to generate suggestions more efficiently. I am thinking about formulating an algorithm to make comparison of suggestions and root words more efficient. Also, I may try handling spelling mistakes in the suffixes.

This week, I met with Hrishi and discussed about the project. He is yet to go through the algorithm and comment on that. However he made a suggestion to split out the languages to each file and make init.py more clean (just importing these split language files). He was ok with the work so far, as he tried out the web version of the stemmer.

[caption id="attachment_852" align="aligncenter" width="800"]

Hrishi testing out the spellchecker[/caption]