The Relationship Between Dictionary Look-up Frequency and Corpus Frequency Revisited: A Log-File Analysis of a Decade of User Interaction with a Swahili-English Dictionary

Gilles-Maurice de Schryver, Sascha Wolfer, Robert Lew

Abstract


In an earlier publication it was claimed that there is no useful relationship between Swahili-English dictionary look-up frequencies and the occurrence frequencies for the same wordforms in Swahili-English corpora, at least not beyond the top few thousand wordforms. This result was challenged using data for German by a different team of researchers using an improved methodology. In the present article the original Swahili-English data is revisited, using ten years’ worth of it rather than just two, and using the improved methodology. We conclude that there is indeed a positive relationship. In addition, we show that online dictionary look-up behaviour is remarkably similar across languages, even when, as in our case, one is dealing with languages from very dissimilar language families. Furthermore, online dictionaries turn out to have minimum look-up success rates, below which they simply cannot go. These minima are language-sensitive and vary depending on the regularity of the searched-for entries, but are otherwise constant no matter the size of randomly sampled dictionaries. Corpus-informed sampling always improves on any random method. Lastly, from the point of view of the graphical user interface, we argue that the average user of an online bilingual dictionary is better served with a single search box, rather than separate search boxes for each dictionary side.


Keywords


lexicography; online dictionaries; log files; corpus frequencies; Swahili; English; language universals

DOI: http://dx.doi.org/10.17576/gema-2019-1904-01

eISSN: 2550-2131

ISSN: 1675-8021