Lexical Scoring System of Lexical Chain for Quranic Document Retrieval

Hamed Zakeri Rad, Sabrina Tiun, Saidah Saad

Abstract


An Information Retrieval (IR) system retrieves information on a particular subject from a large collection of text in response to a user query, typically submitted as keywords to be matched. In the Al-Quran, verses on the same or comparable topics are scattered across different chapters, and users find it difficult to remember the many keywords of those verses. In such situations, retrieving information using semantically related words is useful. In well-composed documents, semantic integrity (coherence) exists between the words of the text. Lexical cohesion is the result of chains of related words that contribute to the continuity of lexical meaning within the text; these chains arise directly from the text being about the same thing (i.e., the same topic). This suggests that lexical chains are a useful and appropriate method for an IR system to represent documents by concepts rather than by terms, so that retrieval can succeed on the basis of semantic relations. Therefore, this study proposes a new Lexical Scoring System, in addition to determining the semantic relations that exist between words, with WordNet used as the semantic knowledge base. Using the lexical chain approach, the proposed scoring system helped to retrieve 86.58% of the total relevant documents in the Al-Quran, based on the relevance judgment. Based on these findings, the study concludes that representing verses using lexical chains is appropriate and suitable for a Quranic IR system.
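To make the idea of a lexical chain concrete, the following is a minimal sketch and not the scoring system proposed in the study. It assumes a small hand-made relatedness table (`RELATED`, hypothetical) in place of WordNet lookups and a simple greedy chaining rule. The helper names `related`, `build_chains`, and `recall` are illustrative only. The `recall` function shows how a figure such as the reported share of relevant documents retrieved is conventionally computed.

```python
from typing import Dict, List, Set

# Hypothetical, hand-made relatedness table standing in for WordNet lookups.
RELATED: Dict[str, Set[str]] = {
    "charity": {"alms", "poor", "give"},
    "alms": {"charity", "poor"},
    "poor": {"charity", "alms", "needy"},
    "needy": {"poor"},
    "prayer": {"worship", "prostrate"},
    "worship": {"prayer"},
}


def related(a: str, b: str) -> bool:
    """True if the words are identical or linked in the toy table."""
    return a == b or b in RELATED.get(a, set()) or a in RELATED.get(b, set())


def build_chains(words: List[str]) -> List[List[str]]:
    """Greedy chaining: add each word to the first chain containing a related word."""
    chains: List[List[str]] = []
    for w in words:
        for chain in chains:
            if any(related(w, member) for member in chain):
                chain.append(w)
                break
        else:
            # No existing chain is related to this word, so start a new one.
            chains.append([w])
    return chains


def recall(retrieved: Set[str], relevant: Set[str]) -> float:
    """Share of the relevant documents that were actually retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


chains = build_chains(["charity", "prayer", "poor", "worship", "needy"])
score = recall({"v1", "v2", "v3"}, {"v1", "v2", "v4", "v5"})
```

In this toy input the words fall into two chains, one about charity and one about prayer, which is the sense in which a chain represents a concept rather than a single term.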

 


Keywords


lexical chain; information retrieval (IR); semantic retrieval; lexical scoring system; Quranic semantic retrieval system





DOI: http://dx.doi.org/10.17576/gema-2018-1802-05


eISSN : 2550-2131

ISSN : 1675-8021