Investigating the use of readability metrics to detect differences in written productions of learners: a corpus-based study

Author: Paula Lissón

Journal: Bellaterra: journal of teaching and learning language and literature

ISSN: 2013-6196

Year of publication: 2017

Volume: 10

Issue: 4

Pages: 68-86

Type: Article

DOI: 10.5565/REV/JTL3.752 (Open access)


Abstract

This article deals with the use of readability metrics as indicators of the linguistic characteristics of two proficiency levels of Spanish learners of L2 English. We present and compute seventeen readability measures over 200 argumentative texts extracted from the NOCE corpus (Díaz-Negrillo, 2007). We use SVM to determine which metrics are able to detect differences between the 200 productions, written by first- and second-year students of English Philology, respectively. The metrics based on sentence length, number of sentences, and number of polysyllabic words yield the best results.
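The surface features the abstract names (sentence length, sentence counts, polysyllabic words) are exactly what the classic formulas in the reference list combine. As a minimal illustration only, the sketch below computes two of them, Flesch Reading Ease (Flesch, 1948) and Björnsson's LIX, from raw text; it is not the authors' actual pipeline, which used the koRpus R package and SVM classification over the 200 NOCE texts, and the syllable counter is a crude vowel-group heuristic rather than a dictionary lookup.

```python
import re

def _syllables(word):
    # Crude heuristic: count vowel groups; real tools use pronunciation dictionaries.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch (1948): 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syll = sum(_syllables(w) for w in words)
    return 206.835 - 1.015 * len(words) / len(sentences) - 84.6 * syll / len(words)

def lix(text):
    """Björnsson's LIX: words/sentences + 100 * (long words, >6 letters) / words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)

sample = "The cat sat. The dog ran."
print(round(flesch_reading_ease(sample), 2))  # short monosyllabic text: very easy
print(lix(sample))                            # no long words: very low LIX
```

Higher Flesch scores mean easier text, while higher LIX scores mean harder text; in a classification setting such as the one described here, each metric becomes one feature per learner production.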

Bibliographic References

  • Anderson, J. (1981). Analysing the readability of English and non-English texts in the classroom with Lix. Paper presented at the Seventh Meeting of the Australian Reading Association, Darwin, Australia.
  • Anderson, J. (1983). LIX and RIX: Variations on a little-known readability index. Journal of Reading, 26(6), 490–496.
  • Béchara, H., Costa, H., Taslimipoor, S., Gupta, R., Orasan, C., Pastor, G. C., & Mitkov, R. (2015). MiniExperts: An SVM approach for measuring semantic textual similarity. (pp. 96–101). Proceedings of the 9th international workshop on semantic evaluation (SemEval 2015). Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/s15-2017
  • Björnsson, C. H. (1968). Läsbarhet. Liber.
  • Björnsson, C.-H. (1983). Readability of newspapers in 11 languages. Reading Research Quarterly, 480–497. DOI: https://doi.org/10.2307/747382
  • Bormuth, J. R. (1969). Development of readability analysis (Final report project no. 7-0052, contract no. OEC-3-7-070052-0326). US Department of Health, Education and Welfare. Retrieved from https://eric.ed.gov/?id=ED029166
  • Campbell, W. M., Campbell, J. P., Reynolds, D. A., Singer, E., & Torres-Carrasquillo, P. A. (2006). Support vector machines for speaker and language recognition. Computer Speech & Language, 20(2), 210–229. DOI: https://doi.org/10.1016/j.csl.2005.06.003
  • Caylor, J. S., Sticht, T. G., & Fox. (1973). Methodologies for determining reading requirements of military occupational specialties (Technical Report No. 73-5). Human Resources Research Organization. Retrieved from https://eric.ed.gov/?id=ED074343
  • Chall, J. S., & Dale, E. (1995). Readability revisited: The new Dale-Chall readability formula. Cambridge, MA: Brookline Books.
  • Coleman, M., & Liau, T. L. (1975). A computer readability formula designed for machine scoring. Journal of Applied Psychology, 60(2), 283. DOI: https://doi.org/10.1037/h0076540
  • Collins-Thompson, K., & Callan, J. (2005). Predicting reading difficulty with statistical language models. Journal of the Association for Information Science and Technology, 56(13), 1448–1462. DOI: https://doi.org/10.1002/asi.20243
  • Dale, E. (1931). A comparison of two word lists. Educational Research Bulletin, 10(18), 484–489.
  • Dale, E., & Chall, J. S. (1948). A formula for predicting readability: Instructions. Educational Research Bulletin, 27(2), 37–54.
  • Díaz-Negrillo, A. (2007). A fine-grained error tagger for learner corpora (Doctoral dissertation). University of Jaen, Jaen.
  • Díaz-Negrillo, A. (2009). EARS: A user’s manual (vol. 1). Munich: LINCOM Academic Reference Books.
  • DuBay, W. H. (2004). The principles of readability. Costa Mesa: Impact Information. Retrieved from http://www.impact-information.com/impactinfo/readability02.pdf
  • Farr, J. N., Jenkins, J. J., & Paterson, D. G. (1951). Simplification of Flesch reading ease formula. Journal of Applied Psychology, 35(5), 333–337. DOI: https://doi.org/10.1037/h0062427
  • Feng, L., Jansche, M., Huenerfauth, M., & Elhadad, N. (2010). A comparison of features for automatic readability assessment. Proceedings of the 23rd international conference on computational linguistics (pp. 276–284). Association for Computational Linguistics.
  • Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 32(3), 221–233. DOI: https://doi.org/10.1037/h0057532
  • Flesch, R. F. (1949). Art of readable writing. USA: Hungry Minds Inc.
  • François, T. (2011). Les apports du traitement automatique des langues à la lisibilité du français langue étrangère. Université Catholique de Louvain, Louvain-La-Neuve.
  • Giménez, J., & Marquez, L. (2004). Fast and accurate part-of-speech tagging: The SVM approach revisited. Recent Advances in Natural Language Processing III, 153–162. DOI: https://doi.org/10.1075/cilt.260.17gim
  • Granger, S., Gilquin, G., & Meunier, F. (Eds.). (2015). The Cambridge handbook of learner corpus research. Cambridge University Press. DOI: https://doi.org/10.1017/cbo9781139649414
  • Gunning, R. (1952). The technique of clear writing. New York: McGraw-Hill.
  • Harris, A. J., & Jacobson, M. D. (1974). Revised Harris-Jacobson readability formulas. Paper presented at the Annual Meeting of the College Reading Association, Maryland. Retrieved from https://eric.ed.gov/?id=ED098536
  • Hawkins, J. A., & Buttery, P. (2010). Criterial features in learner corpora: Theory and illustrations. English Profile Journal, 1(1), 1–23. DOI: https://doi.org/10.1017/s2041536210000103
  • Hawkins, J. A., & Filipović, L. (2012). Criterial features in L2 English: Specifying the reference levels of the Common European Framework (Vol. 1). Cambridge: Cambridge University Press.
  • Heilman, M., Collins-Thompson, K., & Eskenazi, M. (2008). An analysis of statistical models and features for reading difficulty prediction. Proceedings of the third workshop on innovative use of NLP for building educational applications (pp. 71–79). Association for Computational Linguistics.
  • Hsu, C.-W., Chang, C.-C., & Lin, C.-J. (2003). A practical guide to support vector classification. Retrieved from http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
  • Jarvis, S. (2011). Data mining with learner corpora: choosing classifiers for L1 detection. In F. Meunier, S. De Cock, G. Gilquin, & M. Paquot (Eds.), A taste for corpora: In honour of Sylviane Granger (pp. 127–154). Amsterdam/Philadelphia: John Benjamins Publishing Company. DOI: https://doi.org/10.1075/scl.45.10jar
  • Kincaid, J. P., Fishburne Jr., R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy enlisted personnel. DTIC Document.
  • Lu, X. (2010). Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics, 15(4), 474–496. DOI: https://doi.org/10.1075/ijcl.15.4.02lu
  • Lu, X. (2011). A corpus-based evaluation of syntactic complexity measures as indices of college-level ESL writers’ language development. TESOL Quarterly, 45(1), 36–62. DOI: https://doi.org/10.5054/tq.2011.240859
  • McLaughlin, G. H. (1968). Proposals for British readability measures. In J. Downing & A. L. Brown (Eds.), Third international reading symposium (pp. 186–205). London: Cassell.
  • McLaughlin, G. H. (1969). SMOG grading: A new readability formula. Journal of Reading, 12(8), 639–646.
  • Michalke, M. (2017). Package koRpus: An R Package for Text Analysis (Version 0.10-2). Retrieved from http://reaktanz.de/?c=hacking&s=koRpus
  • Mullen, T., & Collier, N. (2004). Sentiment analysis using support vector machines with diverse information sources. Proceedings of EMNLP-04, 9th conference on empirical methods in natural language processing (pp. 412–418). EMNLP.
  • Nivre, J., Hall, J., Nilsson, J., Eryiǧit, G., & Marinov, S. (2006). Labeled pseudo-projective dependency parsing with support vector machines. Proceedings of the tenth conference on computational natural language learning (pp. 221–225). Association for Computational Linguistics. DOI: https://doi.org/10.3115/1596276.1596318
  • O’Hayre, J. (1966). Gobbledygook has gotta go. US Dept. of the Interior, Bureau of Land Management.
  • Pilán, I., Volodina, E., & Johansson, R. (2014). Rule-based and machine learning approaches for second language sentence-level readability. Proceedings of the ninth workshop on innovative use of NLP for building educational applications (pp. 174–184). Association for Computational Linguistics. DOI: https://doi.org/10.3115/v1/w14-1821
  • Prabowo, R., & Thelwall, M. (2009). Sentiment analysis: A combined approach. Journal of Informetrics, 3(2), 143–157. DOI: https://doi.org/10.1016/j.joi.2009.01.003
  • Schmid, H. (1995). TreeTagger: A language-independent part-of-speech tagger [computer software]. Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart. Retrieved from http://www.ims.unistuttgart.de/forschung/ressourcen/werkzeuge/treetagger.en.html
  • Senter, R., & Smith, E. A. (1967). Automated readability index. DTIC Document.
  • Shen, W., Williams, J., Marius, T., & Salesky, E. (2013). A language-independent approach to automatic text difficulty assessment for second-language learners. Massachusetts Institute of Technology, Lincoln Laboratory, Lexington. DOI: https://doi.org/10.21236/ada595522
  • Song, J., He, Y., & Fu, G. (2015). Polarity classification of short product reviews via multiple cluster-based SVM classifiers. Paper presented at the PACLIC, Shanghai. Retrieved from http://www.aclweb.org/anthology/Y15-2031
  • Spache, G. (1953). A new readability formula for primary-grade reading materials. The Elementary School Journal, 53(7), 410–413. DOI: https://doi.org/10.1086/458513
  • Sung, Y., Lin, W., Dyson, S. B., Chang, K., & Chen, Y. (2015). Leveling L2 texts through readability: Combining multilevel linguistic features with the CEFR. The Modern Language Journal, 99(2), 371–391. DOI: https://doi.org/10.1111/modl.12213
  • Sung, Y.-T., Chen, J.-L., Cha, J.-H., Tseng, H.-C., Chang, T.-H., & Chang, K.-E. (2015). Constructing and validating readability models: The method of integrating multilevel linguistic features with machine learning. Behavior Research Methods, 47(2), 340–354. DOI: https://doi.org/10.3758/s13428-014-0459-x
  • Vajjala, S. (2018). Automated assessment of non-native learner essays: Investigating the role of linguistic features. International Journal of Artificial Intelligence in Education, 28(1), 79–105. DOI: https://doi.org/10.1007/s40593-017-0142-3
  • Vajjala, S., & Meurers, D. (2012). On improving the accuracy of readability classification using insights from second language acquisition. Proceedings of the seventh workshop on building educational applications using NLP (pp. 163–173). Association for Computational Linguistics.
  • Wheeler, L. R., & Smith, E. H. (1954). A practical readability formula for the classroom teacher in the primary grades. Elementary English, 31(7), 397–399.
  • Yamada, H., & Matsumoto, Y. (2003). Statistical dependency analysis with support vector machines. Proceedings of IWPT (Vol. 3, pp. 195–206).
  • Zalmout, N., Saddiki, H., & Habash, N. (2016). Analysis of foreign language teaching methods: An automatic readability approach. Proceedings of the 3rd workshop on natural language processing techniques for educational applications (pp. 122–130). Osaka.