Investigating the use of readability metrics to detect differences in written productions of learners: a corpus-based study

  1. Paula Lissón
Journal: Bellaterra: journal of teaching and learning language and literature

ISSN: 2013-6196

Publication year: 2017

Volume: 10

Issue: 4

Pages: 68–86

Type: Article

DOI: 10.5565/REV/JTL3.752 (open access)


Abstract

This article deals with the use of readability metrics as indicators of the linguistic characteristics specific to two proficiency levels of Spanish learners of L2 English. We present and compute seventeen readability measures on 200 argumentative texts drawn from the NOCE corpus (Díaz-Negrillo, 2007). We use SVM (support vector machines) to determine which metrics are able to detect differences between the 200 productions, written by first- and second-year students of English Philology respectively. The metrics based on sentence length, number of sentences, and number of polysyllabic words yield the best results.
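
For illustration only, here is a minimal sketch of the kind of pipeline the abstract describes: two of the readability formulas cited in the reference list (Björnsson's LIX and the Automated Readability Index) are computed with naive tokenization and used as features for a scikit-learn SVM classifier. The toy texts, the choice of exactly these two metrics, and the scikit-learn setup are assumptions made for this example; the article itself computes seventeen metrics over the NOCE essays, and this is not the authors' actual code.

```python
# Illustrative sketch, not the paper's pipeline: compute two readability
# metrics and classify essays by course year with an SVM.
import re
from sklearn.svm import SVC

def lix(text: str) -> float:
    """Björnsson's LIX: mean sentence length + percentage of words longer than 6 letters."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)

def ari(text: str) -> float:
    """Automated Readability Index (Senter & Smith, 1967)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    chars = sum(len(w) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43

# Hypothetical placeholder corpus: (essay text, course year) pairs standing in
# for the 200 NOCE argumentative texts.
corpus = [
    ("Short sentences. Simple words are used here. The essay is plain.", 1),
    ("Considerably elaborated argumentation characterises sophisticated second-year compositions.", 2),
]

X = [[lix(text), ari(text)] for text, _ in corpus]   # one feature vector per essay
y = [year for _, year in corpus]                     # class label: 1st vs 2nd year

clf = SVC(kernel="rbf").fit(X, y)                    # train the SVM classifier
print(clf.predict([[35.0, 10.0]]))                   # predict the year for unseen metric values
```

Both formulas used in the sketch rest on sentence length and word length, which is in line with the abstract's observation that sentence-length and polysyllable-based metrics discriminate best between the two groups.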

Bibliographic references

  • Anderson, J. (1981). Analysing the readability of English and non-English texts in the classroom with Lix. Paper presented at the Seventh Meeting of the Australian Reading Association, Darwin, Australia.
  • Anderson, J. (1983). LIX and RIX: Variations on a little-known readability index. Journal of Reading, 26(6), 490–496.
  • Béchara, H., Costa, H., Taslimipoor, S., Gupta, R., Orasan, C., Pastor, G. C., & Mitkov, R. (2015). MiniExperts: An SVM approach for measuring semantic textual similarity. (pp. 96–101). Proceedings of the 9th international workshop on semantic evaluation (SemEval 2015). Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/s15-2017
  • Björnsson, C. H. (1968). Läsbarhet. Liber.
  • Björnsson, C.-H. (1983). Readability of newspapers in 11 languages. Reading Research Quarterly, 480–497. DOI: https://doi.org/10.2307/747382
  • Bormuth, J. R. (1969). Development of readability analysis (Final report project no. 7-0052, contract no. OEC-3-7-070052-0326). US Department of Health, Education and Welfare. Retrieved from https://eric.ed.gov/?id=ED029166
  • Campbell, W. M., Campbell, J. P., Reynolds, D. A., Singer, E., & Torres-Carrasquillo, P. A. (2006). Support vector machines for speaker and language recognition. Computer Speech & Language, 20(2), 210–229. DOI: https://doi.org/10.1016/j.csl.2005.06.003
  • Caylor, J. S., Sticht, T. G., & Fox. (1973). Methodologies for determining reading requirements of military occupational specialties. (Technical Report No. 73-5). Human Resources Research Organization. Retrieved from https://eric.ed.gov/?id=ED074343
  • Chall, J. S., & Dale, E. (1995). Readability revisited: The new Dale-Chall readability formula. Cambridge, MA: Brookline Books.
  • Coleman, M., & Liau, T. L. (1975). A computer readability formula designed for machine scoring. Journal of Applied Psychology, 60(2), 283. DOI: https://doi.org/10.1037/h0076540
  • Collins-Thompson, K., & Callan, J. (2005). Predicting reading difficulty with statistical language models. Journal of the Association for Information Science and Technology, 56(13), 1448–1462. DOI: https://doi.org/10.1002/asi.20243
  • Dale, E. (1931). A comparison of two word lists. Educational Research Bulletin, 10(18), 484–489.
  • Dale, E., & Chall, J. S. (1948). A formula for predicting readability: Instructions. Educational Research Bulletin, 27(2), 37–54.
  • Díaz-Negrillo, A. (2007). A fine-grained error tagger for learner corpora (Doctoral dissertation). University of Jaén, Jaén.
  • Díaz-Negrillo, A. (2009). EARS: A user’s manual (vol. 1). Munich: LINCOM Academic Reference Books.
  • DuBay, W. H. (2004). The principles of readability. Costa Mesa: Impact Information. Retrieved from http://www.impact-information.com/impactinfo/readability02.pdf
  • Farr, J. N., Jenkins, J. J., & Paterson, D. G. (1951). Simplification of Flesch reading ease formula. Journal of Applied Psychology, 35(5), 333–337. DOI: https://doi.org/10.1037/h0062427
  • Feng, L., Jansche, M., Huenerfauth, M., & Elhadad, N. (2010). A comparison of features for automatic readability assessment. Proceedings of the 23rd international conference on computational linguistics (pp. 276–284). Association for Computational Linguistics.
  • Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 32(3), 221–233. DOI: https://doi.org/10.1037/h0057532
  • Flesch, R. F. (1949). Art of readable writing. USA: Hungry Minds Inc.
  • François, T. (2011). Les apports du traitement automatique des langues à la lisibilité du français langue étrangère. Université Catholique de Louvain, Louvain-La-Neuve.
  • Giménez, J., & Marquez, L. (2004). Fast and accurate part-of-speech tagging: The SVM approach revisited. Recent Advances in Natural Language Processing III, 153–162. DOI: https://doi.org/10.1075/cilt.260.17gim
  • Granger, S., Gilquin, G., & Meunier, F. (Eds.). (2015). The Cambridge handbook of learner corpus research. Cambridge University Press. DOI: https://doi.org/10.1017/cbo9781139649414
  • Gunning, R. (1952). The technique of clear writing. New York: McGraw-Hill.
  • Harris, A. J., & Jacobson, M. D. (1974). Revised Harris-Jacobson readability formulas. Paper presented at the Annual Meeting of the College Reading Association, Maryland. Retrieved from https://eric.ed.gov/?id=ED098536
  • Hawkins, J. A., & Buttery, P. (2010). Criterial features in learner corpora: Theory and illustrations. English Profile Journal, 1(1), 1–23. DOI: https://doi.org/10.1017/s2041536210000103
  • Hawkins, J. A., & Filipović, L. (2012). Criterial features in L2 English: Specifying the reference levels of the Common European Framework (Vol. 1). Cambridge: Cambridge University Press.
  • Heilman, M., Collins-Thompson, K., & Eskenazi, M. (2008). An analysis of statistical models and features for reading difficulty prediction. Proceedings of the third workshop on innovative use of NLP for building educational applications (pp. 71–79). Association for Computational Linguistics.
  • Hsu, C.-W., Chang, C.-C., & Lin, C.-J. (2003). A practical guide to support vector classification. Retrieved from http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
  • Jarvis, S. (2011). Data mining with learner corpora: choosing classifiers for L1 detection. In F. Meunier, S. De Cock, G. Gilquin, & M. Paquot (Eds.), A taste for corpora: In honour of Sylviane Granger (pp. 127–154). Amsterdam/Philadelphia: John Benjamins Publishing Company. DOI: https://doi.org/10.1075/scl.45.10jar
  • Kincaid, J. P., Fishburne Jr, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy enlisted personnel. DTIC Document.
  • Lu, X. (2010). Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics, 15(4), 474–496. DOI: https://doi.org/10.1075/ijcl.15.4.02lu
  • Lu, X. (2011). A corpus-based evaluation of syntactic complexity measures as indices of college-level ESL writers’ language development. TESOL Quarterly, 45(1), 36–62. DOI: https://doi.org/10.5054/tq.2011.240859
  • McLaughlin, G. H. (1969). SMOG grading-a new readability formula. Journal of Reading, 12(8), 639–646.
  • McLaughlin, G. H. (1968). Proposals for British readability measures. In J. Downing & A. L. Brown (Eds.), Third international reading symposium (pp. 186–205). London: Cassell.
  • Michalke, M. (2017). Package koRpus: An R Package for Text Analysis (Version 0.10-2). Retrieved from http://reaktanz.de/?c=hacking&s=koRpus
  • Mullen, T., & Collier, N. (2004). Sentiment analysis using support vector machines with diverse information sources. Proceedings of EMNLP-04, 9th conference on empirical methods in natural language processing (pp. 412–418). EMNLP.
  • Nivre, J., Hall, J., Nilsson, J., Eryiǧit, G., & Marinov, S. (2006). Labeled pseudo-projective dependency parsing with support vector machines. Proceedings of the tenth conference on computational natural language learning (pp. 221–225). Association for Computational Linguistics. DOI: https://doi.org/10.3115/1596276.1596318
  • O’Hayre, J. (1966). Gobbledygook has gotta go. US Dept. of the Interior, Bureau of Land Management.
  • Pilán, I., Volodina, E., & Johansson, R. (2014). Rule-based and machine learning approaches for second language sentence-level readability. Proceedings of the ninth workshop on innovative use of NLP for building educational applications (pp. 174–184). Association for Computational Linguistics. DOI: https://doi.org/10.3115/v1/w14-1821
  • Prabowo, R., & Thelwall, M. (2009). Sentiment analysis: A combined approach. Journal of Informetrics, 3(2), 143–157. DOI: https://doi.org/10.1016/j.joi.2009.01.003
  • Schmid, H. (1995). Treetagger: A language independent part-of-speech tagger (computer software). Institut für Maschinelle Sprachverarbeitung: Universität Stuttgart. Retrieved from http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.en.html
  • Senter, R., & Smith, E. A. (1967). Automated readability index. DTIC Document.
  • Shen, W., Williams, J., Marius, T., & Salesky, E. (2013). A language-independent approach to automatic text difficulty assessment for second-language learners. Massachusetts Institute of Technology, Lincoln Laboratory, Lexington. DOI: https://doi.org/10.21236/ada595522
  • Song, J., He, Y., & Fu, G. (2015). Polarity classification of short product reviews via multiple cluster-based SVM classifiers. Paper presented at the PACLIC, Shanghai. Retrieved from http://www.aclweb.org/anthology/Y15-2031
  • Spache, G. (1953). A new readability formula for primary-grade reading materials. The Elementary School Journal, 53(7), 410–413. DOI: https://doi.org/10.1086/458513
  • Sung, Y., Lin, W., Dyson, S. B., Chang, K., & Chen, Y. (2015). Leveling L2 texts through readability: Combining multilevel linguistic features with the CEFR. The Modern Language Journal, 99(2), 371–391. DOI: https://doi.org/10.1111/modl.12213
  • Sung, Y.-T., Chen, J.-L., Cha, J.-H., Tseng, H.-C., Chang, T.-H., & Chang, K.-E. (2015). Constructing and validating readability models: the method of integrating multilevel linguistic features with machine learning. Behavior Research Methods, 47(2), 340–354. DOI: https://doi.org/10.3758/s13428-014-0459-x
  • Vajjala, S. (2018). Automated assessment of non-native learner essays: Investigating the role of linguistic features. International Journal of Artificial Intelligence in Education, 28(1), 79–105. DOI: https://doi.org/10.1007/s40593-017-0142-3
  • Vajjala, S., & Meurers, D. (2012). On improving the accuracy of readability classification using insights from second language acquisition. Proceedings of the seventh workshop on building educational applications using NLP (pp. 163–173). Association for Computational Linguistics.
  • Wheeler, L. R., & Smith, E. H. (1954). A practical readability formula for the classroom teacher in the primary grades. Elementary English, 31(7), 397–399.
  • Yamada, H., & Matsumoto, Y. (2003). Statistical dependency analysis with support vector machines. Proceedings of IWPT (Vol. 3, pp. 195–206).
  • Zalmout, N., Saddiki, H., & Habash, N. (2016). Analysis of foreign language teaching methods: An automatic readability approach. Proceedings of the 3rd workshop on natural language processing techniques for educational applications (pp. 122–130). Osaka.