Clustering words to match conditions: An algorithm for stimuli selection in factorial designs

Marc Guasch; Juan Haro; Roger Boada

Journal:

Psicológica: Revista de metodología y psicología experimental

ISSN: 1576-8597

Year of publication: 2017

Volume: 38

Issue: 1

Pages: 111-131

Type: Article

Abstract

With the increasing refinement of language processing models and new discoveries about the variables that can modulate these processes, stimuli selection for experiments with a factorial design is becoming a difficult task. Selecting sets of words that differ in one variable while matching them on dozens of other confounding variables is time-consuming and error-prone. To assist experimenters in this thankless task, we present a simple method that requires little effort. The method is based on K-means clustering as a way to detect small, tight clusters of words that match on the desired variables. We have formalized the procedure as an algorithm, that is, a series of easy-to-follow steps. In addition, we provide SPSS syntax that helps in choosing the correct size of the clustering. After reviewing the theory, we present a worked example that guides the reader through the complete procedure. The dataset of the worked example is available as supplementary material to this paper.
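The core idea named in the abstract (using K-means to find tight groups of items that are similar on the control variables) can be illustrated with a minimal sketch. This is not the authors' algorithm or their SPSS syntax: the item labels, the two control variables and their values, and all function names are hypothetical, and the farthest-point seeding is a choice made here only to keep the run deterministic.

```python
import math

def zscore(rows):
    """Standardize each column (mean 0, SD 1) so no variable dominates distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with deterministic farthest-point seeding (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids

def cluster_spread(points, labels, centroids):
    """Mean distance to the centroid per cluster; smaller means tighter."""
    spread = {}
    for j, c in enumerate(centroids):
        d = [math.dist(p, c) for p, lab in zip(points, labels) if lab == j]
        spread[j] = sum(d) / len(d) if d else float("inf")
    return spread

# Hypothetical items: (label, log frequency, length in letters).
items = [
    ("w1", 5.1, 4), ("w2", 5.0, 4), ("w3", 4.9, 5),
    ("w4", 3.0, 7), ("w5", 3.1, 7), ("w6", 2.9, 8),
    ("w7", 1.0, 10), ("w8", 1.1, 11), ("w9", 0.9, 10),
]
points = zscore([row[1:] for row in items])
labels, centroids = kmeans(points, k=3)
spread = cluster_spread(points, labels, centroids)

# Group item labels by cluster; items in one tight cluster are close
# on every standardized control variable.
groups = {}
for (name, *_), lab in zip(items, labels):
    groups.setdefault(lab, []).append(name)

for lab in sorted(groups, key=spread.get):
    print(lab, groups[lab], round(spread[lab], 3))
```

In a sketch like this, items that fall in the same tight cluster are near one another on all the standardized control variables, so they are natural candidates for assignment to different levels of the manipulated factor. How many clusters to request, and how tight a cluster must be to count as matched, are left open here.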

Funding information

This research was funded by the Spanish Ministry of Economy and Competitiveness (PSI2015-63525-P) and by the Research Promotion Program of the Universitat Rovira i Virgili (2014PFR-URV-B2-37).

Funders

Spanish Ministry of Economy and Competitiveness
- PSI2015-63525-P
Universitat Rovira i Virgili, Spain
- 2014PFR-URV-B2-37

References

Armstrong, B. C., Watson, C. E., & Plaut, D. C. (2012). SOS! An algorithm and software for the stochastic optimization of stimuli. Behavior Research Methods, 44, 675–705. doi: 10.3758/s13428-011-0182-9.
Baayen, R. H. (2004). Statistics in psycholinguistics: A critique of some current gold standards. Mental Lexicon Working Papers, 1, 1–45.
Baayen, R. H. (2010). A real experiment is a factorial experiment? Mental Lexicon, 5, 149–157. doi: 10.1075/ml.5.1.06baa.
Cohen, J. (1983). The cost of dichotomization. Applied Psychological Measurement, 7, 249–253. doi: 10.1177/014662168300700301.
Coltheart, M. (1981). The MRC psycholinguistic database. The Quarterly Journal of Experimental Psychology, 33, 497–505. doi: 10.1080/14640748108400805.
Cutler, A. (1981). Making up materials is a confounded nuisance, or: Will we be able to run any psycholinguistic experiments at all in 1990? Cognition, 10, 65–70. doi: 10.1016/0010-0277(81)90026-3.
Davis, C. J. (2005). N-Watch: A program for deriving neighborhood size and other psycholinguistic statistics. Behavior Research Methods, 37, 65–70. doi: 10.3758/BF03206399.
Davis, C. J., & Perea, M. (2005). BuscaPalabras: A program for deriving orthographic and phonological neighborhood statistics and other psycholinguistic indices in Spanish. Behavior Research Methods, 37, 665–671. doi: 10.3758/BF03192738.
Díez, E., Fernández, A., & Alonso, M. A. (2006). NIPE: Normas e índices de interés en Psicología Experimental. Retrieved from http://campus.usal.es/~gimc/nipe/
Duchon, A., Perea, M., Sebastián-Gallés, N., Martí, A., & Carreiras, M. (2013). EsPal: One-stop shopping for Spanish word properties. Behavior Research Methods, 45, 1246–1258. doi: 10.3758/s13428-013-0326-1.
Forster, K. I. (2000). The potential for experimenter bias effects in word recognition experiments. Memory & Cognition, 28, 1109–1115. doi: 10.3758/BF03211812.
Guasch, M., Boada, R., Ferré, P., & Sánchez-Casas, R. (2013). NIM: A Web-based Swiss Army knife to select stimuli for psycholinguistic studies. Behavior Research Methods, 45, 765–771. doi: 10.3758/s13428-012-0296-8.
Hillhouse, J. J., & Adler, C. M. (1997). Investigating stress effect patterns in hospital staff nurses: Results of a cluster analysis. Social Science & Medicine, 45, 1781–1788. doi: 10.1016/S0277-9536(97)00109-3.
Jain, A. K. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31, 651–666. doi: 10.1016/j.patrec.2009.09.011.
Lorr, M., & Strack, S. (1994). Personality profiles of police candidates. Journal of Clinical Psychology, 50, 200–207. doi: 10.1002/1097-4679(199403)50:2<200::AID-JCLP2270500208>3.0.CO;2-1.
Rokach, L., & Maimon, O. (2005). Clustering methods. In O. Maimon & L. Rokach (Eds.), The data mining and knowledge discovery handbook (pp. 321–352). Boston, MA: Springer US. doi: 10.1007/0-387-25465-X_15.
Sparks, R. L., Patton, J., & Ganschow, L. (2012). Profiles of more and less successful L2 learners: A cluster analysis study. Learning and Individual Differences, 22, 463–472. doi: 10.1016/j.lindif.2012.03.009.
Troche, J., Crutch, S., & Reilly, J. (2014). Clustering, hierarchical organization, and the topography of abstract and concrete nouns. Frontiers in Psychology, 5. doi: 10.3389/fpsyg.2014.00360.
Van Casteren, M., & Davis, M. H. (2007). Match: A program to assist in matching the conditions of factorial experiments. Behavior Research Methods, 39, 973–978. doi: 10.3758/BF03192992.

Data source: Dialnet