Finding archetypal patterns for binary questionnaires

  1. Ismael Cabero (1)
  2. Irene Epifanio (2)

Affiliations:

  1. Universidad Internacional de La Rioja, Logroño, Spain. ROR https://ror.org/029gnnp81
  2. Universitat Jaume I, Castelló de la Plana, Spain. ROR https://ror.org/02ws1xc11

Journal: Sort: Statistics and Operations Research Transactions

ISSN: 1696-2281

Year of publication: 2020

Volume: 44

Issue: 1

Pages: 39-66

Type: Article

DOI: 10.2436/20.8080.02.94 (open access)

Abstract

Archetypal analysis is an exploratory tool that explains a set of observations as mixtures of pure (extreme) patterns. If the patterns are actual observations of the sample, we refer to them as archetypoids. For the first time, we propose using archetypoid analysis for binary observations. This tool can contribute to the understanding of a binary data set, just as it does in the multivariate case. We illustrate the advantages of the proposed methodology with a simulation study and two applications: the first explores objects (rows) to determine student skill set profiles, and the second explores items (columns) to describe item response functions.
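The abstract describes archetypoid analysis only in words: each observation is approximated by a convex mixture (non-negative weights that sum to one) of k archetypoids, which are themselves rows of the data set, and the archetypoids are chosen so that the overall reconstruction error is as small as possible. The NumPy sketch below illustrates that idea on a tiny binary matrix. It is not the authors' implementation. It assumes the squared-error (residual sum of squares) criterion of classical archetypal analysis, fits the mixture weights by projected gradient descent onto the probability simplex, and finds the archetypoids by trying every k-subset of rows, which is feasible only for very small samples (practical archetypoid algorithms rely on a faster heuristic search). All function names are hypothetical.

```python
import itertools

import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1} (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    # Last (0-based) index where the sorted entry stays above the running threshold.
    rho = np.nonzero(u - (css - 1) / ks > 0)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def convex_weights(Z, x, n_iter=500):
    """Simplex weights alpha minimising ||x - Z.T @ alpha||^2; rows of Z are archetypoids."""
    k = Z.shape[0]
    alpha = np.full(k, 1.0 / k)  # start from the uniform mixture
    G = Z @ Z.T
    L = np.linalg.eigvalsh(G).max()  # Lipschitz constant of the gradient
    if L <= 0:
        return alpha
    b = Z @ x
    for _ in range(n_iter):
        grad = G @ alpha - b  # gradient of 0.5 * ||x - Z.T @ alpha||^2
        alpha = project_to_simplex(alpha - grad / L)
    return alpha


def rss(X, Z):
    """Residual sum of squares when every row of X is a convex mixture of the rows of Z."""
    total = 0.0
    for x in X:
        alpha = convex_weights(Z, x)
        total += np.sum((x - Z.T @ alpha) ** 2)
    return total


def archetypoids_exhaustive(X, k):
    """Try every k-subset of rows as archetypoids; keep the first subset with the lowest RSS."""
    best_idx, best_rss = None, np.inf
    for idx in itertools.combinations(range(X.shape[0]), k):
        value = rss(X, X[list(idx)])
        if value < best_rss:
            best_idx, best_rss = idx, value
    return best_idx, best_rss


if __name__ == "__main__":
    # Toy binary data: rows are respondents, columns are items.
    X = np.array([
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [1, 1, 1, 1],
    ], dtype=float)
    idx, value = archetypoids_exhaustive(X, k=2)
    print("archetypoid rows:", idx, "RSS:", value)
```

With k = 2 the search returns rows (0, 2), that is, the two observed patterns (1, 1, 0, 0) and (0, 0, 1, 1), with an RSS of 1. The all-ones row is not a mixture of those two patterns, so its best reconstruction is the equal-weight mixture (0.5, 0.5, 0.5, 0.5), which accounts for the whole residual. Because archetypoids are actual observations, they remain binary vectors and can be read directly as response profiles, even though the mixtures used to describe the other rows are not binary.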

Funding information

This work is supported by the following grants: DPI2017-87333-R from the Spanish Ministry of Science, Innovation and Universities (AEI/FEDER, EU) and UJI-B2017-13 from Universitat Jaume I.
