Evolutionary design of deep neural networks

  1. Baldominos Gómez, Alejandro
Supervised by:
  1. Pedro Isasi Viñuela (Director)
  2. Yago Sáez Achaerandio (Co-director)

University of defense: Universidad Carlos III de Madrid

Date of defense: 5 November 2018

Committee:
  1. María Araceli Sanchís de Miguel (Chair)
  2. Francisco Javier Segovia Pérez (Secretary)
  3. Simon Lucas (Member)

Type: Thesis

Abstract

This thesis is a research work in computer science, which is by itself a very broad field encompassing all aspects of the study of computers, including but not limited to hardware design, networking, software engineering, computer security, and the theory of computation. More specifically, this thesis involves research in the field of artificial intelligence (AI). Although much narrower than computer science, artificial intelligence is still a rather large field; in this thesis we focus on two of its subfields, namely machine learning, and search and optimization.

In a broad sense, machine learning (ML) is a field of study that aims at making computers learn. While machine learning comprises many different types of problems, this work focuses on supervised learning, representation learning and deep learning:

• Supervised learning: given a set of data composed of features and a label, this problem involves learning a model from the features in order to predict the label. The fact that the input data is labelled is why this problem is known as "supervised" learning.

• Representation learning: in some machine learning problems, features may be known in advance or be trivial to extract from the data. In other cases, however, extracting features from the data can be a challenging task, especially when the raw data comes in complex formats (multidimensional waveforms, images, video, etc.). This process is called "feature engineering", and can be performed manually. Representation learning is a set of machine learning techniques that aim to extract a convenient set of features from a raw input, so that they can be effectively exploited in further machine learning tasks (e.g., supervised learning). This discipline is also known as "feature learning".
• Deep learning: while there is no simple definition for this relatively new concept, we could say that it comprises techniques composed of a sequence of several layers that extract features from raw data and learn a model from those features.

The relationship between these three fields is as follows: supervised learning is a type of problem that can be solved by many different techniques and that requires a set of features for the learning phase. These features can be extracted in different ways, one of them being representation learning. Of course, representation learning can also be used for automatic feature engineering in problems other than supervised learning. Deep learning can be applied to supervised learning or to other types of problems, and often involves representation learning in order to automatically extract features from raw data; however, not all representation learning techniques involve deep learning. The intersection of these three disciplines defines the boundaries within which this work takes place.

In particular, one of the most representative techniques of deep learning is the "deep neural network". This term is often used to refer to neural networks with more than one hidden layer (as opposed to "shallow" neural networks, which contain only one hidden layer). Another relevant concept for this work is the convolutional neural network (CNN), which includes a series of convolutional layers for representation learning followed by dense or recurrent layers for concept learning. In recent years, convolutional neural networks have been applied to very diverse problems, including character recognition (OCR), image classification, natural language processing, and signal processing and classification.

Besides machine learning, this work also involves search and optimization, another important discipline within the field of artificial intelligence.
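The convolutional layers described above are built around a single operation: sliding a small kernel over the input and computing weighted sums. As a minimal illustration in pure Python (the input image and kernel values below are made up for this sketch and are not from the thesis), a 2D convolution with stride 1 and no padding:

```python
# Minimal sketch of the 2D convolution at the heart of a CNN's
# convolutional layers (illustrative values; stride 1, no padding).

def conv2d(image, kernel):
    """Slide `kernel` over `image` and return the resulting feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        feature_map.append(row)
    return feature_map

# A hand-picked vertical-edge detector applied to a tiny image
# containing a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_kernel = [
    [-1, 1],
    [-1, 1],
]
print(conv2d(image, edge_kernel))  # strongest response at the edge column
```

Here the kernel is fixed by hand; in a CNN, the kernel values are weights learned during training, which is what makes the convolutional layers perform representation learning.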
In search problems, we want to find a goal state within a large set of states (the search space); to do so, we are given a start state and a transition function to move across states. Many search algorithms are based on heuristics, which are domain-specific functions guiding the search. Optimization problems are a special case of search problems in which a cost function is computed from a set of values, and the objective is to find the set of values that minimizes the cost function. It is worth noting that the cost function can be stochastic, non-trivial or even unknown (a black box).

Some search and optimization techniques are known as metaheuristics. These algorithms are heuristic-guided, but the heuristic function is domain-independent; instead, they make very few assumptions about this function. Metaheuristics are suitable for combinatorial optimization when a specific heuristic is not available, and they will often find suboptimal solutions. Some of these techniques are known as biologically-inspired (or nature-inspired) metaheuristics, because they rely on biological foundations to drive the optimization process. The set of techniques within biologically-inspired metaheuristics whose governing principles are based on Darwin's theory of evolution and natural selection is known as evolutionary computation. Examples of evolutionary computation techniques include, but are not limited to, genetic algorithms, evolution strategies, genetic programming and grammatical evolution.

Artificial neural networks have been used for decades to tackle supervised learning problems. One of the neural network models most widely used for classification and regression is the multilayer perceptron, an implementation of a feed-forward network. A multilayer perceptron is characterized by a topology (also called architecture) and weights.
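The evolutionary computation family mentioned above can be illustrated with a toy genetic algorithm. The sketch below solves OneMax (maximize the number of ones in a bitstring) using tournament selection, one-point crossover and bit-flip mutation; every parameter value is an illustrative assumption, unrelated to the thesis's actual configuration:

```python
import random

# Toy genetic algorithm on the OneMax problem. Illustrative only: the
# thesis applies far richer encodings and fitness functions to CNNs.

def fitness(genome):
    return sum(genome)  # number of ones in the bitstring

def tournament(population, k=3):
    # Pick k random individuals and return the fittest.
    return max(random.sample(population, k), key=fitness)

def crossover(a, b):
    point = random.randrange(1, len(a))  # one-point crossover
    return a[:point] + b[point:]

def mutate(genome, rate=0.05):
    return [1 - bit if random.random() < rate else bit for bit in genome]

def evolve(genome_len=20, pop_size=30, generations=50):
    population = [[random.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population = [mutate(crossover(tournament(population),
                                       tournament(population)))
                      for _ in range(pop_size)]
    return max(population, key=fitness)

random.seed(0)
best = evolve()
print(fitness(best), "ones out of", len(best))
```

The same selection/crossover/mutation loop applies unchanged when the genome encodes something far more expensive to evaluate, such as a neural network topology; only the fitness function changes.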
The topology comprises one or more hidden layers, each containing a number of hidden neurons. Neurons in one layer are connected to all neurons in the following layer, and each of these connections is assigned a weight. In addition, each neuron has another input parameter called the bias, a special kind of weight not connected to any neuron in the previous layer.

The popularity of artificial neural networks, and of the multilayer perceptron in particular, grew with the discovery of the backpropagation algorithm, first published in Nature in 1986. This algorithm uses gradient descent to optimize the weights of the neural network. Since then, several optimization functions have arisen to automatically search for the configuration of weights and biases that minimizes the classification or regression error of the neural network. Regarding the topology, however, there is no rule of thumb for determining the most suitable architecture for a given problem, and estimating it often involves trial and error, i.e., manually evaluating the performance of different architectures in order to determine the best one.

In recent years, convolutional neural networks (CNNs), which lie within the field of deep learning, have been extensively used to solve a large variety of problems (e.g., computer vision, natural language processing, signal classification, etc.). The main advantage of convolutional neural networks is that they contain convolutional layers able to perform representation learning, i.e., to automatically learn features from raw data. On the other hand, convolutional neural networks involve a more complex topology than traditional neural networks, and thus require more hyperparameters to be specified, to mention a few: the number of convolutional layers, the number of kernels (or patches) per convolutional layer, the size of the convolutional patch, the activation function, pooling, padding, etc.
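The gradient-descent weight update that backpropagation performs can be sketched on the smallest possible case: a single sigmoid neuron trained on logical AND. The task, learning rate and number of epochs are illustrative assumptions, not details taken from the thesis:

```python
import math
import random

# Sketch of gradient-descent weight updates for a single sigmoid neuron,
# the same principle that backpropagation applies layer by layer in a
# multilayer perceptron.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Training data: two input features plus a label (logical AND).
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

random.seed(1)
weights = [random.uniform(-1, 1), random.uniform(-1, 1)]
bias = random.uniform(-1, 1)
lr = 0.5  # learning rate

for _ in range(2000):
    for features, label in data:
        output = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
        grad = output - label  # dLoss/dz for the cross-entropy loss
        # Move each weight (and the bias) against the gradient.
        weights = [w - lr * grad * x for w, x in zip(weights, features)]
        bias -= lr * grad

predictions = [round(sigmoid(sum(w * x for w, x in zip(weights, f)) + bias))
               for f, _ in data]
print(predictions)  # matches the AND labels: [0, 0, 0, 1]
```

Note that this procedure only optimizes weights and biases for a fixed architecture; choosing the architecture itself is precisely the problem the thesis addresses.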
Finally, the topology gets even more complicated if we consider that the neural network can have recurrent layers, where connections between neurons happen not only from one layer to the following one, but also within the same layer. Recurrent layers are an interesting approach when tackling the classification of time series or other information with a temporal component (handwriting recognition, speech recognition, etc.). There are different implementations of recurrent topologies, and the decision on which to use adds further hyperparameters to the network architecture.

Determining the optimal topology of a convolutional neural network for a given problem is a hard task. Moreover, in some domains certain topologies may not work at all, either because they are too simple, too complex, or because they do not fit the data. So far, most researchers come up with a manual design of a convolutional neural network that satisfies their expectations, but this topology will likely be far from the optimal solution.

Finally, some works have evaluated the performance of ensembles, or committees, of convolutional neural networks; however, this area remains largely unexplored. When using committees, several models with good performance are put together to create a meta-classifier that performs better than any of the models of which it is composed. If, in the process of finding a good CNN architecture, other good architectures are found as well (even if they perform worse than the best one), they could be used to form an ensemble.

In this thesis, we are motivated by the hardness of finding the optimal topology of a convolutional neural network for solving a given problem. We begin this research line under the following working hypotheses:

• Manual design of convolutional neural networks is a time-consuming task and leads to topologies which are far from optimal.

• Topology design of convolutional neural networks can be approached as an optimization problem.
• Metaheuristics, and in particular evolutionary computation techniques, can be used for the automatic optimization of CNN topologies in an affordable amount of time.

• Evolved topologies will on average outperform manually-designed topologies or, in the worst case, lead to similar results while requiring less effort.

• Committees of convolutional neural networks will often perform better than a single neural network; thus, if several good solutions are found, they can be used to build a committee.

• A system can be designed to automatically optimize the topology of convolutional neural networks for a given supervised learning problem in an effective and efficient manner.

Given the above motivations and working hypotheses, the main aim of this thesis is the following: to explore the automatic design of optimal topologies of convolutional neural networks using evolutionary computation techniques, and to evaluate the performance of the optimized topologies in order to validate that they are better than most, if not all, manually-engineered CNN architectures. To achieve this aim, we propose the following objectives:

1. Proposal of one or more domain-agnostic encodings that model the topology of a CNN in a way that can be evolved using specific evolutionary computation techniques.

2. Proposal of a fitness function, which may be domain-dependent, to compare the performance of different CNN topologies.

3. Design and development of a system able to optimize the topology of a convolutional neural network, given the fulfillment of objectives (1) and (2).

4. Improvement of the system designed in objective (3) in order to reduce the computational cost in terms of time, by enabling parallel fitness computations.

5. Selection of different domains to evaluate the performance of automatically evolved convolutional neural networks. These domains should cover a wide spectrum of applications: handwritten character recognition, signal classification, etc.

6. Exhaustive review of the state of the art for the domains chosen in objective (5), explicitly distinguishing the best results attained using convolutional neural networks from those attained with other techniques.

7. Comparative evaluation of the best CNN obtained for each domain against the state-of-the-art ranking elaborated in objective (6).

8. Construction of committees (ensembles) using more than one of the best CNN topologies found for each domain.

9. Evaluation of the performance of the committees built in objective (8), comparing it against the state of the art and the best individual CNN.

10. Analysis of the results and validation, if applicable, of the working hypotheses.

To achieve the aforementioned objectives, we adhere to a strict and systematic research methodology, which involves the following steps:

1. Extensive review of the state of the art, in order to find related work. These works are carefully analyzed in order to find and outline flaws or improvements that can be addressed within the current research work.

2. Design of a proposal able to satisfactorily achieve the objectives given the working hypotheses. The proposal is described in enough detail to enable reproducibility by the scientific community.

3. Provision of the hardware and implementation of the software required to successfully materialize the proposal.

4. Completion of an exhaustive evaluation of the results against the state of the art, in order to validate the competitiveness of the proposal.

5. Validation of the working hypotheses and of the achievement of the research objectives. If some of the hypotheses do not hold or some objectives are not achieved, this fact should be carefully analyzed and the underlying hypotheses must be corrected.

6. Extraction of conclusive remarks about the contributions of the work and proposal of future lines of work to guide research within this field.

Successful research must have a scientific impact.
In this thesis, we estimate the contributions to be the following:

• Learning how evolutionary computation techniques perform when optimizing the topology of convolutional neural networks, as compared to manual topology design.

• Availability of an efficient system to perform automatic optimization of the topology of convolutional neural networks given the definition of a supervised learning problem, i.e., data and an objective function.

• For each domain taking part in the evaluation, the results of the best CNN topology, or of the best committees of convolutional neural networks, will be contributed to the state of the art.

Because of these contributions, we expect this thesis to have a positive impact on future research applying convolutional neural networks to different domains. Because the system will be freely available and supports parallel fitness computation, the design of new topologies can be accelerated, especially if the system is deployed on a multi-GPU cluster. If the optimization system resulting from this thesis serves to advance research in key areas, such as healthcare or medicine, a significant positive social impact could be achieved.

To the best of our knowledge, this is the most complete work to date on the neuroevolution of convolutional neural networks. In this thesis, we have proposed the use of two evolutionary computation techniques, namely genetic algorithms and grammatical evolution, to evolve most aspects of the network topology. To accelerate the evolutionary process, we have developed a simplified fitness function which provides an approximation of the network performance without requiring a full training process. Additionally, we have used a niching scheme to preserve diversity in the population, which has the additional advantage of enabling the construction of committees (ensembles) of convolutional neural networks. Our proposal has been validated against several datasets.
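The idea of encoding a CNN topology as a genome, which underlies the genetic-algorithm approach described above, can be sketched roughly as follows. The gene layout, value ranges and layer vocabulary below are hypothetical choices made purely for illustration; they are not the actual encoding proposed in the thesis:

```python
# Hypothetical sketch of decoding a fixed-length integer genome into a
# CNN topology description. Illustrative only: the gene layout and the
# value ranges are NOT the thesis's actual encoding.

KERNEL_COUNTS = [8, 16, 32, 64]   # candidate kernels per conv layer
KERNEL_SIZES = [3, 5, 7]          # candidate convolutional patch sizes
DENSE_UNITS = [64, 128, 256]      # candidate units for the dense layer

def decode(genome):
    """Map an 11-gene integer genome onto a list of layer specifications."""
    n_conv = 1 + genome[0] % 3    # gene 0 selects 1-3 convolutional layers
    layers = []
    for i in range(n_conv):
        layers.append({
            "type": "conv",
            "kernels": KERNEL_COUNTS[genome[1 + i] % len(KERNEL_COUNTS)],
            "size": KERNEL_SIZES[genome[4 + i] % len(KERNEL_SIZES)],
            "pooling": bool(genome[7 + i] % 2),
        })
    layers.append({"type": "dense",
                   "units": DENSE_UNITS[genome[10] % len(DENSE_UNITS)]})
    return layers

genome = [2, 0, 1, 2, 0, 1, 2, 1, 0, 1, 2]  # an example individual
for layer in decode(genome):
    print(layer)
```

A fitness function would then build, train (or approximately evaluate) the decoded network and return its score, closing the loop with the selection, crossover and mutation operators of a genetic algorithm.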
The first of them is MNIST, a well-known dataset for handwritten digit recognition, on which we attained highly competitive results; in fact, we achieved a test error rate that is outperformed by only two related works. Ensembles have also been shown to outperform individual models. Later, we reused the best topologies on EMNIST, a newly introduced extension of MNIST which includes more samples and a new dataset of letters. These topologies have proved successful on the new dataset, obtaining results better than any other work in the state of the art and indicating that topologies can be reused across domains of similar characteristics. To better understand the models' mistakes, we have plotted the misclassified samples, so that the reader can verify that they involve hardly recognizable digits or characters. Finally, we have also tested the neuroevolutionary process on OPPORTUNITY, a dataset of labelled human activity built from high-dimensional data gathered from a variety of sensors, whose objective is to recognize the gestures performed by the user. The F1 score attained using grammatical evolution to evolve an ensemble of CNNs outperforms many of the best results reported so far in related works, placing our approach near the top of the ranking.

Given these findings, we consider the working hypotheses of this thesis validated and the proposed objectives successfully achieved. In summary, evolutionary computation has proven successful at optimizing the topology of convolutional neural networks, leading to results better than those reported in the state of the art with hand-crafted architectures. In the remainder of this chapter, we summarize the validated hypotheses and objectives, and discuss future work to further extend this research line.