UNIVERSITY POLYTECHNIC OF MADRID
FACULTY OF COMPUTER SCIENCE

DESIGN NEW SUPERVISED ART-TYPE ARTIFICIAL NEURAL NETWORKS, AND THEIR PERFORMANCES FOR CLASSIFICATION LANDSAT TM IMAGES

Presented by KAMAL R. AL-RAWI
to obtain the Ph.D. in Computer Science

MADRID - SPAIN 2001

CONSUELO GONZALO MARTIN, Associate Professor, Department of Architecture and Technology of Computer Systems, Faculty of Computer Science, University Polytechnic of Madrid,

CERTIFIES: that the thesis entitled "DESIGN NEW SUPERVISED ART-TYPE ARTIFICIAL NEURAL NETWORKS, AND THEIR PERFORMANCES FOR CLASSIFICATION LANDSAT TM IMAGES" has been carried out by KAMAL R. AL-RAWI, under my supervision, in the Department of Architecture and Technology of Computer Systems, Faculty of Computer Science, University Polytechnic of Madrid.

To Prof. Dr. Amos Eddy

ACKNOWLEDGEMENTS

I gratefully thank Dr. Consuelo Gonzalo Martín, associate professor of computer science at the Faculty of Computer Science, University Polytechnic of Madrid, for her continuous efforts during the supervision of this thesis. Her guidance and criticism were a great help to me. The criticisms of Dr. Águeda Arquero Hidalgo and Dr. Estibaliz Martínez Izquierdo were a great help; they were always in touch during the preparation of this work. My grateful thanks to Professor Dr. Pedro Gómez Vilda and the rest of the Working Group on Computer Technology, Dr. Victoria Rodellar Biarge, Dr. Mercedes Pérez Castellanos, and Dr. Víctor Nieto Lluis, for their support and useful discussion. I would like to thank all the graduate students in the group, especially Vicente Garcia del Cantara, for the friendly atmosphere during my stay in the Department of Architecture and Technology of Computer Systems. My thanks to the secretary of the department, Mrs. M. del Carmen Parró Cruz, who was always there to arrange our administrative work. I gratefully thank Professor Dr. José Luis Casanova, Director of the Remote Sensing Laboratory (LATUV), University of Valladolid, for the use of the facilities of the laboratory. My thanks to Miss Sarah Strauss and Miss Nicole Knudsen for the proofreading of the thesis. Finally, I thank my wife Eman, my daughter Hiba, and my sons Saif Al-Deen and Haitham for their support during the preparation of this work.

INDEX

CHAPTER I: INTRODUCTION
1.1 Historical background
1.2 Adaptive Resonance Theory ANNs
1.2.1 Unsupervised ART ANNs
1.2.2 Supervised ART ANNs
1.3 Classifying remotely sensed data with ANNs
1.4 Objectives

CHAPTER II: FUZZY ART ANN
2.1 Introduction
2.2 Matching system and vigilance parameter
2.3 Fuzzy ART dynamics
2.4 Fast-learning slow-record option
2.5 Complement coding
2.6 Fuzzy subset and conservative limit
2.7 Training algorithms of Fuzzy ART
2.8 Evolution of Fuzzy ART
2.9 Newly developed versions of Fuzzy ART
2.9.1 Flagged approach
2.9.2 Training algorithms of Flagged-Fuzzy ART
2.9.3 Compact approach
2.9.4 Training algorithms of Compact-Fuzzy ART
2.10 Categorization

CHAPTER III: FUZZY ARTMAP
3.1 Introduction
3.2 Fuzzy ARTMAP
3.2.1 Vigilance parameter dynamics in supervised environment
3.2.2 Training phase
3.2.3 Classification phase
3.3 Full algorithm of Fuzzy ARTMAP
3.3.1 Training algorithms of Fuzzy ARTMAP
3.3.2 Classification algorithm of Fuzzy ARTMAP

CHAPTER IV: SUPERVISED ART-I ANN
4.1 Introduction
4.2 Supervised ART-I
4.2.1 Architecture of Supervised ART-I
4.2.2 Data Description
4.2.3 Training of Supervised ART-I
4.2.4 Classification by Supervised ART-I
4.3 Algorithm of Supervised ART-I
4.3.1 Training algorithm of Supervised ART-I
4.3.2 Classification algorithm of Supervised ART-I
4.4 Discussion

CHAPTER V: SUPERVISED ART-II ANN
5.1 Introduction
5.2 Supervised ART-II
5.2.1 Architecture of Supervised ART-II
5.2.2 Training of Supervised ART-II
5.2.3 Classification by Supervised ART-II
5.3 Full algorithm of Supervised ART-II
5.3.1 Training algorithm of Supervised ART-II
5.3.2 Classification algorithm of Supervised ART-II
5.4 Discussion

CHAPTER VI: PERFORMANCE OF SUPERVISED ART-I&II FOR CLASSIFICATION OF LANDSAT TM IMAGES
6.1 Landsat satellites
6.2 Data
6.3 Performance
6.3.1 Training performance
6.3.2 Classification performance

CHAPTER VII: PERFORMANCES OF SUPERVISED ART ANNs WITH DIFFERENT VIGILANCE DYNAMICS
7.1 Introduction
7.2 Vigilance dynamics
7.2.1 Flying approach
7.2.2 Fixed vigilance approach
7.2.3 Free vigilance approach
7.2.4 Floating approach
7.3 Results and discussion

CHAPTER VIII: CONCLUSIONS

BIBLIOGRAPHY

APPENDIX: RESUMEN
A.1. INTRODUCCIÓN
A.1.1 Evolución histórica de las Redes Neuronales Artificiales (RNA)
A.1.2 Clasificación de datos remotamente detectados con RNA
A.2. OBJETIVOS DE LA TESIS
A.3. REDES NEURONALES ARTIFICIALES TIPO ART
A.3.1 Fuzzy ART
A.3.2 Fuzzy ARTMAP
A.4. PROPUESTA DE DOS VERSIONES MEJORADAS DE FUZZY ART
A.4.1 Versión "Flagged" de Fuzzy ART
A.4.2 Versión "Compact" de Fuzzy ART
A.5. PROPUESTA DE DOS NUEVAS ARQUITECTURAS SUPERVISADAS TIPO ART
A.5.1 Algoritmos de aprendizaje y clasificación de la arquitectura Supervised ART-I
A.5.2 Algoritmos de aprendizaje y clasificación de la arquitectura Supervised ART-II
A.6. EVALUACIÓN DE LAS PRESTACIONES DE SUPERVISED ART-I Y SUPERVISED ART-II EN LA CLASIFICACIÓN DE IMÁGENES REMOTAMENTE DETECTADAS
A.7. PRESTACIONES DE REDES SUPERVISADAS TIPO ART PARA DIFERENTES DINÁMICAS DEL PARÁMETRO DE VIGILANCIA
A.8. CONCLUSIONES

LIST OF FIGURES

Figure 2-1: Fuzzy ART dynamics
Figure 2-2: The architecture of Fuzzy ART
Figure 2-3: The architecture of Flagged Fuzzy ART
Figure 2-4: The architecture of Compact Fuzzy ART
Figure 3-1: Block diagram showing supervision through the map field
Figure 3-2: The full architecture for supervision through the map field
Figure 3-3: The architecture of ARTMAP for the classification problem
Figure 3-4: Full architecture of Fuzzy ARTMAP
Figure 3-5: Match tracking using the flying vigilance parameter
Figure 4-1: Training of the map field weights
Figure 4-2: Supervision dynamics of the tagging approach of Supervised ART-I
Figure 4-3: Architecture of Supervised ART-I
Figure 5-1: Supervision dynamics of the stacking approach of Supervised ART-II
Figure 5-2: Architecture of Supervised ART-II
Figure 5-3: Determination of the winning node in the stacking-supervision approach of Supervised ART-II
Figure 6-1: Number of category nodes in the domain of the vigilance parameter ρ and the dynamic learning parameter β, using 9000 pixels of the Landsat TM images
Figure 6-2: Training time, in minutes, for Supervised ART-I, in the domain of the vigilance parameter ρ and the dynamic learning parameter β, using 9000 pixels of the Landsat TM images
Figure 6-3: Training time, in minutes, for Supervised ART-II, in the domain of the vigilance parameter ρ and the dynamic learning parameter β, using 9000 pixels of the Landsat TM images
Figure 6-4: The ratio of training time for Supervised ART-I / Supervised ART-II, in the domain of the vigilance parameter ρ and the dynamic learning parameter β, using 9000 pixels of the Landsat TM images
Figure 6-5: Classification time, in minutes, in the domain of the vigilance parameter ρ and the dynamic learning parameter β, for 52 440 pixels of the Landsat TM images
Figure 6-6: Classification performance, in the domain of the vigilance parameter ρ and the dynamic learning parameter β, for Landsat TM images
Figure 6-7: The above image is the reference image. The lower image is the classified image using Supervised ART-II, with vigilance parameter ρ = 0.98, dynamic learning parameter β = 0.50, and training with 9000 exemplars. The classification accuracy is 85.82%
Figure 7-1: Sketches showing different vigilance parameter dynamics: fixed, free, and float approaches
Figure 7-2: Classified images for Landsat TM images. The first, second, third, and fourth columns represent classified images using the fly, float, fixed, and free vigilance parameter, respectively. The first, second, third, fourth, and fifth rows represent classified images using an initial vigilance parameter and dynamic learning rate of (0.98, 0.50), (0.95, 0.20), (0.90, 0.15), (0.70, 0.15), and (0.00, 0.15), respectively

LIST OF TABLES

Table 2-1: Comparison among the Original, Flagged, and Compact algorithms of Fuzzy ART. The last two have been developed in this study
Table 5-1: Comparisons between Fuzzy ARTMAP, Supervised ART-I, and Supervised ART-II
Table 6-1: Descriptions of the Landsat-5 Thematic Mapper (TM) images
Table 6-2: Performance of Supervised ART-II when trained with different sizes of training samples
Table 6-3: Training and classification statistics for the Landsat TM image at the individual class level
Table 6-4: The confusion matrix for the classification process for the 52 440 pixels of the Landsat TM image
Table 7-1: The performance of the Supervised ART-II ANN with different vigilance dynamics

ABSTRACT

New supervised ART ANNs with simple architectures have been developed in this study. Their architectures have been built from a single module of ART, rather than a pair of them connected by a map field as in all other supervised ART-type ANNs reported in the literature. Two different algorithms have been developed: Supervised ART-I and Supervised ART-II. The developed algorithms reduce the number of dynamic parameters, the memory requirement, and the training time, which is the major problem facing ANNs, without altering the classification accuracy. Two simplified versions of the Fuzzy ART algorithm have been developed, keeping the categorization performance of the original algorithm: Flagged Fuzzy ART and Compact Fuzzy ART. While Supervised ART-I and Supervised ART-II are general in nature and can be applied to all ART ANNs, the supervision of Compact Fuzzy ART has been addressed in this work. The full algorithms for Supervised ART-I and Supervised ART-II have been listed. The newly developed ANNs have been applied to classify Landsat Thematic Mapper (TM) images. The performance of the systems has been tested for different dynamic parameters and different training samples. The behavior of the systems in the space of the vigilance parameter and the dynamic learning parameter has been addressed and characterized. Only one approach to vigilance dynamics in supervised ART-type ANNs has been addressed in the literature. Three more approaches have been developed in this study: fixed, free, and float. The performance of the developed ANNs for classification of Landsat TM images has been tested for all these different vigilance dynamics.

CHAPTER I: INTRODUCTION

1.1 Historical background

Although the roots of the field of Artificial Neural Networks (ANNs) extend to 1943, when McCulloch and Pitts built the first artificial neural structure, its foundations were established in the mid-seventies. (Werbos 1974) developed the principle of the Back Propagation (BP) ANN. (Grossberg 1976) developed the principle of Adaptive Resonance Theory (ART) ANNs. However, the great theoretical advance of the field was achieved in the 1980s. In that decade the algorithms of the BP ANN were developed independently by many authors (Le Cun 1986, Parker 1986, and Rumelhart et al. 1986). The Kohonen Self-Organizing Map KSOM (Kohonen 1982) and the Hopfield ANN (Hopfield 1982) were developed. Many advances were achieved for ART ANNs (Carpenter & Grossberg 1987a&b). ART ANNs are the concern of this study due to their stability, rapidity, and accuracy (Carpenter et al. 1991a&b, 1992, 1997, and Gan & Lua 1992). ART ANNs have been applied in many fields. The Boeing Company has implemented an ART-1 neural information retrieval system for its engineering designs (Caudell et al. 1994). The Boeing Company has thousands of designs for its aircraft parts. Features are extracted for each design. These features are presented to the network to establish categories for these designs. When a new design is needed, its features are presented to the system to determine the category that the required design belongs to.

Retrieving some features from the designs of the indicated category avoids repeating the work for the new design. ART ANNs have been employed for target recognition (Seibert & Waxman 1992); their approach extracts features of the target (aircraft) from different views. (Bernardon & Carrick 1995) have also used them for target recognition using Synthetic Aperture Radar (SAR) imagery. After the network is trained, target recognition is done by matching the signal of the target with a set of stored target models. (Kumar & Guez 1989, and Waxman et al. 1995) have used ART ANNs for target recognition too. Kumar and Guez worked with visible imagery, while Waxman and his group worked with visible, infrared, and SAR imagery. Moreover, ART ANNs have been employed for robot sensory motor control (Baloch & Waxman 1991, Bachelder et al. 1993, Dubrawski & Crowley 1994, Srinivasa & Sharma 1996) and robot navigation (Racz & Dubrawski 1995); machine vision (Caudell & Healy 1994); object recognition (Seibert & Waxman 1992); face recognition (Seibert & Waxman 1993); pattern clustering (Moore 1989, Mekkaoui & Jespers 1990); character recognition (Wilson et al. 1990); sonar signal processing (Simpson 1990); medical imaging (Soliz & Donohoe 1996); electrocardiogram wave recognition (Ham & Han 1996); signature verification (Murshed et al. 1995); fault identification in a nuclear power plant (Keyvan 1999); and remote sensing (Gopal et al. 1994; Baraldi & Parmiggiani 1995).

1.2 Adaptive Resonance Theory ANNs

There are two types of ANNs: supervised and unsupervised. In the unsupervised case, only the input features are introduced to the input layer, and the network then categorizes them. In the supervised type of ANN, the class code is supplied to the network together with the input features. During the training phase, when the network correctly classifies an input feature, the weights are trained; otherwise, a correction should be made.

1.2.1 Unsupervised ART ANNs

The principle of ART ANNs was introduced in the literature as a theory of human cognitive information processing (Grossberg 1976, 1980). Since then, a series of ART-based ANNs have been developed for unsupervised category learning and pattern recognition in real time: ART1 (Carpenter & Grossberg 1987a), ART2 (Carpenter & Grossberg 1987b), ART3 (Carpenter & Grossberg 1990), SART (Baraldi & Parmiggiani 1995), and Fuzzy ART (Carpenter et al. 1991a). ART1 has the ability to categorize arbitrary binary input patterns (Carpenter & Grossberg 1987a). ART2 has the ability to deal with binary and analog patterns as well (Carpenter & Grossberg 1987b). The information in ART1 and ART2 flows forward through weights that connect each node in the input layer to all nodes in the category layer, and backward through another set of weights which connect each category node to all nodes in the input layer. A simple architecture of unsupervised ART ANN has been developed (Carpenter et al. 1991a); they called it Fuzzy ART. It is like ART2 in that it has the ability to categorize analog multi-valued input patterns as well as binary input patterns. Weights in Fuzzy ART connect each node in the input layer to all category nodes. Information flows through these weights in one direction only, from the input layer to the category layer. Fuzzy ART will be explained in detail in Chapter II.


1.2.2 Supervised ART ANNs

In the early nineties, two supervised ART architectures were developed: ARTMAP (Carpenter et al. 1991b) and Fuzzy ARTMAP (Carpenter et al. 1992). The architecture of ARTMAP has been built from two modules of ART1, while the architecture of Fuzzy ARTMAP has been built from two modules of Fuzzy ART. ARTMAP has the ability of learning and classifying binary multi-valued input patterns. Fuzzy ARTMAP has the ability of learning and classifying analog input patterns, in addition to binary ones (Carpenter et al. 1992). More supervised ART-type ANNs have been developed: ART-EMAP (Carpenter & Ross 1993), Gaussian ARTMAP (Williamson 1996), ARTMAP-IC (Carpenter & Markuzon 1998), and Distributed ARTMAP (Carpenter 1998). In all these architectures, the supervision is done through a map field, which requires two modules of ART. Fuzzy ARTMAP has been used widely. It showed better performance than various other ANNs dealing with different problems, such as automatic analysis of electrocardiograms (Ham & Han 1996), diagnostic monitoring of nuclear plants (Keyvan et al. 1993), and prediction of protein secondary structure (Mehta et al. 1993).

1.3 Classifying remotely sensed data with ANNs

Mapping land cover using remotely sensed data is a very active area of research, due to the advances in space and computer technology (Benediktsson et al. 1990). Conventional classification is usually employed for this task; however, neural networks have often been used in the last decade. The main advantages of neural networks over conventional classifiers such as the Maximum Likelihood Classifier (MLC) are that:

1) They are non-parametric; therefore, the probability distributions for each class are not required. This allows us to introduce ancillary data (slope, topography, aspect, etc.), in addition to the spectral data, to the network, which many authors report can increase the classification accuracy (Benediktsson et al. 1990, Carpenter et al. 1997). Moreover, neural networks are more robust when the distribution is not Gaussian (Paola & Schowengerdt 1997, Hepner et al. 1990).

2) Unlike conventional classifiers, neural networks are able to manage fuzzy classifications (Paola & Schowengerdt 1997, Warner & Shank 1997, Yool 1998). The numbers in the output represent the strength of class membership of the specific input. This is very important when we deal with low spatial resolution.

3) The parallel nature of neural networks allows us to increase the speed of the classification process. This can be done by implementing them on parallel computers (Salu & Tilton 1993, Heermann & Khazenie 1992).

4) Neural networks have flexibility for classification improvement (Carpenter et al. 1997).

5) They have the ability to establish an arbitrary decision boundary (Paola & Schowengerdt 1995, Tzeng et al. 1994).

"Neural networks offer a flexible approach to building the complex, highly non-linear models that are required for a complex system. ... Unlike traditional expert systems where knowledge is made explicit in the form of rules, neural networks generate their own rules by learning from exemplars" (Keyvan 1993).

The Multi-Layer Perceptron (MLP), with Back Propagation learning, is the most commonly used neural network in the literature for classifying remotely sensed data. This is due to the preferable learning approach of the network, which is based on minimizing the error between the output of the network and the target value. While some authors have reported that conventional classifiers perform better than MLP (Mulder & Spreeuwers 1991, Solaiman & Mouchot 1994), many authors have reported that MLP performs better than MLC in classifying remotely sensed data (Hepner et al. 1990, Heerman & Khazenie 1992, Paola & Schowengerdt 1994, Yoshida & Omatu 1994). The classification performance of MLP can be improved by using ancillary data in addition to the spectral data (Benediktsson et al. 1990). However, employing MLP as a classifier incurs many problems. The architecture of the network is not fixed: the number of hidden layers and the number of nodes in each hidden layer must be determined by trial and error. This is a very costly process, keeping in mind the long training time of the network. In addition, MLP might fall into a local minimum during the training phase. Moreover, MLP might not converge. Using a small learning rate to avoid the convergence problem makes the long training time of the MLP network much longer. (Heermann & Khazenie 1992) suggested using parallel computers to reduce the training time. This reduces the training time but increases the hardware cost. For the classification of a Landsat image, (Carpenter et al. 1997) reported that MLP did not converge, using a learning rate of 0.6 and a momentum rate of 0.4, after 212 minutes of training time on a SUN 4 SPARC Station, using 100 000 input presentations. They employed a lower learning rate to avoid the convergence problem. The training time then exceeded 1000 minutes, while the classification accuracy was less than 27%. They reported that Fuzzy ARTMAP (Carpenter et al. 1992) achieves better classification accuracy than MLP, with lower training time. They also reported that Fuzzy ARTMAP and MLC achieve the same level of classification accuracy. Fuzzy ARTMAP has also been employed by (Mannan et al. 1998) to classify the (512x512) pixels of an image of the Linear Imaging Self-scanning Sensor (LISS-II) of the Indian Remote Sensing Satellite (IRS-1B) into their 13 classes. They reported that Fuzzy ARTMAP performs better than both MLC and MLP in classification accuracy. The average classification accuracies for six data sets are 84.7%, 80.3%, and 79.9% for Fuzzy ARTMAP, MLC, and MLP, respectively. They reported that the training time was slightly less than that for MLC, and many times shorter than that for MLP.


Unlike MLP, Fuzzy ARTMAP has a well-defined architecture, it always converges, and it can tune itself to represent sub-classes by generating a new category node. However, the main drawback of Fuzzy ARTMAP lies in its complex architecture: it is constructed from two modules of Fuzzy ART linked by a map field.

1.4 Objectives

The global objective of this work is to design new simplified versions of ART ANN architectures, which maintain their original performances but improve computational time and memory. This objective can be divided into several partial objectives:

• Design new simple architectures of ART-type ANNs, which provide the same classification performance as classical ARTs.

• Develop learning and classification algorithms for these architectures.

• Encode the developed algorithms.

• Study the behavior of the developed architectures for the classification of remotely sensed Landsat Thematic Mapper (TM) images over the whole domain of the dynamic parameters.

The layout of this study is as follows: Chapter II deals with Fuzzy ART. Chapter III deals with Fuzzy ARTMAP. Chapter IV deals with the newly developed architecture Supervised ART-I. Chapter V deals with the newly developed architecture Supervised ART-II. The performance of the Supervised ART-I and Supervised ART-II ANNs for learning and classifying Landsat TM images is addressed in Chapter VI. Performances of the newly developed ANNs using different vigilance dynamics are addressed in Chapter VII. Conclusions are listed in Chapter VIII.


CHAPTER II: FUZZY ART ANN

2.1 Introduction

The Fuzzy ART is an unsupervised ART-based ANN. Its architecture has been designed for learning and categorization of arbitrary analog or binary multi-valued input patterns. This has been achieved by using the minimum operator (∧) of fuzzy set theory instead of the intersection operator (∩) of set theory, which had been employed in ART1.

2.2 Matching system and vigilance parameter

"Fuzzy ART incorporates the basic features of all ART systems, notably, pattern matching between bottom-up input and top-down learned prototype vectors. This matching process leads either to a resonant state that focuses attention and triggers stable prototype learning or to a self-regulating parallel memory search. If the search ends by selecting an established category, then the category's prototype may be refined to incorporate new information in the input pattern. If the search ends by selecting a previously untrained node, then learning of a new category takes place" (Carpenter et al. 1991a). If the matching value is greater than the predetermined value, resonance occurs and new information is incorporated into the winning category node by training its weights; otherwise, a self-organizing parallel memory search is conducted.


The match criterion is called the vigilance parameter ρ. It calibrates the minimum confidence that a category node must have to represent the current input, before a search for a better committed category node is triggered. If all committed category nodes fail to represent the current input, a new category node is committed, as long as the network's memory capacity is not fully utilized. The vigilance parameter is a nondimensional number ρ ∈ (0, 1]. A value of 1 means perfect matching. A low vigilance parameter leads to code compression with broad generalization for categories. A high vigilance parameter leads to a large number of category nodes with fine categories. The vigilance parameter is the key feature of all ART ANNs. An ART ANN can discriminate up to the individual level by setting ρ = 1, while creating a single category node for all data by setting ρ = 0. The value of the vigilance parameter is determined according to the type and amount of data, the categorization level that we look for, the required speed, and the available memory. The vigilance parameter is fixed during training in all unsupervised ART ANNs.
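To make the match criterion concrete, the following Python sketch (an illustration added for clarity; the function names and the example numbers are assumptions of this sketch, not part of the thesis) computes the match value of a complement-coded input against a category prototype and tests it against the vigilance parameter ρ.

```python
import numpy as np

def match_value(A, w_J, M):
    # Match value |A ^ w_J| / M for a complement-coded input A of length 2M,
    # where ^ is the element-wise fuzzy minimum.
    return np.minimum(A, w_J).sum() / M

def passes_vigilance(A, w_J, M, rho):
    # Resonance is allowed only when the match value reaches the vigilance rho.
    return match_value(A, w_J, M) >= rho

# Example: M = 2 features, normalized to [0, 1], in complement coding.
a = np.array([0.2, 0.7])
A = np.concatenate([a, 1.0 - a])          # length 2M
w_J = np.array([0.3, 0.6, 0.7, 0.4])      # an already-trained prototype
print(passes_vigilance(A, w_J, M=2, rho=0.9))   # True for this example
```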

2.3 Fuzzy ART dynamics

Input patterns A^{(t)} ∈ [0, 1] are presented at the input layer F_1. The choice function T_j^{(t)} for each committed category node of the category layer F_2 is computed according to equation (2-1). The choice value represents the activation level of each committed category node:

T_j^{(t)} = \frac{\sum_{i=1}^{2M} \left( A_i^{(t)} \wedge w_{ij} \right)}{\alpha + \sum_{i=1}^{2M} w_{ij}} , \qquad j = 1, \dots, C        (2-1)

where w_{ij} are the weights which connect each category node j in the F_2 field with all nodes of the input layer F_1. All weight values are initially set to 1 (i.e., w_{ij} = 1 for i = 1, ..., 2M and j = 1, ..., N). M represents the dimension of the input features. Since the normalized features and their complements are introduced to the network, the dimension of the input vector A^{(t)} is 2M. α is the choice parameter (α > 0). C is the total number of committed category nodes at iteration t. The winning committed category node is determined by:

T_J^{(t)} = \max \{ T_j^{(t)} \} , \qquad j = 1, \dots, N        (2-2)

It represents the category node with the highest choice value among all category nodes (committed and uncommitted) in the category layer. N represents the full memory capacity of Fuzzy ART. The value of N is normally much larger than C (N >> C). All N category nodes are involved, instead of only the committed category nodes C, which has been employed by (Carpenter et al. 1991a). Their reasoning for this is to let uncommitted category nodes be committed, when needed, in sequence order (1, 2, ..., j-1, j, j+1, ..., N). To achieve this, they assigned a very small positive value ξ_j to each uncommitted category node before training is started. They called these the F_2-order constants. These values decrease as the index j of the category node in the memory field increases:

T_j = \xi_j , \qquad j = C+1, \dots, N ; \qquad 0 < \xi_N < \xi_{N-1} < \dots < \xi_{C+1}

As α → ∞, all committed category nodes will be tested before a new node is committed. The value of α alters the order of search among the committed category nodes. "A node is called an uncommitted node if all its top-down weights are equal to the initial top-down weight value, otherwise, the node is committed" (Georgiopoulos et al. 1996). This test takes time, especially when the input features are large.
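The following Python sketch (added for illustration; it is not part of the original text, and the tiny order-constant values are an arbitrary choice) shows how the choice values of equation (2-1) and the winner selection over the full capacity N, with F_2-order constants for the uncommitted nodes, could be computed.

```python
import numpy as np

def choice_values(A, W, alpha):
    # T_j = |A ^ w_j| / (alpha + |w_j|) for the committed prototypes W (C x 2M).
    return np.minimum(A, W).sum(axis=1) / (alpha + W.sum(axis=1))

def select_winner(A, W, alpha, N):
    # Committed nodes j = 1..C use the choice function; uncommitted nodes
    # j = C+1..N receive small, decreasing F2-order constants xi_j > 0.
    C = W.shape[0]
    T = np.empty(N)
    T[:C] = choice_values(A, W, alpha)
    T[C:] = 1e-6 / np.arange(1, N - C + 1)   # xi_{C+1} > xi_{C+2} > ... > xi_N
    return int(np.argmax(T))                 # index of the winning node J

# Example with C = 2 committed prototypes, M = 2, and capacity N = 5.
a = np.array([0.2, 0.7])
A = np.concatenate([a, 1.0 - a])
W = np.array([[0.3, 0.6, 0.7, 0.4],
              [0.9, 0.1, 0.1, 0.9]])
print(select_winner(A, W, alpha=0.001, N=5))
```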


The weight updating in their Fuzzy ART algorithm (they called it Fuzzy ART Variant, because α → ∞ and β = 1) is:

w_{iJ}^{new} = A_i^{(t)} \wedge w_{iJ}^{old} , \qquad i = 1, \dots, 2M        (2-15)

While very large values for the choice parameter α and for the uncommitted node choice parameter lead to testing all committed category nodes before a new node is committed in Fuzzy ART, they are not practical for Fuzzy ARTMAP because they create too many category nodes (Georgiopoulos et al. 1996). The use of the F_2-order constants ξ_j, to assure that the test of all committed category nodes is completed before a new node is committed, has been explained in (section 2.3). Since ξ_j ≈ 0, all the committed category nodes will be tested to represent a specific input before the uncommitted category nodes are tested. The value of ξ_j decreases as j increases; this lets the uncommitted node C+1 be chosen before the other uncommitted category nodes. This approach has not been mentioned in the original algorithm of Fuzzy ART (Carpenter et al. 1991a) or in the original algorithm of Fuzzy ARTMAP (Carpenter et al. 1992), but has been extracted from the full Fuzzy ARTMAP algorithm listed by (Carpenter et al. 1997). (Georgiopoulos et al. 1999) include all committed category nodes, in addition to one uncommitted category node, in the search for the maximum choice value node. The choice value in their algorithm is:

T_j^{(t)} = \begin{cases} \sum_{i=1}^{2M} \left( A_i^{(t)} \wedge w_{ij} \right) , & j = 1, \dots, C \\ 0 , & j = C+1 \end{cases}        (2-16)

They assigned very large values to the initial weights in order to have a zero choice value for the uncommitted category node, so that committed category nodes are tested first, before a new node is committed. This forced them to use fast learning when a new node is committed, to reduce the weights to their theoretical values, which are below one.

2.9 Newly developed versions of Fuzzy ART

The determination of the winning category node among the full capacity of the network N, as reported by (Carpenter & Grossberg 1987a, Carpenter et al. 1991a), is time consuming. The capacity of the system can be very large, especially when it is working in a non-homogeneous environment. Uncommitted category nodes can be committed in sequence order without using the prearranged choice values ξ_j and without including all the capacity of the category layer N in the determination of the maximum choice value node J. Two new versions of the Fuzzy ART architecture have been constructed in this work. The first one is the Flagged approach. This approach involves the uncommitted category node with rank C+1 in the category layer, together with all committed category nodes, in determining the maximum choice value node J. A total of C comparisons is required, rather than N-1 as in the original Fuzzy ART architecture. As mentioned before, this approach has been conducted by (Georgiopoulos et al. 1999), but in their approach they assigned large values to the initial weights, which forced them to use fast learning. The second one is the Compact approach, which involves only the committed category nodes C in determining the maximum choice value node J.

2.9.1 Flagged approach

There is no reason at all to involve T_{C+1}, ..., T_N in determining the maximum choice value node. Only the uncommitted category node with rank C+1 in the category layer will be involved. This uncommitted category node is flagged by assigning a value ξ_{C+1} to its choice value such that:

T_{C+1} = \xi_{C+1} < 0        (2-17)

A negative value is assigned to ξ_{C+1}, because the input features A_i as well as the weights w_{ij} never have negative values. So, according to equation 2-11, the choice value of any committed category node is never negative:

T_j \geq 0 , \qquad j = 1, \dots, C        (2-18)

However, the value of ξ_{C+1} must be greater than the choice value of committed category nodes that are in shut-off mode. In this way, when all committed category nodes are in shut-off mode, the flagged node with index C+1 in the category layer will be chosen as the maximum choice value node. We should not worry about the match value of a newly committed category node, since the match value of any newly committed node is equal to one, which is the highest value that the vigilance parameter ρ can have. That is because A_i is normalized to [0, 1] before its presentation to the network, and the initial weights for category nodes are equal to one. So the input A is a fuzzy subset of w_{C+1}, which means A_i ∧ w_{i,C+1} = A_i. According to equation 2-11, computing the match function for the subset choice always yields one, as demonstrated below:

\frac{1}{M} \sum_{i=1}^{2M} \left( A_i \wedge w_{i,C+1} \right) = \frac{1}{M} \sum_{i=1}^{2M} A_i = \frac{M}{M} = 1        (2-19)

Therefore, the uncommitted flagged node C+1 will not go into shut-off mode; it will pass the match test for sure. After resonance occurs, a check should be done to see whether the flagged uncommitted category node has been chosen. If J > C, then the flagged node has been chosen. The number of committed category nodes must be increased by one (C = C+1) and the weights of the new flagged node w_{i,C+1} should be initiated:

w_{i,C+1} = 1 , \qquad i = 1, \dots, 2M        (2-20)

The full architecture of the Flagged-Fuzzy ART is shown in (figure 2-3). Only the committed category nodes and the flagged uncommitted category node are involved in determining the maximum choice value node. Weights are not yet established to connect the remaining uncommitted category nodes in the F_2 layer with the nodes of the input layer F_1.



Figure 2-3: The architecture of Flagged Fuzzy ART. Only committed category nodes and the uncommitted category node with index C+1 in the category layer are involved in determining the maximum choice value node J. These category nodes are shown in dark; weights are connected to all of them. Category nodes that are not involved in determining the maximum choice value node are shown in light; weights are not connected to them. Weights connected to the flagged node (the uncommitted category node with index C+1) are shown in light, because they are not initiated yet. They will be initiated (w_{i,C+1} = 1; i = 1, ..., 2M) only when this node is chosen as the maximum choice value node. The number of comparisons needed to determine the maximum choice value node is C, since the search is carried out among the committed category nodes plus the flagged node. This reduces training time.

2.9.2 Training algorithms of Flagged-Fuzzy ART

1) Input parameters;
   a) Dynamic parameters:
      i- ρ ∈ (0, 1]: The vigilance parameter.
      ii- β ∈ (0, 1]: The dynamic learning parameter; β = 1 for fast learning.
      iii- α > 0: The choice value parameter.
   b) Data characteristics:
      i- M: The dimension of the input features.
      ii- Pt: The number of exemplars to be used in learning.
   c) Initialization:
      i- T_{C+1} = -0.1
      ii- Number of iterations t = 1.
      iii- Number of committed category nodes C = 1.

2) New input;
   A_i^{(t)} = a_i^{(t)} for 1 ≤ i ≤ M;  A_i^{(t)} = 1 - a_{i-M}^{(t)} for M+1 ≤ i ≤ 2M.

3) Compute the choice function for all committed category nodes;
   T_j^{(t)} = Σ_{i=1}^{2M} (A_i^{(t)} ∧ w_{ij}) / (α + Σ_{i=1}^{2M} w_{ij}),  j = 1, ..., C

4) Reset: Determine the node J which has the maximum choice value;
   T_J^{(t)} = max{T_j^{(t)}},  j = 1, ..., C+1

5) Matching criterion: If (Σ_{i=1}^{2M} (A_i^{(t)} ∧ w_{iJ}) < Mρ) then;
   i- Shut off this node to put it out of competition; T_J^{(t)} = -1
   ii- GOTO STEP (4)

6) If (J > C) then a new category node has been committed;
   i- C = J
   ii- w_{iJ} = 1,  i = 1, ..., 2M
   iii- T_{C+1} = -0.1

7) Training;
   w_{iJ}^{new} = β(A_i^{(t)} ∧ w_{iJ}^{old}) + (1 - β) w_{iJ}^{old},  i = 1, ..., 2M

8) If (t < Pt) then;
   i- t = t+1
   ii- GOTO STEP (2)

9) Training has been done. The network is ready for categorization.

2.9.3 Compact approach

Uncommitted category nodes can be committed in sequence order without using the flagged uncommitted category node. This approach involves only the committed category nodes in determining the maximum choice value node J; it is called the Compact approach. The choice function is computed for the committed category nodes, and the maximum choice value node J is determined among the committed category nodes C only:

T_J^{(t)} = \max \{ T_j^{(t)} \} , \qquad j = 1, \dots, C        (2-21)

The match value of the selected category node J is tested against the predetermined value of the vigilance parameter ρ. If the match value of node J is less than ρ, the node is shut off by assigning a value of -1 to its choice value, to put it out of competition for the duration of the current input. Otherwise, the node is trained, all committed category nodes are switched on again, and a new input is presented to the network. When the maximum choice value equals -1, all committed category nodes are in shut-off mode. The uncommitted category node C+1 should then be committed to represent the current input, in order to prevent the fragmentation of the category layer. This is done simply by training the initial weights of the category node with index C+1 and increasing the count of committed category nodes by one. In this way, uncommitted category nodes are committed according to their order in the category layer. The number of comparisons needed to determine the maximum choice value node is (C-1) rather than (N-1), which the original Fuzzy ART algorithm requires. This saves a lot of computation time, keeping in mind that N >> C.


In the case where a new category node should be committed, its weights are updated through the following equation;

w_{i,C+1}^{new} = \beta A_i^{(t)} + (1 - \beta) , \qquad i = 1, \dots, 2M        (2-22)

According to this equation, the weights initialization (w_{ij}; i = 1, ..., 2M; j = 1, ..., N) is not required, as reported by (Carpenter et al. 1991b). This also saves time, since this equation requires fewer arithmetic operations than the previously suggested one. The full architecture of Compact-Fuzzy ART is shown in (figure 2-4). Committed category nodes are shown in dark. Uncommitted category nodes are shown in light. Weights connect all input layer nodes to committed category nodes only. Weights are not connected to uncommitted category nodes since they are not committed yet (they have not been assigned weights yet). The comparison among the original Fuzzy ART, Flagged-Fuzzy ART, and Compact-Fuzzy ART is shown in (table 2-1). It shows clearly that Flagged-Fuzzy ART and Compact-Fuzzy ART are faster than the original algorithm of Fuzzy ART. The main point influencing the reduction of the training time is the number of comparisons needed to determine the winning category node. They are, as mentioned before, N-1, C, and C-1 for the original Fuzzy ART, Flagged-Fuzzy ART, and Compact-Fuzzy ART, respectively.



Figure 2-4: The architecture of Compact Fuzzy ART. Only committed category nodes are involved in determining the maximum choice value node J. These category nodes are shown in dark. Weights connect all input layer nodes to committed category nodes only. Uncommitted category nodes are shown in light; weights are not connected to them since they are not committed yet (they have not been assigned weights yet). The number of comparisons needed to determine the maximum choice value node is C-1, since the search is carried out among committed category nodes only. This reduces training time.

Table 2-1: Comparison among the Original, Flagged, and Compact algorithms of Fuzzy ART. The last two have been developed in this study. Flagged and Compact algorithms are faster; however, the Compact algorithm is recommended.

Initialization for choice value:
  Original: T_j = ξ_j; j = C+1, ..., N (0 < ξ_N < ... < ξ_{C+1})
  Flagged:  T_{C+1} = ξ_{C+1} = -0.1
  Compact:  None
Choice function T_j:
  Original: T_j = Σ_{i=1}^{2M}(A_i ∧ w_{ij}) / (α + Σ_{i=1}^{2M} w_{ij}); j = 1, ..., C
  Flagged:  Same
  Compact:  Same
Determination of T_max:
  Original: T_J = max{T_j; j = 1, ..., N}
  Flagged:  T_J = max{T_j; j = 1, ..., C+1}
  Compact:  T_J = max{T_j; j = 1, ..., C}
Check for new committed node:
  Original: J > C
  Flagged:  J > C
  Compact:  T_J = -1
Number of comparisons for T_max:
  Original: N-1
  Flagged:  C
  Compact:  C-1
Match testing:
  Original: Σ_{i=1}^{2M}(A_i ∧ w_{iJ}) ≥ Mρ
  Flagged:  Same
  Compact:  Same
Weights initialization:
  Original: w_{ij} = 1; i = 1, ..., 2M; j = 1, ..., N
  Flagged:  w_{iJ} = 1; i = 1, ..., 2M (only for the newly committed node)
  Compact:  None
Weights updating for old node:
  Original: w_{iJ}^{new} = β(A_i ∧ w_{iJ}^{old}) + (1-β)w_{iJ}^{old}
  Flagged:  Same
  Compact:  Same
Weights updating for new node:
  Original: Same as for old node
  Flagged:  Same
  Compact:  w_{i,C+1}^{new} = βA_i + (1-β)
Replacement of the fixed choice value ξ:
  Original: None
  Flagged:  ξ_{C+1} = -0.1
  Compact:  None

2.9.4 Training algorithms of Compact-Fuzzy ART

1) Input parameters;
   a) Dynamic parameters:
      i- ρ ∈ (0, 1]: The vigilance parameter.
      ii- β ∈ (0, 1]: The dynamic learning parameter; β = 1 for fast learning.
      iii- α > 0: The choice value parameter.
   b) Data characteristics:
      i- M: The dimension of the input features.
      ii- Pt: The number of exemplars to be used in learning.
   c) Initialization:
      i- Number of iterations t = 1.
      ii- Number of committed category nodes C = 1.

2) New input;
   A_i^{(t)} = a_i^{(t)} for 1 ≤ i ≤ M;  A_i^{(t)} = 1 - a_{i-M}^{(t)} for M+1 ≤ i ≤ 2M.

3) Compute the choice function for all committed category nodes;
   T_j^{(t)} = Σ_{i=1}^{2M} (A_i^{(t)} ∧ w_{ij}) / (α + Σ_{i=1}^{2M} w_{ij}),  j = 1, ..., C

4) Reset: Determine the node J which has the maximum choice value;
   T_J^{(t)} = max{T_j^{(t)}},  j = 1, ..., C

5) If T_J^{(t)} = -1 (all committed category nodes are in shut-off mode) then a node (the node whose order in the category layer is C+1) should be committed;
   i- Increase the number of committed nodes by one; C = C+1
   ii- If in fast-learning slow-record mode, assign the values of the input features to the weights of this node;
        w_{iC}^{new} = A_i^{(t)},  i = 1, ..., 2M
      Else (normal mode)
        w_{iC}^{new} = βA_i^{(t)} + (1 - β),  i = 1, ..., 2M
   iii- GOTO STEP (2)

6) Matching criterion: If (Σ_{i=1}^{2M} (A_i^{(t)} ∧ w_{iJ}) < Mρ) then;
   i- Shut off this node to put it out of competition; T_J^{(t)} = -1
   ii- GOTO STEP (4)

7) Learning;
   w_{iJ}^{new} = β(A_i^{(t)} ∧ w_{iJ}^{old}) + (1 - β) w_{iJ}^{old},  i = 1, ..., 2M

8) If (t < Pt) then;
   i- t = t+1
   ii- GOTO STEP (2)

9) Training has been done. The network is ready for categorization.
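For illustration, a compact Python sketch of the whole training loop above is given below. It is not part of the original thesis; the function name, the use of numpy, and the fast-learning commitment of new nodes are assumptions of the sketch. Here rho, beta, and alpha play the roles of ρ, β, and α in the listing above.

```python
import numpy as np

def train_compact_fuzzy_art(samples, rho, beta, alpha, fast_commit=True):
    # samples: (Pt x M) array of feature vectors already normalized to [0, 1].
    M = samples.shape[1]
    W = None                                    # committed prototypes, (C x 2M)
    for a in samples:
        A = np.concatenate([a, 1.0 - a])        # step 2: complement coding
        if W is None:                           # commit the very first node
            W = (A if fast_commit else beta * A + (1 - beta))[None, :].copy()
            continue
        T = np.minimum(A, W).sum(axis=1) / (alpha + W.sum(axis=1))   # step 3
        while True:
            J = int(np.argmax(T))               # step 4: winner among committed
            if T[J] == -1.0:                    # step 5: all shut off -> commit C+1
                new = A if fast_commit else beta * A + (1 - beta)
                W = np.vstack([W, new])
                J = None
                break
            if np.minimum(A, W[J]).sum() >= M * rho:   # step 6: match criterion
                break
            T[J] = -1.0                         # shut off, back to step 4
        if J is not None:                       # step 7: train the winner
            W[J] = beta * np.minimum(A, W[J]) + (1 - beta) * W[J]
    return W
```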

2.10 Categorization

At the end of the training phase, all weights are fixed. The number of committed category nodes C is known. The network is ready for categorization.

1) Input;
   A_i^{(t)} = a_i^{(t)} for 1 ≤ i ≤ M;  A_i^{(t)} = 1 - a_{i-M}^{(t)} for M+1 ≤ i ≤ 2M.

2) Compute the choice values for all committed nodes;
   T_j^{(t)} = Σ_{i=1}^{2M} (A_i^{(t)} ∧ w_{ij}) / (α + Σ_{i=1}^{2M} w_{ij}),  j = 1, ..., C

3) Determine the node J which has the maximum choice value among all committed category nodes;
   T_J^{(t)} = max{T_j^{(t)}},  j = 1, ..., C

4) Match testing: If (the match value for the winning node J ≥ ρ) then category node J represents the category of this input; else the network fails to categorize this input.

5) If more categorization is needed, GOTO STEP (1).

6) Categorization has been done.
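The categorization procedure can be sketched in a few lines of Python (again an illustrative addition with hypothetical names, not the author's code):

```python
import numpy as np

def categorize(a, W, rho, alpha):
    # Returns the index of the winning committed node, or None when the
    # match test fails and the network cannot categorize the input.
    M = a.shape[0]
    A = np.concatenate([a, 1.0 - a])                              # step 1
    T = np.minimum(A, W).sum(axis=1) / (alpha + W.sum(axis=1))    # step 2
    J = int(np.argmax(T))                                         # step 3
    if np.minimum(A, W[J]).sum() / M >= rho:                      # step 4
        return J
    return None
```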

CHAPTER III: FUZZY ARTMAP ANN

3.1 Introduction

While the roots of ART ANNs go back to 1976, supervision was not introduced until sixteen years later, when the ARTMAP architecture was constructed (Carpenter et al. 1991b). More supervised architectures have been constructed thereafter (Fuzzy ARTMAP, ART-EMAP, Gaussian ARTMAP, ARTMAP-IC, and Distributed ARTMAP). All of them are constructed from two modules of ART ANNs linked by a map field. Supervision of ART ANNs using the map field approach is shown in (figure 3-1). The supervision approach, using the map field, will be explained through Fuzzy ARTMAP.

3.2 Fuzzy ARTMAP

The Fuzzy ARTMAP is a supervised ART-type ANN. It is a generalization of the ARTMAP ANN. "ARTMAP system learns orders of magnitude more quickly, effectively, and accurately than alternative algorithms. It achieves these properties by using an internal controller that conjointly maximizes predictive generalization and minimizes predictive error by linking predictive success to category size on a trial-by-trial basis, using only local operations. This computation increases the vigilance parameter ρ_a of ART_a by the minimum amount needed to correct a predictive error at ART_b" (Carpenter et al. 1991b). Therefore, ARTMAP is a self-organizing expert system, since it calibrates the selectivity of its hypotheses based upon predictive success (Carpenter et al. 1991b).


Figure 3-1: Block diagram showing supervision through the map field. Two modules of ART are inter-linked by a map field.


While ARTMAP treats binary input only, Fuzzy ARTMAP is capable of learning and classifying both binary and analog input patterns that are presented in arbitrary order to the network. While the back propagation ANN required 20 000 epochs to learn a benchmark (Lang & Witbrock 1989), Fuzzy ARTMAP required only 5 epochs (Carpenter et al. 1992). The architectures of all supervised ART-type ANNs consist of two modules of ART (ART_a and ART_b). These two modules are linked together through a map field, see (figures 3-1 and 3-2). For classification tasks, ART_b is reduced to the input layer only, see (figure 3-3). The map field is simply an N×L array of binary weights w_{jk}; j = 1, ..., N; k = 1, ..., L, initially set to one, see (figure 3-4). w_{JK} = 0 means that category node J represents a class other than K; therefore, node J should be shut off.
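As a concrete picture of this data structure (an illustration added here; the sizes N = 1000 and L = 7 are arbitrary), the map field can be held as an N×L array of ones whose rows collapse into one-hot rows as category nodes learn their classes:

```python
import numpy as np

N, L = 1000, 7                           # capacity of ART_a and number of classes
w_ab = np.ones((N, L), dtype=np.uint8)   # map field weights, initially all 1

# After category node J has learned class K, its map field row becomes one-hot.
J, K = 3, 5
w_ab[J] = 0
w_ab[J, K] = 1

# During training, w_ab[J, k] == 0 means node J codes a class other than k,
# so node J must be shut off for an input whose class code is k.
k = 2
print(w_ab[J, k] == 0)    # True: node J represents class 5, not class 2
```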

3.2.1 Vigilance parameter dynamics in supervised environment

As has been mentioned before, the vigilance parameter ρ ∈ [0, 1] is the key feature of ART ANNs. It represents the minimum match value required for a committed category node to represent the current input. A match value of 1 represents perfect representation, while a match value of 0 represents no match at all. A high value leads to generating many category nodes representing fine subclasses, while a low value leads to fewer category nodes with coarser subclasses. If the match value of the winning category node is greater than the predetermined vigilance parameter ρ while class matching fails, then the current match value, increased by a very small value ε, is assigned to the vigilance parameter, as shown in equation 3-1;

\rho = \frac{1}{M} \sum_{i=1}^{2M} \left( A_i^{(t)} \wedge w_{iJ} \right) + \varepsilon        (3-1)

Figure 3-2: The full architecture for supervision through the map field. ARTMAP is shown here. The dynamics of the network are very complex. All supervised ART-type ANNs that have been reported in the literature (Fuzzy ARTMAP, ART-EMAP, Gaussian ARTMAP, ARTMAP-IC, and Distributed ARTMAP) are constructed using the map field approach. Carpenter and her group represent each module of ART as three layers: the input layer F_0, the category layer F_2, and the hypothetical layer F_1. The assumed layer F_1 represents the membership x between the input and the weights of the winning category node J.


Figure 3-3: The architecture of ARTMAP for the classification problem. Match tracking is done through the map field. If Σ_k (b_k ∧ w_{Jk}) < ρ_ab, match tracking should be conducted. This approach requires map field weights, a map field vigilance parameter, and a binary digital class code.


Figure 3-4: Architecture of Fuzzy ARTMAP. The full structure of ART_b is not needed; it is reduced to the input layer only. All components in the upper light box belong to the supervision through the map field. The components outside the box are the original architecture of Fuzzy ART.


The vigilance parameter ρ increases during the training phase only when class matching of the winning category node fails for a specific input. The very small value ε is added to the failed match value, which is then assigned to the vigilance parameter, in order to classify rare events (Carpenter et al. 1992). The vigilance dynamics in a supervised environment are shown in (figure 3-5). If class matching has occurred, all weights of the winning node should be trained. Otherwise, a value of -1 is assigned to the choice value of this category node to put it out of competition during the current input (shut off). In addition to class matching, the next winning node should beat the new vigilance parameter in order to represent the current input. The vigilance parameter is kept fixed if the category node J fails to pass the match testing, while the match value of node J is assigned to ρ if J fails the class matching;

\rho^{new} = \max \left\{ \rho^{old} , \ \frac{1}{M} \sum_{i=1}^{2M} \left( A_i^{(t)} \wedge w_{iJ} \right) \right\} + \varepsilon        (3-2)

This step is repeated until either one of the committed category nodes can represent the current input or a new category node is committed. The vigilance parameter is reset to its base-line value and all committed category nodes are reactivated before a new input is presented to the network.
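The match-tracking rule of equations (3-1) and (3-2) amounts to a one-line update; the sketch below is illustrative only, and the default value of eps stands in for the small constant ε, whose magnitude is not fixed here.

```python
import numpy as np

def match_tracking(A, w_J, M, rho, eps=1e-5):
    # Raise the vigilance just above the match value of the failed node J,
    # so that J (and any weaker match) is excluded for the rest of this input.
    match_J = np.minimum(A, w_J).sum() / M
    return max(rho, match_J) + eps
```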

3.2.2 Training phase

During the training phase, a stream of input vectors A^{(t)} and a stream of input vectors b^{(t)} are presented simultaneously to Fuzzy ART_a and Fuzzy ART_b of Fuzzy ARTMAP, respectively. The input vector A^{(t)} is normalized to [0, 1] before its


Figure 3-5: Sketch showing the match tracking. The x-axis represents the ranking of all committed category nodes according to their choice values T_j. The y-axis represents the match value of each category node. The thin line that runs along the solid line represents the match value before adding ε. A new node must be committed, because none of the committed category nodes can represent the current input.


presentation. The input vector b^{(t)} is the correct prediction given A^{(t)}. It is a binary digital code. It has a number of digits (neurons) equal to the number of classes of the input data. In winner-take-all mode, all digits are equal to zero except one digit, which corresponds to the order of the class code; this digit is equal to one. However, in class fuzzy membership the sum of the input values at the input layer of Fuzzy ART_b is equal to one, that is;

\sum_{k=1}^{L} b_k^{(t)} = 1

If the match value of the winning node J is greater than the vigilance parameter ρ_a, class matching should be tested. Thus;

If \sum_{k=1}^{L} \left( b_k^{(t)} \wedge w_{Jk} \right) \geq \rho_{ab} then

Class matching: weights should be updated for node J;

w_{iJ}^{a,new} = \beta \left( A_i^{(t)} \wedge w_{iJ}^{a,old} \right) + (1 - \beta) w_{iJ}^{a,old} , \qquad i = 1, \dots, 2M

w_{Jk}^{ab,new} = \beta \left( b_k^{(t)} \wedge w_{Jk}^{ab,old} \right) + (1 - \beta) w_{Jk}^{ab,old} , \qquad k = 1, \dots, L

Else

Match tracking:

\rho_a = \frac{1}{M} \sum_{i=1}^{2M} \left( A_i^{(t)} \wedge w_{iJ} \right) + \varepsilon

where ρ_ab ∈ (0, 1] is the map field vigilance parameter and w_{Jk} is the weight vector which connects the winning node J in the category layer with all nodes of the map field. All weight values are initially set to 1 (i.e., w_{jk} = 1; j = 1, ..., N and k = 1, ..., L), where L is

the total number of nodes at the map field, which is equal to the number of classes of the input data. If the class match fails (the class matching value is less than the predetermined map field vigilance parameter ρ_ab), the vigilance parameter ρ_a should be increased to just above the match value of the selected category node by a small value ε. This is called match tracking. It is an internal control mechanism that maximizes code compression and minimizes predictive errors. However, the vigilance parameter of the map field ρ_ab is fixed during the learning phase. When a winning committed category node fails to pass either the required confidence level or the class matching, it is shut off for the duration of the input. The network repeats this until either one of the committed category nodes can represent the current input or a new category node is committed. If a new category node should be committed (all committed category nodes failed to represent the current input), its weights will be updated as follows for the normal learning case (β < 1);

w_{iJ}^{a,first} = \beta A_i^{(t)} + (1 - \beta) , \qquad i = 1, \dots, 2M        (3-4)

w_{Jk}^{ab,first} = \beta b_k^{(t)} + (1 - \beta) , \qquad k = 1, \dots, L        (3-5)

For the fast learning case (β = 1), the values of the current input A_i^{(t)} are assigned to w_{iJ}^{a,first} and the values of b^{(t)} are assigned to w_{Jk}^{ab,first}. Fast learning for a newly committed category node is recommended in order to classify rare events. The value of β depends on the amount and type of the data under consideration. It is clear from the above formulas that if β = 0 no learning will occur, since the weights will not be changed, remaining fixed at 1 during training.


More details about the Fuzzy ARTMAP architecture and algorithm can be found in (Carpenter et al. 1992, 1997).

3.2.3 Classification phase

At the end of the training phase the weights w_{ij} and w_{jk} are fixed. The network is ready for classification. Input patterns are presented to ART_a without a class code. The choice value is computed for all committed category nodes. The category node with the maximum choice value is determined. The score of the winning category node J at each node of the input layer of ART_b is computed by;

b_k^{(t)} = \frac{w_{Jk}}{\sum_{k=1}^{L} w_{Jk}} , \qquad k = 1, \dots, L        (3-6)

The node with the maximum score b_K at the input layer of ART_b is determined. The index K of this node is the class code of the current input.
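A small Python sketch of this readout (added for illustration; the example map field row is hypothetical) is:

```python
import numpy as np

def artb_scores(w_ab_row):
    # Equation (3-6): b_k = w_Jk / sum_k w_Jk for the winning node J.
    return w_ab_row / w_ab_row.sum()

w_ab_row = np.array([0.0, 0.0, 1.0, 0.0])   # trained row of node J, L = 4 classes
b = artb_scores(w_ab_row)
K = int(np.argmax(b))                       # class code assigned to the input
print(b, K)                                 # [0. 0. 1. 0.] 2
```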

3.3 Full algorithm of Fuzzy ARTMAP

The full algorithm of Fuzzy ARTMAP is listed below. The supervision of Compact Fuzzy ART, which has been developed in this work, is used. The Fuzzy ARTMAP algorithm that uses the original Fuzzy ART is listed in (Carpenter et al. 1997).


3.3.1 Training algorithms of Fuzzy ARTMAP

1) Input parameters;
   a) Dynamic parameters:
      i- ρ ∈ [0, 1]: Base-line vigilance parameter.
      ii- ρ_ab ∈ (0, 1]: Map field vigilance parameter.
      iii- β ∈ (0, 1]: The dynamic learning parameter; β = 1 for fast learning.
      iv- α > 0: The choice value parameter.
   b) Data characteristics:
      i- M: The dimension of the input features.
      ii- Pt: The number of exemplars to be used in learning.
   c) Initialization:
      i- Number of iterations t = 1.
      ii- Number of committed category nodes C = 1.

2) New input;
   A_i^{(t)} = a_i^{(t)} for 1 ≤ i ≤ M;  A_i^{(t)} = 1 - a_{i-M}^{(t)} for M+1 ≤ i ≤ 2M
   b_k^{(t)},  k = 1, ..., L.

3) Compute the choice function for all committed category nodes;
   T_j^{(t)} = Σ_{i=1}^{2M} (A_i^{(t)} ∧ w_{ij}) / (α + Σ_{i=1}^{2M} w_{ij}),  j = 1, ..., C

4) Reset: Determine the node J which has the maximum choice value;
   T_J^{(t)} = max{T_j^{(t)}},  j = 1, ..., C

5) If (T_J^{(t)} = -1: all committed category nodes are in shut-off mode) then a new node (the node whose order in the category layer is C+1) should be committed;
   i- Increase the number of committed nodes by one; C = C+1
   ii- If in fast-commit slow-record mode, assign the values of the input features to the weights of this node;
        w_{iC}^{new} = A_i^{(t)},  i = 1, ..., 2M
        w_{Ck}^{ab,new} = b_k^{(t)},  k = 1, ..., L
      Else (normal mode)
        w_{iC}^{new} = βA_i^{(t)} + (1 - β),  i = 1, ..., 2M
        w_{Ck}^{ab,new} = βb_k^{(t)} + (1 - β),  k = 1, ..., L
   iii- GOTO STEP (2)

6) Matching criterion: If (Σ_{i=1}^{2M} (A_i^{(t)} ∧ w_{iJ}) < Mρ) then;
   i- Shut off this node to put it out of competition; T_J^{(t)} = -1
   ii- GOTO STEP (4)

7) Class matching: If (Σ_{k=1}^{L} (b_k^{(t)} ∧ w_{Jk}^{ab}) < ρ_ab) then;
   i- Shut off node J; T_J^{(t)} = -1
   ii- Raise ρ to the limit that deactivates node J;
        ρ = (1/M) Σ_{i=1}^{2M} (A_i^{(t)} ∧ w_{iJ}) + ε
   iii- GOTO STEP (4)

8) Learning;
   w_{iJ}^{new} = β(A_i^{(t)} ∧ w_{iJ}^{old}) + (1 - β) w_{iJ}^{old},  i = 1, ..., 2M
   w_{Jk}^{ab,new} = β(b_k^{(t)} ∧ w_{Jk}^{ab,old}) + (1 - β) w_{Jk}^{ab,old},  k = 1, ..., L

9) If (t < Pt) then;
   i- t = t+1
   ii- ρ is reset to its base-line value
   iii- GOTO STEP (2)

10) Training has been done. The network is ready for classification.

3.3.2 Classification algorithm of Fuzzy ARTMAP

1) New input;
   A_i^{(t)} = a_i^{(t)} for 1 ≤ i ≤ M;  A_i^{(t)} = 1 - a_{i-M}^{(t)} for M+1 ≤ i ≤ 2M

2) Compute the choice function for all committed category nodes;
   T_j^{(t)} = Σ_{i=1}^{2M} (A_i^{(t)} ∧ w_{ij}) / (α + Σ_{i=1}^{2M} w_{ij}),  j = 1, ..., C

3) Determine the node J which has the maximum choice value;
   T_J^{(t)} = max{T_j^{(t)}},  j = 1, ..., C

4) Matching criterion: If (Σ_{i=1}^{2M} (A_i^{(t)} ∧ w_{iJ}) < Mρ) then;
   i- The network cannot determine the class code of this input.
   ii- GOTO STEP (7).

5) Class matching:
   b_k^{(t)} = w_{Jk} / Σ_{k=1}^{L} w_{Jk},  k = 1, ..., L
   If (Σ_{k=1}^{L} (b_k^{(t)} ∧ w_{Jk}) < ρ_ab) then;
   i- The network cannot determine the class of this input.
   ii- GOTO STEP (7).

6) Class assigning:
   i- b_K^{(t)} = max{b_k^{(t)}},  k = 1, ..., L
   ii- K is the class code of the current input.

7) If more classification is needed, GOTO STEP (1).

8) Classification has been done.

CHAPTER IV: SUPERVISED ART-I ANN

4.1 Introduction

As has been mentioned, the map field approach is the only approach that has been addressed in the literature for the supervision of ART-type ANNs. Using the map field approach, two modules of ART ANNs are required. Moreover, the map field supervision approach forces one to present the class code as an L-digit-long binary code, where L is the number of classes. The binary digital coding is employed by setting all the class code digits to zero, except the one corresponding to the order of the class code, which is set to 1. The class code should be presented to the network as follows: class code #1 as (1 0 0 . . . 0), class code #2 as (0 1 0 . . . 0), class code #3 as (0 0 1 . . . 0), ..., class code #L as (0 0 0 . . . 1). More than that, training with hard samples (each training exemplar represents a single class, which is the normal case) requires additional (false) learning of the map field weights: false learning, because the initial values of all the weights that connect a category node with the map field are equal to one. During training, all these weights drop to zero except one, whose value remains equal to one; this is the weight that connects the category node with the corresponding node of the map field that represents its class, see (figure 4-1). So, the value of the map field vigilance parameter ρ_ab does not affect the efficiency of learning or the accuracy of classification. Therefore, ρ_ab just needs to be a positive fractional number in (0, 1], because the match value at the map field is either equal to one or zero. So, the value of ρ_ab does


Figure 4-1: The map field supervision approach. a) A newly committed category node representing class code #4. The initial map field weights are equal to one. The class code for class #4 is a binary digital code with all digits equal to zero except digit #4, which is equal to one. The map field weights of the newly committed category node become equal to zero except the fourth one, which remains equal to one; so the code of class #4 has been stamped on the map field weights of this node. b) A committed category node that represents class code #4 is rejected to represent class #2.


So, the value of ρ_ab does not affect the class matching process. Class matching using the map field approach is done as follows;

   Σ_{k=1..L} (b_k ∧ w_Jk) = 1   for class matching
                           = 0   for class correction        4-1
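As a small illustration, the test of equation (4-1) amounts to a component-wise fuzzy AND followed by a sum over the L map field weights of the winning node; the function name below is ours, not part of the original formulation:

import numpy as np

def map_field_match(b, w_J):
    # b and w_J are L-digit binary vectors; returns 1 for class matching,
    # 0 when class correction (match tracking) is required
    return int(np.minimum(b, w_J).sum())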

In addition to the requirement of two modules of ART, the map field approach leads to more computation through the mapping process and weight learning. Moreover, it requires more memory. This chapter describes a new simplified supervised ANN architecture, constructed from a single module of ART, called Supervised ART-I (Al-Rawi 1999). Supervision of the new simplified version of Fuzzy ART (Compact Fuzzy ART), which has been developed in chapter II, is described here. This new ANN has a simple architecture and thus reduced computational complexity, while keeping its classification accuracy at the same level as Fuzzy ARTMAP. In addition, it has fewer parameters and requires less memory. Moreover, if hardware is developed, the cost will be much lower than that of the map field approach. The layout of this chapter is as follows: Section 4.2 describes the architecture of Supervised ART-I, data representation, the training phase, and the testing phase. The full algorithm is listed in section 4.3. Section 4.4 includes the discussion.

4.2 Supervised ART-I

Like Fuzzy ARTMAP, the newly developed Supervised ART-I ANN has the ability to learn and classify an arbitrary sequence of binary and analog multi-valued input patterns. It has the same classification accuracy as Fuzzy ARTMAP


with a simpler architecture. This leads to the simplification of the mathematical construction of the network, a reduction in the number of parameters, a reduction in the memory requirement, and, finally, a reduction of both training and classification time. The supervision approach of Supervised ART-I is shown in (figure 4-2).

4.2.1 Architecture of Supervised ART-I

The Supervised ART-I architecture is constructed from a single Fuzzy ART module, instead of two as in Fuzzy ARTMAP. The full architecture of Supervised ART-I is shown in (figure 4-3). This leads to the elimination of the map field and, therefore, the elimination of the map field weights and the map field vigilance parameter (ρ_ab) of Fuzzy ARTMAP. This has been achieved by two different processes: 1) Employing analog class coding (it is convenient to use positive integers) instead of the binary digital coding required in Fuzzy ARTMAP; 2) Introducing a one-dimensional memory, running along all the N category nodes of the F_2 field, which is used to tag each newly committed node with the code of the class it belongs to. This memory of size N represents just (1/L)th of the eliminated memory occupied by the map field weights w_jk: since all category nodes are connected to the map field, the total number of weights connected to the map field is NxL, where N is the total capacity of the network. Class matching in Supervised ART-I is much simpler than in Fuzzy ARTMAP. It is done simply by reading the tag-value of the winning category node. When a node is committed during the training phase, the class code of the input pattern that forces it to be committed is assigned to the memory of its tag-value. Each committed category node has only one tag-value, because each node can represent only one class. However, more than one category node can represent the same class. Therefore, each category node can be seen as a representation of a subclass of the class to which it belongs.


Figure 4-2: Supervision dynamic of the tagging approach of Supervised ART-I. The class code is an integer. Match tracking is conducted by checking the tag of the winning committed category node J against the class code b. This replaces the complicated map field approach.


Figure 4-3: Architecture of Supervised ART-I. Supervision is done using the tagging approach. When a node is committed, it is tagged with the class code of the input features that force it to be committed.

4.2.2 Data Description

The multi-valued input patterns A^(t) can be presented to Supervised ART-I in both binary and analog form. However, the input data should be normalized to [0, 1] before presentation. Since some of the weight values sometimes erode to zero, it is recommended to introduce A^(t) in complement-coded form to avoid the category proliferation problem. The class node b^(t) is not a binary vector, as in the map field supervision approach, but a positive integer number that represents the class code of the input pattern A^(t) (b = 1, 2, ..., etc.): b = 1 represents class number 1, b = 2 represents class number 2, ..., etc.
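A minimal sketch of this data preparation is shown below; the per-band minimum/maximum normalization is an assumption made only for the example (the text requires only that values lie in [0, 1] before complement coding):

import numpy as np

def prepare_input(a, band_min, band_max):
    # normalize one raw feature vector to [0, 1] and complement-code it to length 2M
    a = (np.asarray(a, dtype=float) - band_min) / (band_max - band_min)
    return np.concatenate([a, 1.0 - a])          # A = (a, 1 - a)

# e.g. a 6-band Landsat TM pixel scaled by the 0-255 digital-number range:
A = prepare_input([45, 38, 52, 90, 77, 60], 0.0, 255.0)   # A.shape == (12,)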

4.2.3 Training of Supervised ART-I

During the training phase, a stream of multi-valued input patterns A^(t) and the class codes b^(t) are introduced simultaneously to the network. The choice function is computed for each committed node according to equation (2-1). The network selects the committed category node J which 1) has the maximum choice value among all the committed category nodes (in the F_2 field) and 2) has a match value greater than or equal to the vigilance parameter ρ;

   Σ_{i=1..2M} (A_i^(t) ∧ w_Ji) / M ≥ ρ        4-2

If the tag-value of the winning category node matches the current class code b(t), the node will be trained. Otherwise match tracking should be conducted;


If ( Tag(J) = b^(t) ) then
   Weights updating;

   w_Ji^new = β (A_i^(t) ∧ w_Ji^old) + (1 - β) w_Ji^old        ; i = 1, ..., 2M

Else
   Match tracking;

   ρ = Σ_{i=1..2M} (A_i^(t) ∧ w_Ji) / M + ε

That is, if the tag does not match, class correction is conducted by increasing the vigilance parameter ρ above the match value of this node by a small value ε, and another committed node is chosen. This sacrifices generalization in order to correct the predictive error. Any committed category node that fails to represent the current input must be shut off, as long as this input is on, in order to prevent its reselection. A category node is shut off by assigning -1 to its choice value, because all category nodes not in shut-off mode have a positive choice value. If the failed category node were not shut off, the network would be caught in an infinite reselect-fail loop: it would reselect the same category node, and the node would fail the match criterion again. If none of the committed category nodes is able to represent the current input A^(t) (all committed category nodes are in shut-off mode), a new category node is committed and tagged immediately with the class code b^(t) of the current input pattern. Such action is needed when the maximum choice value is the value of a shut-off node (T_J = -1). In the fast-learning slow-record option, the values of the input features which forced the category node to be committed are assigned to its weight values. This lets the network deal better with noisy data, so rare events can


be classified. If the network is in normal mode, the weights of the newly committed category node will be as given by equation (2-16). As in all supervised ART-type ANNs, the vigilance parameter ρ should be reset to its base-line value ρ̄ before a new input pattern is introduced to the network.
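The training procedure described above can be summarized by the following sketch, in which the map field of Fuzzy ARTMAP is replaced by a simple list of integer tags; the names, default parameter values, and the fast-learning commit are illustrative assumptions, not the code used in this thesis:

import numpy as np

def train_supervised_art1(patterns, labels, alpha=0.001, beta=1.0,
                          rho_bar=0.7, eps=1e-5):
    # patterns: (P, 2M) complement-coded inputs; labels: (P,) integer class codes
    P, two_m = patterns.shape
    M = two_m // 2
    W, tags = [], []                      # one weight vector and one integer tag per node
    for t in range(P):
        A, b = patterns[t], int(labels[t])
        rho = rho_bar                     # vigilance reset to its base-line value
        if not W:
            W.append(A.copy()); tags.append(b)
            continue
        T = np.array([np.minimum(A, w).sum() / (alpha + w.sum()) for w in W])
        while True:
            J = int(T.argmax())
            if T[J] == -1.0:              # all committed nodes are shut off
                W.append(A.copy())        # commit a new node (fast-learning option)
                tags.append(b)            # and tag it immediately with the class code
                break
            if np.minimum(A, W[J]).sum() / M < rho:        # vigilance test, eq. (4-2)
                T[J] = -1.0
                continue
            if tags[J] != b:              # tag check replaces the map field
                T[J] = -1.0
                rho = np.minimum(A, W[J]).sum() / M + eps  # match tracking
                continue
            W[J] = beta * np.minimum(A, W[J]) + (1 - beta) * W[J]   # eq. (2-7)
            break
    return W, tags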

4.2.4 Classification by Supervised ART-I

During the testing phase, only the input pattern A^(t) is introduced to the network. The choice function is computed for all committed category nodes, and the category node with the maximum choice value is determined among them. If the match value of the winning category node J passes the base-line vigilance parameter ρ̄, then the tag of category node J represents the class code of the current input pattern A^(t). If not, the network cannot determine the class code of the current input.
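A corresponding sketch of the testing phase, reusing the W and tags lists produced by the training sketch above:

import numpy as np

def classify_supervised_art1(A, W, tags, M, alpha=0.001, rho_bar=0.7):
    T = np.array([np.minimum(A, w).sum() / (alpha + w.sum()) for w in W])
    J = int(T.argmax())
    if np.minimum(A, W[J]).sum() / M < rho_bar:
        return None                       # the network cannot determine the class
    return tags[J]                        # the tag of the winning node is the class code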

4.3 Algorithm of Supervised ART-I

4.3.1 Training Algorithm of Supervised ART-I

1) Input parameters;
a) Dynamic parameters;
   i- ρ̄ ∈ [0, 1]: Base-line vigilance parameter.
   ii- β ∈ (0, 1]: The dynamic learning parameter; β = 1 for fast learning.
   iii- α > 0: The choice value parameter.


b) Data characteristics;
   i- M: The dimension of the input features.
   ii- P_t: The number of exemplars to be used in learning.
   iii- L: The number of classes.

c) Initialization;
   i- Number of iterations t = 1.
   ii- Number of committed category nodes C = 1.

2) New input;

   A_i^(t) = a_i^(t)              for 1 ≤ i ≤ M
   A_i^(t) = 1 - a_{i-M}^(t)      for M+1 ≤ i ≤ 2M


Forest has the lowest classification accuracy among all classes. Mountains contributed 347, 751, and 328 pixels to natural vegetation-1, natural vegetation-2, and forest, respectively. The behaviour of both Supervised ART-I and Supervised ART-II, for training with remotely sensed data, is well understood over the whole domain of the dynamic parameters. According to the results obtained, Supervised ART-II should be employed when the number of category nodes is in the thousands; otherwise Supervised ART-I performs better. However, Supervised ART-II can be employed here too, since the learning time is very short (less than a minute) when the number of category nodes is less than 1000.


CHAPTER VII PERFORMANCES OF SUPERVISED ART ANNs WITH DIFFERENT VIGILANCE DYNAMICS

7.1 Introduction

Only one approach for the vigilance dynamic in supervised ART ANNs has been addressed in the literature. If the match value of the winning category node is greater than the predetermined vigilance parameter ρ while class matching fails, then the current match value, increased by a very small value ε, is assigned to the vigilance parameter (equation 3-1). The vigilance parameter ρ only increases during the training phase, when class matching of the winning category node fails for a specific input feature vector. The very small positive value ε is added to the failed match value before it is assigned to the vigilance parameter in order to classify rare events (Carpenter et al. 1991b). However, Carpenter et al. (1998b) reported that subtracting ε, rather than adding it, leads to a reduction in the number of category nodes without influencing the classification accuracy of the network. The vigilance dynamic of this approach is shown in figure 3-5.

7.2 Vigilance dynamics

7.2.1 Flying approach

As mentioned above, the only vigilance dynamic reported in the literature is one in which the vigilance parameter only increases during the training phase, when class matching of the winning category node fails for a specific input feature vector. This approach is called the flying approach here, to differentiate it from the other approaches proposed in this work. The vigilance parameter in the flying approach is controlled by the following equation:

   ρ_{t+1} = max{ ρ_t , ( Σ_{i=1..2M} (A_i ∧ w_Ji) / M )_t } ± ε        7-1

The flying approach can put a committed category node that has a match value greater than the initial vigilance parameter, and that belongs to the class of the current input, out of competition, if the match value of the failed category node is higher than the match value of this node (see figure 3-5). This leads to the generation of more committed category nodes and, therefore, to longer training and classification times.

7.2.2 Fixed vigilance approach

In this approach, the vigilance parameter is kept constant at its initial value during the training phase:

   ρ_{t+1} = ρ_t        7-2

This allows all committed category nodes to be created under the same level of confidence. Moreover, a committed category node that has a match value greater than the initial vigilance parameter and belongs to the class of the current input can represent the input, independently of its choice-value rank among the committed category nodes (see figure 7-1a).

7.2.3 Free vigilance approach

The free vigilance approach assigns to the vigilance parameter the match value of the previous category node when that node fails to represent the current input.


Figure 7-1: Sketches of the different vigilance parameter dynamics. The x-axis represents the ranking of all committed nodes according to their choice values; the y-axis represents the match value of each category node. The first sketch (a) represents the fixed approach: all category nodes are committed at the same vigilance value. The second sketch (b) represents the free approach: the vigilance parameter is always equal to the previous match value, so a category node might be committed with a match value smaller than the initial vigilance parameter ρ_0. Finally, the third sketch (c) represents the float approach: the vigilance parameter is equal to the previous match value if that value is not smaller than the initial vigilance value; otherwise the initial vigilance value is employed.


   ρ_{t+1} = ( Σ_{i=1..2M} (A_i ∧ w_Ji) / M )_t        7-3

This allows the vigilance parameter to change freely above and below the initial value, letting the network attune itself to the proper vigilance parameter during the training phase rather than being forced to a fixed one (see figure 7-1b).

7.2.4 Floating approach

The floating approach is like the free vigilance approach, but with a constraint that does not let the vigilance parameter fall below its initial value. This ensures that all committed category nodes have the minimum required level of confidence:

   ρ_{t+1} = max{ ρ_0 , ( Σ_{i=1..2M} (A_i ∧ w_Ji) / M )_t }        7-4

This leads to the generation of more category nodes than both the fixed and free approaches, but fewer than the flying approach (see figure 7-1c). It should be mentioned here that ρ_1 = ρ_0 for all the above vigilance dynamics, where ρ_0 is the initial vigilance value.
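The four vigilance dynamics can be summarized by the following sketch of the update applied after a category node fails class matching; for the flying approach the + ε variant of equation (7-1) is shown, and the names are illustrative:

def next_vigilance(mode, rho_t, rho_0, match, eps=1e-5):
    # match = (sum over i of A_i ^ w_Ji) / M of the node that failed class matching
    if mode == "flying":      # eq. (7-1): the vigilance only ever increases
        return max(rho_t, match + eps)
    if mode == "fixed":       # eq. (7-2): constant at its initial value
        return rho_0
    if mode == "free":        # eq. (7-3): follows the failed match value up or down
        return match
    if mode == "floating":    # eq. (7-4): free, but never below the initial value
        return max(rho_0, match)
    raise ValueError("unknown vigilance dynamic: " + mode)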

7.3 Results and discussion

The performance of the Supervised ART-II ANN has been tested for classification of the Landsat TM images using all the above-mentioned vigilance dynamics.


Table 7-1: The performance of the Supervised ART-II ANN with different vigilance dynamics: classification accuracy (%) and number of category nodes (#cn) for the fly, float, free, and fixed approaches at each combination of initial vigilance parameter and dynamic learning rate. An Alpha Station 500 has been used for these runs.

Figure 7-2: Classified images for the Landsat TM images. The first, second, third, and fourth columns represent images classified using the fly, float, fixed, and free vigilance dynamics, respectively. The first, second, third, fourth, and fifth rows represent images classified using an initial vigilance parameter and dynamic learning rate of (0.98, 0.50), (0.95, 0.20), (0.90, 0.15), (0.70, 0.15), and (0.00, 0.15), respectively. Classes are assigned colours as follows: 1) meadow-white, 2) mountain-brown, 3) fallow land1-yellow, 4) fallow land2-dark yellow, 5) fallow land3-bright yellow, 6) irrigated land-red, 7) alfalfa-black, 8) wetland-dark blue, 9) forest-dark green, 10) wheat-light green, 11) natural vegetation1-yellowish green, 12) natural vegetation2-green, and 13) river-blue.

The network has been tested using five different combinations of vigilance parameter and dynamic learning rate: (0.98, 0.50), (0.95, 0.20), (0.90, 0.15), (0.70, 0.15), and (0.00, 0.15). These values of the vigilance parameter ρ_0 and dynamic learning rate β are located on the optimum line for classification performance (figure 6-6); this optimum line represents the best value of β for a specific value of ρ to obtain the maximum classification accuracy using the flying approach. The classification performance ranged from 66.71% using the free approach to 87.05% using the flying approach. The numbers of category nodes were 120 and 1252 for the free and flying approaches, respectively. These results were obtained when 0.98 and 0.50 were used for the vigilance parameter and the dynamic learning rate, respectively. See (table 7-1) for details. Classified images are shown in (figure 7-2). The neural network performances using the flying, floating and fixed approaches become closer to each other as the vigilance parameter approaches unity; it is clear from the theory that all the above-mentioned approaches lead to the same classification accuracy and number of category nodes at ρ_0 = 1. The neural network performances using the floating and free approaches become closer to each other as ρ_0 approaches zero, and become identical at ρ_0 = 0. While the flying approach shows better performance from the accuracy point of view when the initial vigilance parameter is equal to or greater than 0.95, the floating approach shows better performance for initial vigilance parameters less than 0.95. From the number-of-category-nodes point of view, the network performs better using the floating approach: while the two are equal at ρ_0 = 1, the number of nodes is reduced to less than 25% (56/227) at ρ_0 = 0 (see table 7-1). Such a reduction leads to a reduction in the training time and the classification time as well.


CHAPTER VIII CONCLUSIONS

In this study, new simplified architectures of ANNs have been designed. These architectures have been employed to analyze remotely sensed data. The conclusions that can be drawn from this study are:

1) Two new versions of Fuzzy ART have been developed. The algorithms show that these new versions have the same categorization performance as the original algorithm; however, they require less training and categorization time.

2) A new supervised ART-type architecture has been developed, called Supervised ART-I. It is built from a single module of ART rather than two modules of ART linked by a map field, as is the case for all supervised ART ANNs addressed in the literature. This leads to the elimination of the map field and its parameters. It is theoretically proven that Supervised ART-I has the classification performance of Fuzzy ARTMAP; however, it requires less memory and less training time due to its simple architecture.

3) Another supervised ART-type architecture has been developed, called Supervised ART-II. It is also built from a single module of ART, and it has the classification performance of Fuzzy ARTMAP and Supervised ART-I. The category layer of Supervised ART-II is divided into stacks, each stack representing a single class. This reduces the memory required for labeling category nodes from N in the tagging approach of Supervised ART-I to only L in the stacking approach of Supervised ART-II.

4) An uncommitted category node in Supervised ART-I is free to represent any class; however, an uncommitted category node in Supervised ART-II is predetermined to represent a specific class. When a stack runs out of uncommitted category nodes, borrowing an uncommitted category node from another stack is not possible. Increasing the memory size of each stack can overcome this limitation of the stacking supervision approach of Supervised ART-II. This additional memory is compensated for by employing only L of the N memory locations released by abandoning the tagging supervision of Supervised ART-I; the released memory can be used to increase the memory size of each stack by one fold.

5) While the newly developed supervision approaches have only been employed here with Fuzzy ART, they can be applied to all ART-type ANNs.

6) Supervised ART-I is oriented to homogeneous environments, while Supervised ART-II is oriented to non-homogeneous environments. The homogeneity of the environment depends on the type of data and on the dynamic parameters.

7) Since both Supervised ART-I and Supervised ART-II are built from a single module of ART, the cost of building chips for classification tasks will be much lower than with the map field approach.

8) The behavior of both Supervised ART-I and Supervised ART-II, for training with remotely sensed data, is well understood over the whole domain of the dynamic parameters.

9) An automatic system for classifying Landsat TM images, with very good classification accuracy, has been developed.

10) This study shows that the flying approach should be employed for the vigilance dynamic if the vigilance parameter is very high (>0.95), while the floating approach should be employed otherwise.


Some aspects derived from this study that need to be investigated in future work are the following. New learning algorithms need to be developed; these learning algorithms must eliminate or reduce the under-training and over-training episodes. Further studies are recommended to investigate the behavior of the designed architectures in dealing with different digital signal processing problems. Some studies in this direction have already been conducted: the developed architectures have been employed successfully for monitoring forest fires (Al-Rawi et al. 2001a, b, c & d) and for cloud detection (Al-Rawi et al. 2001e & f).


BIBLIOGRAPHY

Al-Rawi, K. R., 1999, "Supervised ART-I: A new neural network architecture for learning and classifying multivalued input patterns", Lecture Notes in Computer Science, 1606, 756-765.

Al-Rawi, K. R., Gonzalo, C., and Arquero, A., 1999, "Supervised ART-II: A new neural network architecture, with quicker learning algorithm, for classifying multivalued input patterns", In Proceedings of the European Symposium on Artificial Neural Networks ESANN'99, Bruges, Belgium, 289-294.

Al-Rawi, K. R., Gonzalo, C., and Martínez, E., 2000, "Supervised ART-II for classification Landsat Thematic Mapper image", Remote Sensing in the 21st Century: Economic and Environmental Applications, Casanova (ed), Balkema, Rotterdam, 229-235.

Al-Rawi, K. R., Casanova, J. L., and Calle, A., 2001a, "Burned area mapping system and fire detection system, based on neural networks and NOAA-AVHRR imagery", International Journal of Remote Sensing (in press).

Al-Rawi, K. R., Casanova, J. L., and Romo, A., 2001b, "IFEMS: New approach for monitoring wildfire evolution with NOAA-AVHRR imagery", International Journal of Remote Sensing (in press).

Al-Rawi, K. R., Casanova, J. L., and Louakfaoui, M., 2001c, "IFEMS for monitoring spatial-temporal behaviour of multiple fire phenomena", International Journal of Remote Sensing (in press).

Al-Rawi, K. R., Casanova, J. L., and Calle, A., 2001d, "ART neural network for mapping burned area and determination severity of burn with Landsat TM images", Submitted to IEEE Transaction on Geoscience and Remote Sensing.

Al-Rawi, K. R., Casanova, J. L., and Vasileisky, A., 2001e, "A very quick neural network algorithm for cloud detection", Submitted to Geocarto International.

Al-Rawi, K. R., and Casanova, J. L., 2001f, "Neural network as an aid tool for building non-linear threshold algorithm for cloud detection", Submitted to Remote Sensing of Environment.

Bachelder, I. A., Waxman, A. M., and Seibert, M., 1993, "A neural system for mobile robot visual place learning and recognition", Proceedings of the World Congress on Neural Networks, Hillsdale, NJ, USA, 512-517.


Baloch, A. A., and Waxman, A. M., 1991, "Visual learning, adaptive expectations, and behavioral conditioning of the mobile robot MAVIN", Neural Networks, 4, 271-302.

Baraldi, A., and Parmiggiani, F., 1995, "A neural network for unsupervised categorization of multivalued input patterns: An application to satellite image clustering", IEEE Transaction on Geoscience and Remote Sensing, 33, 305-316.

Benediktsson, J. A., Swain, P. H., and Ersoy, O. K., 1990, "Neural network approaches versus statistical methods in classification of multisource remote sensing data", IEEE Transaction on Geoscience and Remote Sensing, 28, 540-552.

Bernardon, A. M., and Carrick, J. E., 1995, "A neural system for automatic target learning and recognition applied to bare and camouflaged SAR targets", Neural Networks, 8, 1103-1108.

Carpenter, G. A., and Grossberg, S., 1987a, "A massively parallel architecture for a self-organizing neural pattern recognition machine", Computer Vision, Graphics, and Image Processing, 37, 54-115.

Carpenter, G. A., and Grossberg, S., 1987b, "ART2: Stable self-organization of pattern recognition codes for analog input patterns", Applied Optics, 26, 4919-4930.

Carpenter, G. A., and Grossberg, S., 1990, "ART3: Hierarchical search using chemical transmitters in self-organizing pattern recognition architectures", Neural Networks, 3, 129-159.

Carpenter, G. A., Grossberg, S., and Rosen, D. B., 1991a, "Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system", Neural Networks, 4, 759-771.

Carpenter, G. A., Grossberg, S., and Reynolds, J. H., 1991b, "ARTMAP: Supervised real-time learning and classification of nonstationary data by a self-organizing neural network", Neural Networks, 4, 565-588.

Carpenter, G. A., Grossberg, S., Markuzon, N., Reynolds, J. H., and Rosen, D. B., 1992, "Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps", IEEE Transaction on Neural Networks, 3, 698-713.

Carpenter, G. A., and Ross, W. D., 1993, "ART-EMAP: A new neural network architecture for object recognition by evidence accumulation", IEEE Transaction on Neural Networks, 6, 805-818.

Carpenter, G. A., Gjaja, M. N., Gopal, S., and Woodcock, C. E., 1997, "ART neural networks for remote sensing: vegetation classification from Landsat TM and terrain data", IEEE Transaction on Geoscience and Remote Sensing, 35, 308-325.


Carpenter, G. A., 1997, "Distributed learning, recognition, and prediction by ART and ARTMAP neural networks", Neural Networks, 10, 1473-1494.

Carpenter, G. A., and Markuzon, N., 1998, "ARTMAP-IC and medical diagnosis: Instance counting and inconsistent cases", Neural Networks, 11, 323-336.

Carpenter, G. A., 1998, "Distributed ARTMAP: a neural network for fast distributed supervised learning", Neural Networks, 11, 793-813.

Caudell, T. P., and Healy, M. J., 1994, "Adaptive Resonance Theory networks in the Encephalon autonomous vision system", Proceedings of the IEEE International Conference on Neural Networks, Piscataway, NJ, USA, 1235-1240.

Caudell, T. P., Smith, S. D. G., Escobedo, R., and Anderson, M., 1994, "NIRS: large scale ART-1 neural architectures for engineering design retrieval", Neural Networks, 7, 1339-1350.

Dubrawski, A., and Crowley, J. L., 1994, "Learning locomotion reflexes: A self-supervised neural system for a mobile robot", Robotics and Autonomous Systems, 12, 133-142.

Gan, K. W., and Lua, K. T., 1992, "Chinese character classification using adaptive resonance network", Pattern Recognition, 25, 877-882.

Georgiopoulos, M., Fernlund, H., Bebis, G., and Heileman, G. L., 1996, "Order of search in Fuzzy ART and Fuzzy ARTMAP: Effect of the choice parameter", Neural Networks, 9, 1541-1559.

Georgiopoulos, M., Dagher, I., Heileman, G. L., and Bebis, G., 1999, "Properties of learning of a Fuzzy ART variant", Neural Networks, 12, 837-850.

Gopal, S., Sklarew, D. M., and Lambin, E., 1994, "Fuzzy-neural networks in multitemporal classification of landcover change in the Sahel", Proceedings of the DOSES Workshop on New Tools for Spatial Analysis, Lisbon, Portugal, 55-68.

Grossberg, S., 1976, "Adaptive pattern classification and universal recoding, II: Feedback, expectation, olfaction, and illusions", Biological Cybernetics, 23, 187-202.

Grossberg, S., 1980, "How does a brain build a cognitive code?", Psychological Review, 1, 1-51.

Ham, F. M., and Han, S. W., 1996, "Quantitative study of the QRS complex using fuzzy ARTMAP and the MIT/BIH arrhythmia database", In Proceedings of the World Congress on Neural Networks, 1, 207-211.

Heermann, P. D., and Khazenie, N., 1992, "Classification of multispectral remote sensing data using a Back-Propagation neural network", IEEE Transaction on Geoscience and Remote Sensing, 30, 81-88.


Hepner, G. F., Logan, T., Ritter, N., and Bryant, N., 1990, "Artificial neural network classification using a minimal training set: comparison to conventional supervised classification", Photogrammetric Engineering & Remote Sensing, 56, 469-473.

Hopfield, J. J., 1982, "Neural networks and physical systems with emergent collective computational abilities", Proceedings of the National Academy of Sciences, 79, 2554-2558.

Keyvan, S., Drug, A., and Rabelo, L. C., 1993, "Application of artificial neural networks for development of diagnostic monitoring system in nuclear plants", Transactions of the American Nuclear Society, 1, 515-522.

Keyvan, S., 1999, "Application of ART2-A as a pseudo-supervised paradigm to nuclear reactor diagnostics", Lecture Notes in Computer Science, 1606, 747-755.

Kohonen, T., 1982, "Self-organized formation of topologically correct feature maps", Biological Cybernetics, 43, 59-69.

Kumar, S. S., and Guez, A., 1989, "A neural network approach to target recognition", International Joint Conference on Neural Networks, Washington DC, Hillsdale, NJ, Erlbaum Associates, II, 573-578.

Lang, K. J., and Withbrock, M. J., 1989, "Learning to tell two spirals apart", Proceedings 1988 Connectionist Models Summer School, 52-59.

Le Cun, Y., 1986, "Learning processes in an asymmetric threshold network", in Disordered Systems and Biological Organization, E. Bienenstock, F. Fogelman Soulié, and G. Weisbuch, Eds., Berlin, Springer-Verlag.

Mannan, B., Roy, J., and Ray, K., 1998, "Fuzzy ARTMAP supervised classification of multi-spectral remotely-sensed data", International Journal of Remote Sensing, 19, 767-774.

Mehta, B. V., Vij, L., and Rabelo, L. C., 1993, "Prediction of secondary structure of protein using fuzzy ARTMAP", In Proceedings of the World Congress on Neural Networks, 1, 228-232.

Mekkaoui, A., and Jespers, P., 1990, "An optimal self-organizing pattern classifier", International Joint Conference on Neural Networks, Washington DC, Hillsdale, NJ, Erlbaum Associates, I, 477-450.

Moore, B., 1989, "ART1 and pattern clustering", Proceedings 1988 Connectionist Models Summer School, D. Touretzky, G. Hinton, and T. Sejnowski, Eds., San Mateo, CA: Morgan Kaufmann, 174-185.

Mulder, N. J., and Spreeuwers, L., 1991, "Neural networks applied to the classification of remotely sensed data", International Geoscience and Remote Sensing Symposium (IGARSS'91), Espoo, Finland, 2211-2213.


Murshed, N. A., Bortolozzi, F., and Sabourin, R., 1995, "Off-line signature verification, without a priori knowledge of class col. A new approach", Proceedings of the Third International Conference on Document Analysis and Recognition, Piscataway, NJ, USA.

Paola, J. D., and Schowengerdt, R. A., 1994, "Comparisons of neural networks to standard techniques for image classification and correlation", International Geoscience and Remote Sensing Symposium (IGARSS'94), Pasadena, CA, USA, 1404-1406.

Paola, J. D., and Schowengerdt, R. A., 1995, "A review and analysis of backpropagation neural networks for classification of remotely-sensed multi-spectral imagery", International Journal of Remote Sensing, 16, 3033-3058.

Paola, J. D., and Schowengerdt, R. A., 1997, "The effect of neural-network structure on a multispectral land-use / land-cover classification", Photogrammetric Engineering & Remote Sensing, 63, 535-544.

Parker, D., 1986, "Computational research in economics and management science", MIT, Cambridge, MA, USA, Technical Report TR-87, 1986.

Racz, J., and Dubrawski, A., 1995, "Artificial neural network for mobile robot topological localization", Robotics and Autonomous Systems, 16, 73-80.

Rumelhart, D. E., Hinton, G. E., and Williams, R. J., 1986, "Learning internal representations by back-propagation", Parallel Distributed Processing: Explorations in the Microstructure of Cognition (D. E. Rumelhart and J. L. McClelland, Eds.), MIT Press, Cambridge, Massachusetts, 318-362.

Salu, Y., and Tilton, J., 1993, "Classification of multispectral image data by the binary diamond neural network and by nonparametric, pixel-by-pixel methods", IEEE Transaction on Geoscience and Remote Sensing, 31, 606-617.

Seibert, M., and Waxman, A. M., 1992, "Adaptive 3-D object recognition from multiple views", IEEE Transactions on Pattern Analysis and Machine Intelligence, 14, 107-124.

Seibert, M., and Waxman, A. M., 1993, "An approach to face recognition using saliency maps and caricatures", Proceedings of the World Congress on Neural Networks, Hillsdale, NJ, USA, 661-664.

Simpson, P. K., 1990, "Neural networks for sonar signal processing", Handbook of Neural Computing Applications (A. J. Maren, C. T. Harston, and R. M. Pap, Eds.), San Diego, Academic Press, 319-335.

Solaiman, B., and Mouchot, M. C., 1994, "A comparative study of conventional and neural network classification of multispectral data", International Geoscience and Remote Sensing Symposium (IGARSS'94), Pasadena, CA, USA, 1413-1415.


Soliz, P., and Donohoe, G. W., 1996, "Adaptive resonance theory neural network for fundus image segmentation", Proceedings of the World Congress on Neural Networks, Hillsdale, NJ, USA, 1180-1183.

Srinivasa, N., and Sharma, R., 1996, "A self-organizing invertible map for active vision applications", Proceedings of the World Congress on Neural Networks, Hillsdale, NJ, USA, 121-124.

Tzeng, Y. C., Chen, K. S., Kao, W. L., and Fung, A. K., 1994, "A dynamic learning neural network for remote sensing applications", IEEE Transaction on Geoscience and Remote Sensing, 32, 1096-1102.

Yool, S. R., 1998, "Land cover classification in rugged areas using simulated moderate-resolution remote sensor data and an artificial neural network", International Journal of Remote Sensing, 19, 85-96.

Yoshida, T., and Omatu, S., 1994, "Neural network approach to land cover mapping", IEEE Transaction on Geoscience and Remote Sensing, 32, 1103-1109.

Warner, T. A., and Shank, M., 1997, "An evaluation of the potential for fuzzy classification of multispectral data using artificial neural networks", Photogrammetric Engineering & Remote Sensing, 63, 1285-1294.

Waxman, A. M., Seibert, M., Gove, A., Fay, D. A., Bernardon, A. M., Lazott, C., Steele, W. R., and Cunningham, R. K., 1995, "Neural processing of targets in visible, multispectral IR and SAR imagery", Neural Networks, 8, 1029-1051.

Werbos, P. J., 1974, "Beyond regression: New tools for prediction and analysis in the behavioural sciences", Ph.D. thesis, Harvard University, Cambridge, MA, USA.

Williamson, J. R., 1996, "Gaussian ARTMAP: A neural network for fast incremental learning of noisy multidimensional maps", Neural Networks, 9, 881-897.

Wilson, C. L., Wilkinson, R. A., and Ganis, M. D., 1990, "Self-organizing neural network character recognition on a massively parallel computer", International Joint Conference on Neural Networks, San Diego, Piscataway, NJ, IEEE Service Center, II, 325-329.


APPENDIX: SUMMARY (RESUMEN)

A.1. INTRODUCTION

A.1.1 Historical evolution of Artificial Neural Networks (ANNs)

Although the origin of ANNs can be dated to 1943, when McCulloch and Pitts built the first ANN structure, the foundations of this area were laid in the first half of the seventies. It was then that Werbos (1974) set out the principles of the learning algorithm known as Back Propagation (BP) and Grossberg (1976) established the bases of the Adaptive Resonance Theory (ART). Nevertheless, it was in the eighties that a great theoretical advance took place in this field, such that the BP algorithm was developed simultaneously and independently by different authors (Le Cun 1986, Parker 1986, and Rumelhart et al. 1986). In addition, new neural network structures and new learning algorithms were proposed; thus, in 1982 Kohonen proposed the Self-Organizing Map (KSOM). In this work, special attention has been paid to the evolution of ART-type ANNs (Carpenter and Grossberg 1987a&b), given their proven stability, speed and accuracy (Carpenter et al. 1991a&b, 1992, 1997, and Gan & Lua 1992). These qualities have facilitated their application in many different areas. Thus, the Boeing company has used this type of ANN to retrieve information about existing systems in order to facilitate the design of new ones (Caudell et al. 1994). This type of network has also been used for


recognition of moving targets (Seibert and Waxman 1992, Bernardon and Carrick 1995, Kumar and Guez 1989, Koch et al. 1995, and Waxman et al. 1995); for motor control in robotics (Baloch and Waxman 1991, Bachelder et al. 1993, Dubrawski and Crowley 1994, Srinivasa and Sharma 1996); in robot navigation (Racz and Dubrawski 1995); in machine vision (Caudell and Healy 1994); object recognition (Seibert and Waxman 1992); face recognition (Seibert and Waxman 1993); pattern clustering (Moore 1989, Mekkaoui and Jespers 1990); character recognition (Wilson et al. 1990); sonar signal processing (Simpson 1990); medical image processing (Soliz and Donohoe 1996); recognition of waveforms in electrocardiograms (Ham and Han 1996); signature verification (Murshed et al. 1995); fault identification in nuclear plants (Keyvan 1999); and in remote sensing (Gopal et al. 1994, Baraldi and Parmiggiani 1995).

A.1.2 Classification of remotely sensed data with ANNs

The advances made in recent decades, both in space research and in computing technology, have made it possible to use remotely sensed data for the automatic determination and location of the thematic classes present on the Earth's surface. At present, this area of knowledge is a very active line of research (Benediktsson et al. 1990). The advantages of using ANNs to carry out these classification tasks, compared with some conventional classifiers such as the maximum likelihood classifier (MLC), are: 1) ANNs do not need to know the probability distribution of each class a priori, since they are non-parametric systems. This also makes it possible to introduce auxiliary data of a non-spectral nature (slope, topography, texture, etc.), which appear to improve the classification accuracy


(Benediktsson et al. 1990, Carpenter et al. 1997). It has also been shown that neural networks are more robust when the distribution is not Gaussian (Paola and Schowengerdt 1997, Hepner et al. 1990). 2) Unlike conventional classifiers, ANNs are able to deal with fuzzy classifications (Paola and Schowengerdt 1997, Warner and Zanca 1997, Yool 1998). In these cases, the values provided by the output neurons can quantify the degree of membership of the input data in a given class. This aspect is especially relevant when working with sensors of low spatial resolution. 3) The parallelism inherent in ANNs makes it relatively easy to compute these systems on parallel computers (Salu and Tilton 1993, Heermann and Khazenie 1992), considerably reducing the time spent in the classification process with respect to classical classifiers. 4) The flexibility of ANNs makes it possible to improve classification results in certain circumstances (Carpenter et al. 1997). 5) Finally, these systems have the ability to establish arbitrary decision boundaries (Paola and Schowengerdt 1995, Tzeng et al. 1994). The neural network most frequently used in the literature to classify remotely sensed data is the Multi-Layer Perceptron (MLP), with the well-known Backpropagation learning algorithm. This algorithm is based on the minimization of the error between the value provided by the network at its output and the true value. Some authors have claimed that conventional classifiers perform better than the MLP (Mulder and Spreeuwers 1991, Solaiman and Mouchot 1994). However, others have concluded that the MLP classifies remotely sensed data more accurately than the MLC (Hepner et al. 1990, Heermann and Khazenie 1992, Paola and Schowengerdt 1994, Yoshida and Omatu 1994). Nevertheless, the classification of remote sensing data with the MLP network presents a series of drawbacks: the architecture of the network is not fixed,


and the number of hidden layers and the number of nodes in each hidden layer must be determined by trial and error. This process can be very costly in terms of computation time, since training of the network is slow. Furthermore, during the learning process the network can get trapped in local minima, which would prevent the network from converging. This problem can be reduced by decreasing the value of the learning rate, but this increases the time taken by the network during training. Heermann and Khazenie (1992) proposed the use of parallel computers to reduce the training time, at the expense of an increase in hardware cost. Some studies (Carpenter et al. 1992) have shown that Fuzzy ARTMAP provides a higher classification accuracy than the MLP for images from the Thematic Mapper (TM) sensor carried by the Landsat satellite, in less time. Likewise, these authors concluded that in this case Fuzzy ARTMAP and the MLC provided the same classification accuracy. However, Mannan et al. (1998) compared the performance of Fuzzy ARTMAP, the MLP and the MLC in classifying a 512x512 image detected by the LISS-II sensor carried by the Indian satellite IRS-1B, concluding that the classification accuracy of Fuzzy ARTMAP was much higher than that of the other two classifiers. As for the time required for learning, it was slightly lower than the time taken by the MLC and considerably lower than that taken by the MLP. In addition, it should be noted that, unlike the MLP, the architecture of Fuzzy ARTMAP is well defined, it always converges, and it is able by itself to generate new nodes that represent subclasses. The main drawback of Fuzzy ARTMAP is the complexity of its architecture.


A.2. OBJECTIVES OF THE THESIS

The objective of this thesis follows from the aspects discussed above. It can be stated as the search for ART-type neural network architectures that offer the same performance as the existing ones but are simpler from the structural point of view, which in turn implies a reduction in the computation times associated with both the learning and the operation processes. This global objective can be broken down into the following partial objectives:

• Design of new ART-type ANN architectures that provide the same classification accuracy as the classical ART networks while reducing the complexity of their architectures.

• Proposal of learning algorithms for these architectures.

• Coding of the learning algorithms of the different proposed architectures.

• Exhaustive and comparative study of the performance of the proposed networks and algorithms for the case of the classification of images remotely sensed by the Thematic Mapper sensor.

A.3. ART-TYPE ARTIFICIAL NEURAL NETWORKS

The principles of the Adaptive Resonance Theory (ART) were formulated by Carpenter and Grossberg (Centre for Adaptive Systems, Department of Cognitive and Neural Systems, Boston University) as a theory of information processing in the human cognitive system (Grossberg 1976, 1980). Starting from this theory, different unsupervised structures were initially developed: ART1 (Carpenter and Grossberg 1987a), ART2 (Carpenter and Grossberg 1987b), ART3


(Carpenter and Grossberg 1990), SART (Baraldi and Parmiggiani 1995) and Fuzzy ART (Carpenter et al. 1991a). All of these networks are able to group the different inputs into classes using only the information characterizing those inputs (unsupervised learning). The fundamental difference between ART1 and ART2 is that the former only accepts binary data, while the latter also accepts analog data. In both there is a flow of information forwards and backwards: forwards, through the weights that connect each node of the input layer with all the nodes of the layer that performs the grouping of the input data (each of these nodes is called a category node); and backwards, through another set of weights that connects each category node with all the nodes of the input layer. Like ART2, Fuzzy ART can classify both binary and analog data. However, in the latter the information only flows forwards, from the input layer to the classifying layer. Another fundamental difference between Fuzzy ART and ART1/ART2 is that the intersection operator of set theory (∩) has been replaced by the operator (∧), which represents the minimum-value operator of fuzzy logic. The first ART-type neural network with supervised learning was ARTMAP, proposed by Carpenter et al. in (1991). In this case, in addition to the features to be classified, the class code corresponding to each input must be supplied to the network during the training phase. In 1992, the same authors presented another new supervised ART-type network, Fuzzy ARTMAP (Carpenter et al. 1992). Later, many other supervised ART-type architectures have been investigated, among which ART-EMAP (Carpenter and Ross 1993), Gaussian ARTMAP (Williamson 1996), ARTMAP-IC (Carpenter and Markuzon 1998), and Distributed ARTMAP (Carpenter 1998) may be mentioned. All these


architectures are characterized by the fact that supervision is carried out by means of a "map field", which requires the presence of two ART-type modules (ARTa and ARTb). The main differences between ARTMAP and Fuzzy ARTMAP are that, while the former is built from two ART1 modules, the latter uses two Fuzzy ART modules. ARTMAP has the ability to learn and classify binary multi-valued input patterns, while Fuzzy ARTMAP also accepts analog patterns. Of all the supervised networks mentioned above, Fuzzy ARTMAP has been the most widely used. It has been applied to the solution of different problems, such as: automatic analysis of electrocardiograms (Ham and Han 1996); management and diagnosis of nuclear power plants (Keyvan et al. 1993); and prediction of the secondary structure of some proteins (Mehta et al. 1993).

A.3.1 Fuzzy ART

Since all the architectures and algorithms proposed in this work are inspired by Fuzzy ART and Fuzzy ARTMAP, a brief description of both is given here. First, it should be noted that both retain the basic characteristics common to all ART-type systems. Among them, especially noteworthy is the matching, according to similarity criteria, between the input patterns and the prototype vectors previously learned by the network. This matching process can bring the network into a resonant state, which may give rise to the learning of new prototypes (categories) or to the search for similar, previously learned prototypes. If the similarity between the input pattern and the stored one is greater than the predetermined value, resonance occurs and the new information is incorporated into the selected category node


by training its weights. The similarity criterion is established through the so-called vigilance parameter ρ. This parameter determines the threshold that a committed category node must exceed in order to represent a given input pattern, before the search for another category node that better represents that pattern is triggered. If none of the committed category nodes exceeds this threshold, a new category node must be committed. This process can be repeated as long as the memory capacity of the network is not exceeded. The vigilance parameter ρ is a dimensionless number defined in the interval (0, 1]. A value of this parameter equal to 1 represents perfect similarity, that is, it determines very well differentiated classes, but gives rise to a high number of category nodes, while low values of this parameter allow working with few category nodes but give rise to very general classes. This parameter is one of the keys of all ART-type ANNs. Its value depends on the type and volume of the data, the desired classification accuracy, the required speed and the available memory. This parameter is kept constant in the operation of all the unsupervised networks. Figure 2-1 of this thesis shows the dynamics of Fuzzy ART. In this figure, F_1 represents the input layer and F_2 the so-called classifying layer. The weights w_ji connect each node of the input layer with all the nodes of the classifying layer. The learning of the weights of the winning node, w_Ji, is only carried out if this node passes the similarity test, in other words if it exceeds the vigilance parameter; otherwise this node goes out of the competition (reset). In figure 2-1, |X| represents the degree of similarity between the input and the weights of the winning category node J. This degree of similarity is determined by the relation X = Σ_i (A_i^(t) ∧ w_Ji). The selection of the winning node involves computing the activation level of each category node, T_j^(t)


(eq. 2-1), and choosing the node that reaches the highest level. The value of T_j is an estimate of the degree of membership of the input in the class represented by node j. The architecture of Fuzzy ART is shown in figure 2-2, where the 2M nodes of the input layer have been represented, M being the number of values that define each input pattern. The last M input nodes represent the complementary values of those patterns. Figure 2-2 also shows the category nodes, as well as all the connections between the nodes of F_1 and F_2. The category nodes whose index runs from 1 to C are called committed category nodes, while the category nodes whose indices run from C+1 to N are called uncommitted category nodes. When all the committed category nodes fail to represent an input and are consequently out of the competition, one of the uncommitted category nodes must be committed. Once a node capable of representing the input pattern has been found and this node has passed the vigilance test, the weight values of that category node must be updated so that they incorporate the characteristics of the new pattern into node J (eq. 2-7). The weight adaptation equation is given by the following expression:

   w_Ji^new = β (A_i^(t) ∧ w_Ji^old) + (1 - β) w_Ji^old        ; i = 1, ..., 2M        2-7

where β ∈ (0, 1] is the parameter called the learning rate.

A.3.2 Fuzzy ARTMAP

As already mentioned, Fuzzy ARTMAP is a generalization of ARTMAP (Carpenter et al. 1991b) (see figures 3-1 and 3-2). In this case, the map field is


a matrix of N x L binary weights (w_jk ; j = 1, ..., N; k = 1, ..., L) initialized to 1 (figure 3-4), L being the number of classes to be considered. Unlike the unsupervised ART-type networks, in the supervised ones the vigilance parameter ρ ∈ [0, 1] can increase during the learning process. Thus, for example, if the activation level of the winning node is greater than the value of the predetermined similarity parameter and, nevertheless, the similarity test for that class is not passed, then ρ takes the value of the similarity level increased by a small quantity ε, as shown in the following equation:

   ρ = ( Σ_{i=1..2M} (A_i^(t) ∧ w_Ji) ) / M + ε        3-1

This definition of the vigilance parameter makes it possible to classify rare events (Carpenter et al. 1992). Figure 3-5 shows the dynamics of the vigilance parameter in supervised environments. To carry out the training of the Fuzzy ARTMAP network, the pairs formed by the input vectors A^(t) and b^(t) must be presented to the ARTa and ARTb modules. The first set of vectors represents the training patterns, while the second group represents the binary code assigned to the class to which the corresponding training pattern belongs. When the activation level of the winning node exceeds the vigilance parameter, the class similarity must be evaluated; this is considered acceptable if it exceeds the predetermined value of the map field vigilance parameter, ρ_ab, and then the weights are updated according to the equations

   w_Ji^new = β (A_i^(t) ∧ w_Ji^old) + (1 - β) w_Ji^old        ; i = 1, ..., 2M
