                                               Trent Apted / tapted / 0010433

                   COMP4302: Artificial Neural Networks 2003

                                  Assignment 1

   1. Description of the data

   o  how the two data sets were generated

   First, a 500 x 2 matrix is created using `rand' to generate x and y values
   between 0.0 and 1.0 (i.e. one point per row).

   Elements in the third column are -1 if the point lies above a line
   determined by a linear function `f' that evenly divides the region
   [0..1] x [0..1]. Elements are +1 if the point lies below that line. The
   function used was y = f(x) = 0.5x + 0.25.

   Elements in the fourth column are chosen similarly, but using a curve
   derived from `f' and sin such that the two classes are not linearly
   separable but are still split roughly evenly.
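
   The labelling described above can be sketched as follows. This is a
   minimal Python illustration, not the original matlab code. The line `f' is
   taken from the report; the curve `g', its amplitude and its number of
   cycles are hypothetical, because the report does not give the curve that
   was actually used. Treating a point exactly on the boundary as "below" is
   also an assumption.

```python
import math
import random


def f(x):
    """Dividing line from the report: y = 0.5x + 0.25."""
    return 0.5 * x + 0.25


def g(x, amplitude=0.15, cycles=2.0):
    """HYPOTHETICAL curve built from f and sin (the real one is not given)."""
    return f(x) + amplitude * math.sin(2.0 * math.pi * cycles * x)


def label(y, boundary):
    """-1 if the point lies above the boundary, +1 otherwise (assumption: ties count as below)."""
    return -1 if y > boundary else 1


def make_examples(n=500, seed=0):
    """Return n rows of (x, y, linear label, non-linear label)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x, y = rng.random(), rng.random()  # uniform in [0, 1)
        rows.append((x, y, label(y, f(x)), label(y, g(x))))
    return rows


examples = make_examples()
```

   The area under `f' over [0..1] is 0.5, so roughly half of the points
   receive each label. The sine term in `g' runs over whole cycles, so it
   leaves that split roughly unchanged.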

   The examples are then randomly split into a training set and a test set
   such that one third are in the test set and the remainder are in the
   training set. (The extra randomness is unnecessary here, because the points
   are already random, but it is a useful framework for cases where the
   initial examples are not in random order.)

   Columns 1, 2 and 3 become the linearly separable examples and columns 1, 2
   and 4 become the non-linearly separable examples.
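
   A minimal Python sketch of the split and the column selection, assuming
   each row is a tuple (x, y, linear label, non-linear label). The function
   names and the rounding of the test set size are assumptions made for
   illustration; the original matlab code is not shown in this report.

```python
import random


def split_examples(rows, test_fraction=1 / 3, seed=0):
    """Shuffle the rows, then return (training rows, test rows)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)  # assumption: round to nearest
    return shuffled[n_test:], shuffled[:n_test]


def select_columns(rows, target_col):
    """Pair the (x, y) inputs with the target held in column target_col (0-based)."""
    return [((r[0], r[1]), r[target_col]) for r in rows]


# Placeholder rows only; real rows would come from the data generation step.
rows = [(i / 10, 1 - i / 10, 1, -1) for i in range(9)]
train, test = split_examples(rows)
linsep_train = select_columns(train, 2)  # columns 1, 2 and 3 of the report
nonlin_train = select_columns(train, 3)  # columns 1, 2 and 4 of the report
```

   With 500 rows this rounding gives 167 test examples and 333 training
   examples.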

   o  plot of the datasets

   to follow

   2. Experimental setting

   o  architecture of each neural network (number of input, hidden and output
   neurons; transfer functions used)

   +------------------------------------------------------------------------+
   |Network    | Input |Hidden |Output |     Transfer     | Learn  | Train  |
   |           |Neurons|Neurons|Neurons|   Function(s)    |Function|Function|
   |-----------+-------+-------+-------+------------------+--------+--------|
   |Perceptron |   2   |  N/A  |   1   |     hardlims     | learnp | trainc |
   |-----------+-------+-------+-------+------------------+--------+--------|
   |ADALINE    |   2   |  N/A  |   1   |     purelin      |learnwh |trainc* |
   |-----------+-------+-------+-------+------------------+--------+--------|
   |MLP: linsep|   2   |   2   |   1   |  tansig, tansig  |learngdm|trainrp |
   |-----------+-------+-------+-------+------------------+--------+--------|
   |MLP: n-lsep|   2   | 2 - 4 |   1   |  tansig, tansig  |learngdm|trainrp |
   +------------------------------------------------------------------------+

   * ADALINE did not converge with `trainb' when there were more than 133
   training examples [possibly due to a bug]
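
   To make the first row of the table concrete, here is a minimal Python
   sketch of a 2-input perceptron with a symmetric hard-limit output,
   trained by presenting the examples in a fixed cyclic order and updating
   after each one. It follows the textbook perceptron rule (weight change =
   error x input), which is what `hardlims', `learnp' and `trainc' are
   understood to do; it is not the toolbox code, and the four training points
   are made up for illustration.

```python
def hardlims(n):
    """Symmetric hard limit: +1 for n >= 0, otherwise -1."""
    return 1 if n >= 0 else -1


def predict(w, b, point):
    """Output of the single perceptron neuron for one (x, y) point."""
    return hardlims(w[0] * point[0] + w[1] * point[1] + b)


def train_perceptron(examples, max_epochs=100):
    """Cyclic, incremental training; returns (weights, bias, epochs used)."""
    w, b = [0.0, 0.0], 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for point, target in examples:
            error = target - predict(w, b, point)  # 0, +2 or -2
            if error != 0:
                w[0] += error * point[0]
                w[1] += error * point[1]
                b += error
                mistakes += 1
        if mistakes == 0:  # every example classified correctly: stop
            return w, b, epoch
    return w, b, max_epochs


# Made-up points: -1 above the line y = 0.5x + 0.25, +1 below it.
toy = [((0.2, 0.8), -1), ((0.8, 0.95), -1), ((0.2, 0.1), 1), ((0.8, 0.3), 1)]
weights, bias, epochs_used = train_perceptron(toy)
```

   Because the toy points are linearly separable, the loop stops as soon as
   one full pass produces no mistakes.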

   o  parameters - learning rate, momentum (if used), stopping criteria

   +------------------------------------------------------------------------+
   |           |                  |            Stopping Criteria            |
   |-----------+------------------+-----------------------------------------|
   |Network    |Learning|Momentum |  Max.  |MSE |Minimum Performance|Maximum|
   |           |  Rate  |         | Epochs |Goal|     Gradient      | `Mu'  |
   |-----------+--------+---------+--------+----+-------------------+-------|
   |Perceptron |  N/A   |   N/A   |  100   |0.02|        N/A        |  N/A  |
   |-----------+--------+---------+--------+----+-------------------+-------|
   |ADALINE    | 0.005  |   N/A   |  100   |0.3 |        N/A        |  N/A  |
   |-----------+--------+---------+--------+----+-------------------+-------|
   |MLP: linsep| 0.005  |   0.9   |  500   |0.02|        0.0        |  N/A  |
   |-----------+--------+---------+--------+----+-------------------+-------|
   |MLP: n-lsep| 0.005  |   0.9   |500-5000|0.02|        0.0        |  N/A  |
   +------------------------------------------------------------------------+
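
   The way a learning rate, an epoch limit and an MSE goal interact can be
   sketched with the ADALINE row. This is a minimal Python illustration of the
   Widrow-Hoff (LMS) rule with a linear output, updating after every example;
   it is a sketch under those assumptions, not the toolbox implementation of
   `learnwh' or `trainc'. The defaults are the values from the table and the
   four training points are made up.

```python
def adaline_output(w, b, point):
    """Linear (purelin-style) output of a 2-input ADALINE."""
    return w[0] * point[0] + w[1] * point[1] + b


def mean_squared_error(w, b, examples):
    """Mean of (target - output)^2 over the examples."""
    return sum((t - adaline_output(w, b, p)) ** 2 for p, t in examples) / len(examples)


def train_adaline(examples, lr=0.005, max_epochs=100, mse_goal=0.3):
    """Incremental LMS training; returns (weights, bias, MSE after each epoch)."""
    w, b = [0.0, 0.0], 0.0
    history = []
    for _ in range(max_epochs):  # stopping criterion 1: epoch limit
        for point, target in examples:
            error = target - adaline_output(w, b, point)
            w[0] += lr * error * point[0]
            w[1] += lr * error * point[1]
            b += lr * error
        history.append(mean_squared_error(w, b, examples))
        if history[-1] <= mse_goal:  # stopping criterion 2: MSE goal
            break
    return w, b, history


# Made-up points: -1 above the line y = 0.5x + 0.25, +1 below it.
toy = [((0.2, 0.8), -1), ((0.8, 0.95), -1), ((0.2, 0.1), 1), ((0.8, 0.3), 1)]
weights, bias, history = train_adaline(toy)
```

   Training stops at whichever criterion is met first, so the length of
   `history' is the number of epochs actually run.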

   3. Results and discussion

   o  include the speed plots, accuracy and mse results for each neural
   network

   plots to follow

   +------------------------------------------------------------------------+
   |            |          |       |        | Training Set  |   Test Set    |
   |            |          |       |        |---------------+---------------|
   |  Dataset   | Network  |Hidden | Epoch  |Accuracy| MSE  |Accuracy| MSE  |
   |            |          |Neurons|Reached |        |      |        |      |
   |------------+----------+-------+--------+--------+------+--------+------|
   |            |Perceptron|  N/A  |   6    |  100%  |0.000 | 99.4%  |0.024 |
   |            |----------+-------+--------+--------+------+--------+------|
   |  Linearly  |ADALINE   |  N/A  |   14   | 98.2%  |0.294 | 98.2%  |0.290 |
   | Separable  |----------+-------+--------+--------+------+--------+------|
   |            |MLP       |   2   |   17   |  100%  |0.017 | 100.0% |0.016 |
   |------------+----------+-------+--------+--------+------+--------+------|
   |            |Perceptron|  N/A  |  100   | 74.5%  |1.021 | 74.3%  |1.030 |
   |            |----------+-------+--------+--------+------+--------+------|
   |            |ADALINE   |  N/A  |  100   | 73.9%  |0.685 | 74.9%  |0.652 |
   |            |----------+-------+--------+--------+------+--------+------|
   |Non-Linearly|MLP       |   2   |  500   | 94.0%  |0.200 | 91.6%  |0.217 |
   | Separable  |----------+-------+--------+--------+------+--------+------|
   |            |MLP       |   3   |  5000  | 99.7%  |0.022 | 99.4%  |0.025 |
   |            |----------+-------+--------+--------+------+--------+------|
   |            |MLP       |   4   |  487   | 99.7%  |0.020 | 97.0%  |0.061 |
   +------------------------------------------------------------------------+

   o  briefly discuss the results (1/2-1 page)

   In all but one case (non-linearly separable ADALINE), the test set
   accuracy was less than or equal to the training set accuracy. This
   exception is most likely due to chance, as the error is high anyway.
   Otherwise this is as expected - a classifier is unlikely to perform better
   on the test set than on the training set, because the test examples are
   unseen during training.

   All the classifiers were easily able to train using the linearly separable
   data - each reached the MSE goal well before the maximum number of epochs.
   However, decreasing the MSE goal for ADALINE did not prove effective.
   Although it varied slightly with the random effects, the MSE for ADALINE
   on linearly separable data tended to level off at around 0.20-0.25 by
   about 30 epochs. This may be due to oscillations, possibly caused by the
   use of the `trainc' training function.

   `trainc' is used for ADALINE training simply because `trainb' did not
   converge, regardless of the learning rate. However, `trainb' did work if
   the training set was reduced in size to 133 examples (i.e. about 200
   examples in total); with larger data sets it did not converge. The epoch
   vs MSE graph resembled a ski-jump, with the MSE increasing exponentially -
   to the order of 10^20 after a few hundred epochs. This may be due to a bug
   in the toolkit, or a memory limitation inherent in the matlab setup.

   In all cases but the non-linearly separable MLPs with 2 and 4 hidden
   neurons, the test set accuracy is no more than 1% worse than the training
   set accuracy. This indicates a reasonable training model (i.e. it
   generalises well to unseen data). The nl-MLP with 2 hidden neurons is
   possibly affected by a combination of reaching the maximum number of
   epochs (under-training) and a certain amount of under-fitting, because
   there are not enough free parameters to fit the transformed sin curve.

   For the non-linearly separable MLP, after 200 epochs, the training MSE was
   0.27 for 2 hidden neurons, 0.10 for 3 and 0.04 for 4. This indicates that
   the speed of convergence increases with the number of hidden neurons.
   However, the MLP with 4 hidden neurons performs relatively poorly on the
   test set (97.0%, against 99.7% on the training set). This is an indication
   of overfitting - there are too many free parameters with 4 hidden neurons
   and the resulting classifier fits the training examples more closely than
   the true division. It is also worth observing that the MLP with 4 hidden
   neurons is the only classifier that reached its training goal on the
   non-linearly separable data (i.e. before the maximum number of epochs was
   reached).
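
   The "free parameters" referred to above can be counted directly. The
   Python sketch below assumes a fully connected 2-h-1 network with one bias
   per neuron, and uses the usual definition of `tansig' (numerically the same
   as tanh). The function names and the example weights are hypothetical.

```python
import math


def tansig(n):
    """Hyperbolic tangent sigmoid, written in the 2/(1+exp(-2n))-1 form."""
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0


def count_parameters(n_in, n_hidden, n_out=1):
    """Weights plus biases of a fully connected n_in - n_hidden - n_out network."""
    return n_hidden * (n_in + 1) + n_out * (n_hidden + 1)


def mlp_forward(point, hidden_w, hidden_b, out_w, out_b):
    """Output of a single-output MLP with tansig in both layers."""
    hidden = [
        tansig(sum(w * p for w, p in zip(ws, point)) + b)
        for ws, b in zip(hidden_w, hidden_b)
    ]
    return tansig(sum(w * h for w, h in zip(out_w, hidden)) + out_b)


# Parameter counts for the 2, 3 and 4 hidden neuron networks in the table.
counts = {h: count_parameters(2, h) for h in (2, 3, 4)}

# Made-up weights for a 2-2-1 network, only to show the call.
output = mlp_forward((0.3, 0.7), [[1.0, -2.0], [0.5, 0.5]], [0.1, -0.2], [1.5, -1.0], 0.0)
```

   Under these assumptions the three networks have 9, 13 and 17 free
   parameters respectively.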

   In most cases, the MSE reflects the resulting accuracy. Note that it is
   possible for the MLP (and ADALINE) to have a non-zero MSE while achieving
   100% accuracy (or different MSEs for the same accuracy). This is because
   the output does not have to be exactly +/-1 for a successful
   classification to be made (it is only necessary that the sign matches the
   target). Also, for each classifier, if one set had a higher accuracy, the
   same set also had the lower MSE - accuracy and MSE are approximately
   inversely related.
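
   A small worked example of this point, as a Python sketch. The "soft"
   outputs are made-up values. The second case assumes a test set of 167
   examples (one third of 500, rounded), which the report does not state
   explicitly.

```python
def accuracy(outputs, targets):
    """Fraction of outputs whose sign matches the +/-1 target."""
    correct = sum(1 for o, t in zip(outputs, targets) if (o >= 0) == (t > 0))
    return correct / len(targets)


def mse(outputs, targets):
    """Mean of (target - output)^2."""
    return sum((t - o) ** 2 for o, t in zip(outputs, targets)) / len(targets)


# Continuous outputs (as from tansig or purelin): every sign is right,
# but no output is exactly +/-1, so the MSE is non-zero.
targets = [1, -1, 1, -1]
soft_outputs = [0.6, -0.8, 0.9, -0.5]
soft_accuracy = accuracy(soft_outputs, targets)  # 1.0
soft_mse = mse(soft_outputs, targets)            # (0.16 + 0.04 + 0.01 + 0.25) / 4

# Hard-limit outputs: each misclassification contributes (+/-2)^2 = 4,
# so MSE = 4 * (1 - accuracy). One error in an ASSUMED 167 test examples:
hard_targets = [1] * 167
hard_outputs = [1] * 166 + [-1]
hard_accuracy = accuracy(hard_outputs, hard_targets)
hard_mse = mse(hard_outputs, hard_targets)
```

   Under that assumption, a single misclassified test example gives 99.4%
   accuracy and an MSE of about 0.024, which is consistent with the perceptron
   row for the linearly separable test set.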

   The MLP networks are the only ones able to generate an accurate classifier
   for the non-linearly separable data. This agrees with the theory -
   Perceptron and ADALINE can only form a linear decision boundary, so they
   cannot accurately separate non-linearly separable examples. Both of these
   classifiers perform poorly on the training set and the test set alike.

   ADALINE was able to adapt more successfully to the unseen linearly
   separable data than the perceptron, in that its accuracy did not drop
   between the training set and the test set. This agrees with the theory. In
   fact, ADALINE achieves the same accuracy in this case for the test and
   training sets (although both are worse than the perceptron's). Both this
   and the relatively poor performance of ADALINE are possibly due to chance,
   or to the problems encountered with `trainb' (and how they were resolved).