Neural network uncertainty assessment using Bayesian
D10304 AIRES ET AL.: NEURAL NETWORK UNCERTAINTIES, 2 D10304

...variety of probabilistic quantities, such as the uncertainty estimates of the network outputs.

[6] These developments are used in order to provide a new framework for the characterization and the analysis of various sources of neural network errors. In this work we separate the errors that are due to the NN weight uncertainty and the errors from all remaining sources. We will comment on an approach to analyze in even more detail the various contributions to output errors. These errors are described in terms of covariance matrices that can be interpreted using eigenvectors called "error patterns" [see Rodgers, 1990].

[7] These algorithmic developments are tested for the retrieval of surface skin temperature, microwave surface emissivities, and integrated water vapor content from a combined analysis of microwave and infrared observations [see Aires, 2004, for a detailed description of this application].

[8] The theoretical computation of the predictive distribution of network outputs is developed in section 2. The developments of section 2 are used in section 3 to characterize the NN output uncertainty sources. The technique is applied to a neural network inversion algorithm for remote sensing in section 4. Conclusions and perspectives are given in section 5.

2. Predictive Distribution of Network Outputs

[9] The developments of this section, aimed at describing the distribution of the NN output (i.e., the predictive distribution) and the total output errors, are inspired by the Bayesian learning of neural networks chapter of Bishop [1996]. It is extended here to the multivariate case, which introduces matrix formulas instead of scalar ones.

2.1. Theoretical Derivation of the Network Output Error PDF

[10] The distribution of uncertainties of the NN output y is given by

  P(y | x, D) = \int P(y | x, w) P(w | D) dw                                                (1)

where D is the set of outputs y in a data set B = {(x_n, t_n); n = 1, ..., N} of N matched input/output couples. From Aires [2004, equations (16) and (24)] we find that this probability is equal to

  P(y | x, D) = (1/Z) \int e^{-(1/2) (t - g_w(x))^T A_{in} (t - g_w(x))} e^{-(1/2) \Delta w^T H \Delta w} dw      (2)

where A_in is the inverse of C_in, the covariance matrix of the intrinsic noise of the physical variables y, and H is the Hessian matrix of the quality criterion used by the learning process [see Aires, 2004, for more details on these two matrices]. Note that all the terms not dependent on w, like E_D(w°) in the work of Aires [2004, equation (24)], have been put together in the normalization factor Z. A first-order expansion of the neural network function g_w about the optimum weight w° is now used:

  g_w(x) ≃ g_{w°}(x) + G^T \Delta w                                                          (3)

where

  G = (\partial g_w / \partial w) |_{w = w°}                                                  (4)

is a W × M matrix (M is the number of outputs). Introducing (4) into (2) and using \Delta y = y - g_{w°}(x), we obtain

  P(t | x, D) = (1/Z) e^{-(1/2) \Delta y^T A_{in} \Delta y} \int e^{h^T \Delta w - (1/2) \Delta w^T \Omega \Delta w} dw      (6)

where

  h = G A_{in} \Delta y    and    \Omega = G A_{in} G^T + H

The integral term in equation (6) is a Gaussian integral and can be simplified to

  (2\pi)^{dim(w)/2} |\Omega|^{-1/2} e^{(1/2) h^T \Omega^{-1} h}

We can rewrite equation (6) using this simplification to obtain

  P(t | x, D) = (1/Z') e^{-(1/2) \Delta y^T [A_{in} - A_{in} G^T (G A_{in} G^T + H)^{-1} G A_{in}] \Delta y}      (9)

[11] This means that the distribution of t follows a Gaussian distribution with mean g_{w°}(x) and covariance matrix

  C° = [A_{in} - A_{in} G^T (G A_{in} G^T + H)^{-1} G A_{in}]^{-1}                            (10)

This covariance matrix can be simplified, using the matrix inversion lemma, to obtain

  C° = C_{in} + G^T H^{-1} G                                                                  (11)

We see that the uncertainty in the network outputs is due to (1) the intrinsic noise of the target data, embodied in C_in, and (2) the uncertainty described by the posterior distribution of the weight vector w, embodied in G^T H^{-1} G. This relation describes the fact that the uncertainties are approximately related to the inverse data density. As expected, uncertainties are larger in the less dense parts of the data space, where the learning algorithm gets less information.

2.2. Sources of Uncertainty

[12] In his paper, Rodgers [1990] separates the various sources of uncertainty into three components: (1) random error due to measurement noise, (2) model error due to uncertain model parameters and inverse model bias, and (3) null-space error due to the inherent finite resolution of the observing system and the lack of information outside the range of the weighting functions. We think that it is difficult to characterize the sources of errors using this classification, because they interact together when the inversion method uses a nonlinear model.
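As a numerical illustration of the result in equations (10) and (11), the two forms of the output covariance can be checked against each other with a minimal sketch (assuming NumPy; the matrices G, H, and C_in below are random toy stand-ins, not quantities from the retrieval scheme of this paper):

```python
import numpy as np

rng = np.random.default_rng(0)
W, M = 6, 3  # number of network weights, number of outputs

# Toy stand-ins: G is the W x M Jacobian of the outputs with respect to
# the weights, H a positive definite Hessian of the training criterion,
# and C_in the intrinsic noise covariance of the targets.
G = rng.normal(size=(W, M))
S = rng.normal(size=(W, W))
H = S @ S.T + np.eye(W)
T = rng.normal(size=(M, M))
C_in = T @ T.T + np.eye(M)
A_in = np.linalg.inv(C_in)

# Equation (10): C0 = [A_in - A_in G^T (G A_in G^T + H)^-1 G A_in]^-1
omega = G @ A_in @ G.T + H
C0_eq10 = np.linalg.inv(A_in - A_in @ G.T @ np.linalg.inv(omega) @ G @ A_in)

# Equation (11): the same covariance after the matrix inversion lemma
C0_eq11 = C_in + G.T @ np.linalg.inv(H) @ G

assert np.allclose(C0_eq10, C0_eq11)
```

In practice the form of equation (11) is preferable: it avoids the nested inverse of equation (10) and makes the two error terms, C_in and G^T H^{-1} G, directly accessible.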
[13] In our equation (11) we have separated the sources of error into two terms: the intrinsic noise, with covariance matrix C_in, and the neural inversion term, with covariance matrix G^T H^{-1} G. Our neural inversion term refers to the errors due only to the uncertainty in the inverse model parameters, and all the remaining outside sources of errors are grouped in C_in.

[14] The inversion uncertainty can itself be decomposed into three sources, corresponding to the three main components of a neural network model.

[15] 1. The imperfections of the learning data set B, which include simulation errors when B is simulated by a radiative transfer model, colocation and instrument errors when B is a collection of in situ and satellite colocations, null-space errors, etc. This is probably the most important source of uncertainty due to the inversion.

[16] 2. Limitations of the network architecture, because the model might not be optimum, with too few degrees of freedom or a structure that is not optimal. This is usually a lower-level source of uncertainty, because the network can partly compensate for these deficiencies.

[17] 3. A nonoptimum learning algorithm, because, as good as the optimization technique is, it is impossible in practice to be sure that the global minimum w° has been found instead of a local one. We think that this source of uncertainty is limited.

[18] Some of these sources of uncertainty can be assessed, for example, by performing some Monte Carlo simulations.

[19] C_in includes all other sources of errors. Our approach allows for the estimation of the global C_in, but if some individual terms are known, it is possible to subtract them from C_in. For example, if the instrument noise is known, it is possible to measure the impact of this noise on the NN outputs. The individual terms can then be subtracted from the global C_in. For simplification, and because we do not use such a priori information, we adopt the hypothesis that C_in is constant for each situation, only the inversion term being situation dependent. Again, any a priori information about any nonconstant term in C_in could be used in this very flexible approach.

[20] Note that the specification of the sources of uncertainty by the approach of Rodgers [1990] uses mainly the concept of Jacobians of either the direct or the inverse model, in order to linearize the impact of each error source. Linearity and Gaussian variables are easily manageable analytically, the algebra being essentially based on the covariance matrices. For example:

[21] 1. C_M = D_x E D_x^T is the covariance of the errors due to instrument noise, where D_x = \partial g_{w°} / \partial x is the contribution function and E is the covariance matrix of the instrument noise. This additional term is actually the multivariate equivalent of the expression found in the work of Wright et al. [2000], where the noise model is explicitly introduced in the Bayesian framework.

[22] 2. The covariance of the forward model errors is A_b C_b A_b^T, where C_b is the covariance matrix of the errors of the forward model parameters b, and A_b is the sensitivity matrix of the observations with respect to b [Rodgers, 1990].

[23] Some bridges can be built to link our error analysis and the approach used in variational assimilation by Rodgers [1990]. In the work of Aires et al. [2004], such Jacobians are analytically derived in the neural network framework. This makes feasible the use of the Rodgers estimates. The difference would be that our linearization uses Jacobians that are situation dependent; this means that the estimation of the error sources would be nonlinear in nature. This will be the subject of another study.

[24] Another approach, for the empirical characterization of the various sources of uncertainties, is to use simulations. For example, for the instrument-noise-related uncertainty, it is easy to introduce a sample of noise into the network inputs and analyze the consequent error distribution of the outputs. The advantage of such a simulation approach is that it is very flexible and allows for the manipulation of non-Gaussian distributions. This will be the subject of another study.

3. Error Characterization and Analysis

[25] A neural network inversion scheme including first guess information has been developed to retrieve surface temperature (Ts), water vapor column amount (WV), and microwave surface emissivities at each frequency/polarization (Em) over snow- and ice-free land, from a combined analysis of microwave (SSM/I) and infrared (International Satellite Cloud Climatology Project) data [Aires et al., 2001; Prigent et al., 2003a]. (See Prigent et al. [2003b] for the snow-covered land case.) The present study aims in part at providing uncertainty estimates for these retrievals. Both cloudy and clear-sky versions of this retrieval scheme have been developed, but for simplicity only the clear-sky case is discussed here. In this section the technical developments of section 2 are used to characterize uncertainty sources.

3.1. Distribution of Network Outputs

[26] After the learning stage, we estimate C0, the covariance matrix of network errors Ey = t - g_w(x), over the database B. Equation (11) shows that this covariance adds the errors due to neural network uncertainties and all other sources of uncertainty. Table 1 gives the numerical values of C0 for the particular example from Prigent et al. [2003a]. The right top triangle is for the correlation, and the left bottom triangle is for the covariance; the diagonal values give the variance of the errors of each quantity. The correlation part indicates clearly that some errors are highly correlated. This is why it would be a mistake to monitor only the error bars, even if they are easier to understand.

Table 1. Covariance Matrix C0 of Network Output Error Estimated Over the Database B^a

         Ts        WV         Em19V     Em19H     Em22V     Em37V     Em37H     Em85V     Em85H
Ts       2.138910  0.24       0.87      0.72      0.76      0.84      0.72      0.49      0.32
WV       1.392113  14.708836  0.16      0.06      0.14      0.05      0.15      0.18      0.37
Em19V    0.006294  0.003179   0.000024  0.77      0.88      0.89      0.74      0.60      0.42
Em19H    0.005261  0.001143   0.000019  0.000024  0.72      0.73      0.81      0.60      0.56
Em22V    0.006274  0.003140   0.000024  0.000020  0.000031  0.84      0.71      0.71      0.54
Em37V    0.006121  0.001049   0.000021  0.000018  0.000023  0.000024  0.81      0.70      0.50
Em37H    0.005290  0.002954   0.000018  0.000020  0.000020  0.000020  0.000025  0.65      0.67
Em85V    0.004895  0.004945   0.000020  0.000020  0.000027  0.000023  0.000022  0.000046  0.79
Em85H    0.003906  0.011933   0.000017  0.000022  0.000024  0.000020  0.000027  0.000044  0.000067

^a The right top triangle is for correlation and the left bottom triangle is for covariance; the diagonal gives the variance. Correlations with absolute value higher than 0.3 are in bold.
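The Monte Carlo strategy mentioned in paragraphs [18] and [24] can be sketched as follows: noise samples with covariance E are added to the inputs of a toy nonlinear inverse model (a random stand-in for the trained network, not the SSM/I scheme itself), and the empirical output error covariance is compared with the linearized instrument-noise term C_M = D_x E D_x^T of paragraph [21] (assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for the trained inverse model y = g(x) (3 inputs, 2 outputs).
A1 = rng.normal(size=(4, 3))
A2 = rng.normal(size=(2, 4))
def g(x):
    return A2 @ np.tanh(A1 @ x)

x0 = 0.3 * rng.normal(size=3)     # a reference (unsaturated) situation
E = 1e-3 * np.eye(3)              # assumed instrument noise covariance

# Monte Carlo: propagate noisy inputs, estimate the output error covariance.
n = 100_000
noise = rng.multivariate_normal(np.zeros(3), E, size=n).T   # shape (3, n)
errors = A2 @ np.tanh(A1 @ (x0[:, None] + noise)) - g(x0)[:, None]
C_mc = np.cov(errors)

# Linearization: C_M = D_x E D_x^T, with D_x the input Jacobian
# (central finite differences here).
eps = 1e-6
D_x = np.column_stack([(g(x0 + eps * e) - g(x0 - eps * e)) / (2 * eps)
                       for e in np.eye(3)])
C_lin = D_x @ E @ D_x.T

# For small noise the two estimates agree closely.
assert np.linalg.norm(C_mc - C_lin) < 0.15 * np.linalg.norm(C_lin)
```

The same sampling loop, run with non-Gaussian noise, provides the flexibility noted in paragraph [24], which the linearized algebra cannot offer.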
[27] The correlations of errors exhibit the expected behavior. Errors in Ts are negatively correlated with the other errors, with large values of correlation with the vertical polarization emissivities for the channels that are much less sensitive to the water vapor (Em19V and Em37V). The vertical polarization emissivities are larger than the horizontal ones and are often close to one, with the consequence that, in the channels that are much less sensitive to the water vapor (the 19 and 37 GHz channels), the radiative transfer equation is quasi-linear in Ts and in EmV. In contrast, errors in water vapor are weakly correlated with the other errors; the largest correlation is with the emissivity at 85 GHz in the horizontal polarization. The 85 GHz channel is the most sensitive to water vapor, and since the emissivity for the horizontal polarization is lower than for the vertical, the horizontal polarization channel is more sensitive to water vapor. Correlations between the water vapor and the emissivity errors are positive or negative depending on the respective contributions of the emitted and reflected energy at the surface, which is related not only to the surface emissivity but also to the atmospheric contribution at each frequency. Correlations between emissivity errors are always of the same sign and are high for the same polarizations, decreasing when the difference in frequency increases.

[28] The correlations involved in the PDF of the errors, described by the covariance matrix C0, make it necessary to understand the uncertainty in a multidimensional space. This is more challenging than just determining the individual error bars, but it is also much more informative: the diagonal elements of the covariance matrix provide the variance of each output error, but the off-diagonal terms show the level of dependence among these output errors. To statistically analyze the covariance matrix C0, we decompose it into its orthogonal eigenvectors. This base set constitutes a set of "error patterns" [Rodgers, 1990], so that the contribution of each of these patterns to the total error is decorrelated. In practice, the eigenvectors (in columns of L) are the error patterns l_k, given by

  C0 l_k = s_k l_k                                                                            (12)

The eigenvectors l_k need to be multiplied by s_k^{1/2} so that each output component in y has the same statistical weight in the definition of the error patterns (i.e., this is the normalized PCA). The error in the network outputs is the sum of the individual errors associated with the error patterns:

  Ey = \sum_{k=1}^{M} a_k l_k                                                                 (13)

where the factors a_k follow a Gaussian random distribution with unit variance. The interpretation of the different error patterns can provide a useful insight into the origin of the errors.

[29] Figure 1a presents the percentage of variance explained by the cumulated eigenvectors. In this way the PCA provides an estimate of the number of degrees of freedom in the retrieval error structure. Components 1 and 2 explain 55 and 30 percent of the error, respectively. This means that the errors are concentrated in the first two error patterns. Figure 1b shows the first four PCA components in the output variable space. The first component is essentially related to Ts and the emissivities in vertical polarization, with a weight on water vapor close to zero; the negative value for Ts and the positive ones for the Em, especially for the EmV, are consistent with the correlations of errors found in Table 1, which indicate that Ts and EmV errors are anticorrelated. Water vapor dominates the second PCA component, along with the emissivities for channels that are more sensitive to water vapor, namely 22 GHz and 85 GHz horizontal polarization. Maps of the first component of the PCA for two months (not shown) do not show any well-defined spatial structures related to surface characteristics, which is a good result. The PCA second-component maps (not shown) are somewhat related to the water vapor fields, with positive values of the component in areas of large WV and negative ones in dry air regions. This suggests that the inversion tends to underestimate WV in humid regions and overestimate it in dry ones, which might be related to the use of absolute values of the humidity in the retrieval instead of relative humidity values that would give more weight to low WV amounts. An overrepresentation of dry situations in the learning data set can also be an explanation for the underestimation of WV in wet situations.

Figure 1. Eigen decomposition of covariance matrices: (a) explained variance, (b) error patterns for C0 (network output errors), (c) error patterns for G^T H^{-1} G (errors due to neural network uncertainty), and (d) error patterns for C_in (intrinsic errors).

3.2. Covariance of Output Errors Due to the Neural Inversion

[30] We already saw in the work of Aires [2004] that the matrix H^{-1} is the covariance of the PDF of the network weights. The use of the gradient G transforms this matrix into G^T H^{-1} G, the covariance of the errors of the NN outputs [Aires, 2004]. Note that multiplication by G regularizes H^{-1}, so that for this particular purpose of the estimation of the output errors, H does not need to be regularized as described in the work of Aires [2004].

[31] Table 2 represents this covariance matrix G^T H^{-1} G averaged over the whole learning database B. Even if some of the bottom left values (representing the covariance matrix) are close to zero, structure is still present in this matrix, as is shown in the correlation part (top right). This is an artifact, since the variability ranges of the variables are quite different from each other. The error correlation matrix G^T H^{-1} G, related to the NN inversion method, has relatively small magnitudes, with a maximum of 0.55. However, it has a structure similar to the global correlation matrix, with the same signs of correlation and similar relative values between the variables.

Table 2. Covariance Matrix G^T H^{-1} G of Error Due to Network Uncertainty, Averaged Over the Database B^a

         Ts        WV        Em19V     Em19H     Em22V     Em37V     Em37H     Em85V     Em85H
Ts       0.493615  0.14      0.28      0.14      0.25      0.32      0.16      0.19      0.06
WV       0.106484  1.063071  0.10      0.02      0.09      0.02      0.07      0.15      0.25
Em19V    0.000325  0.000167  0.000002  0.33      0.55      0.55      0.28      0.27      0.08
Em19H    0.000255  0.000060  0.000001  0.000006  0.26      0.22      0.29      0.10      0.13
Em22V    0.000268  0.000152  0.000001  0.000001  0.000002  0.50      0.26      0.28      0.12
Em37V    0.000330  0.000033  0.000001  0.000000  0.000001  0.000002  0.34      0.38      0.14
Em37H    0.000270  0.000183  0.000001  0.000001  0.000000  0.000001  0.000005  0.16      0.26
Em85V    0.000231  0.000282  0.000000  0.000000  0.000000  0.000000  0.000000  0.000002  0.43
Em85H    0.000128  0.000681  0.000000  0.000000  0.000000  0.000000  0.000001  0.000001  0.000006

^a The right top triangle is for correlation and the left bottom triangle is for covariance; the diagonal gives the variance. Correlations with absolute value higher than 0.3 are in bold.

[32] As in section 3.1, we use an eigendecomposition of G^T H^{-1} G to find the error patterns involved in this part of the errors. Figures 1a and 1c show the explained cumulated variance spectrum and the corresponding error patterns. The overall behavior of the first components is rather similar to the analysis of matrix C0 of section 3.1. The first component is related to Ts and the emissivities in vertical polarization (negative value for Ts and positive ones for the Em, especially for the EmV). Water vapor dominates the second PCA component, along with the emissivities for channels that are more sensitive to water vapor, namely 22 GHz and 85 GHz horizontal polarization.

3.3. Covariance of the Intrinsic Noise of Target Values

[33] To estimate C_in, we use equation (11):

  C_in = <C0>_B - <G^T H^{-1} G>_B

where the two right-hand terms are the covariance matrix of the total output errors averaged over B (section 3.1) and the covariance matrix of the output errors due to the network inversion scheme averaged over B [Aires, 2004].

Table 3. Covariance Matrix C_in of Intrinsic Noise Errors Estimated Over the Database B^a

         Ts        WV         Em19V     Em19H     Em22V     Em37V     Em37H     Em85V     Em85H
Ts       1.645294  0.27       0.99      0.92      0.86      0.95      0.88      0.55      0.37
WV       1.285629  13.645765  0.17      0.06      0.14      0.05      0.16      0.19      0.39
Em19V    0.005968  0.003011   0.000021  0.89      0.91      0.92      0.83      0.63      0.46
Em19H    0.005006  0.001083   0.000017  0.000017  0.83      0.86      0.98      0.71      0.66
Em22V    0.006005  0.002988   0.000023  0.000019  0.000029  0.87      0.80      0.75      0.58
Em37V    0.005790  0.001015   0.000020  0.000017  0.000022  0.000022  0.90      0.72      0.54
Em37H    0.005019  0.002770   0.000017  0.000018  0.000019  0.000019  0.000019  0.74      0.76
Em85V    0.004663  0.004662   0.000019  0.000019  0.000026  0.000022  0.000021  0.000043  0.82
Em85H    0.003777  0.011251   0.000016  0.000021  0.000024  0.000019  0.000026  0.000042  0.000060

^a The right top triangle is for correlation and the left bottom triangle is for covariance; the diagonal gives the variance. Correlations with absolute value higher than 0.3 are in bold.
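A minimal sketch of this subtraction, together with the error-pattern decomposition of equations (12) and (13), is given below (assuming NumPy; the two averaged covariance matrices are random toy stand-ins for <C0>_B and <G^T H^{-1} G>_B, not the values of Tables 1 and 2):

```python
import numpy as np

rng = np.random.default_rng(2)
M = 9  # nine network outputs (Ts, WV, and seven emissivities)

# Toy stand-ins for the two averaged covariance matrices.
S = rng.normal(size=(M, M))
C_inv_term = 0.1 * (S @ S.T) / M        # <G^T H^-1 G>_B, small magnitude
T = rng.normal(size=(M, M))
C0 = T @ T.T + C_inv_term               # <C0>_B, total output errors

# Intrinsic noise covariance by subtraction: C_in = <C0>_B - <G^T H^-1 G>_B
C_in = C0 - C_inv_term

# Error patterns (equation (12)): eigendecomposition of C0.
s, L = np.linalg.eigh(C0)               # columns of L are the patterns l_k
order = np.argsort(s)[::-1]             # sort by decreasing eigenvalue
s, L = s[order], L[:, order]
explained = 100 * np.cumsum(s) / np.sum(s)   # explained variance spectrum

# Normalized PCA: scale each pattern by the square root of its eigenvalue,
# so that a synthetic output error is a sum of patterns with unit-variance
# Gaussian coefficients a_k, as in equation (13).
patterns = L * np.sqrt(s)
a = rng.normal(size=M)
Ey = patterns @ a

assert np.allclose(patterns @ patterns.T, C0)   # patterns reproduce C0
```

The `explained` vector is the quantity plotted in Figure 1a; inspecting the leading columns of `patterns` corresponds to reading the error patterns of Figures 1b-1d.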
Table 3 gives the numerical values of the matrix C_in. The right top triangle is for the correlation, and the left bottom triangle is for the covariance.

[34] Intrinsic error correlations can be very large, up to 0.99. The structure of C_in is also very similar to the structure of the global error correlation matrix, the only noticeable difference being the larger correlation values.

[35] The eigendecomposition shows that most of the error variability is related to the first pattern. The first component explains 90% of the errors, meaning that the number of degrees of freedom in the retrieval error variability is limited. It is mostly related to Ts and to the emissivities, with very similar weights. As for the other matrices, maps of the PCA components do not have very particular spatial structures and are rather similar to the other maps.

3.4. Hyperparameters Optimization

[36] We saw in the work of Aires [2004] that the hyperparameter matrices A_in and A_r can be used a priori in the quality criterion for the training of the NN. This would have a regularization effect on the network. How are these hyperparameters to be obtained a priori? In a Bayesian framework, one type of estimation procedure is the so-called "evidence approximation" scheme [Gull, 1988; MacKay, 1992], based on the conventional statistics type II maximum likelihood [Berger, 1985].

[37] Another, simpler approach could be to omit the hyperparameters in the first stage by using the simplified data and regularization criteria of Aires [2004, equations (6) and (9)]. This is the method adopted in our study. After the learning process, the hyperparameters A_in and A_r can then be directly estimated and used in a new and more constrained quality criterion to retrain the NN. We can iteratively alternate the learning process and the hyperparameter estimation until the hyperparameters stabilize. It would be interesting to monitor the evolution of both of these matrices.

4. Uncertainty of Network Outputs

4.1. Network Outputs Error Estimate

[38] Once C_in is available, we can estimate a C0(x) that is dependent on the observations x, the term G^T H^{-1} G varying with the input x. It should be noted that the use of the regularization for matrix H presented in the work of Aires [2004] has virtually no consequences for the results obtained for the error bars in the following. Using no regularization for the Hessian matrix is possible since H is multiplied by the gradients in G^T H^{-1} G. This is an additional argument that the regularization helps the matrix inversion without damaging the information in the Hessian.

[39] C0(x) is estimated for each of the 1,239,187 samples for clear-sky pixels in July 1992. Figure 2 presents the monthly mean standard deviations (square roots of the diagonal terms in C0(x)) for four outputs: the surface skin temperature Ts, the column-integrated water vapor WV, and the microwave emissivities at 19 GHz for vertical and horizontal polarizations.

[40] The errors exhibit the expected geographical patterns. Large errors on Ts are concentrated in regions where the emissivities are lower and/or highly variable (inundated areas and deserts). In inundated areas, for instance around rivers like the Amazon or the Mississippi, or in coastal regions, the contribution from the surface is weaker, and the sensitivity to Ts is lower because the emissivities are lower. In sandy regions, through desert areas, due to the higher transmission in the very dry sandy medium, microwave radiation does not come from the very first millimeters of the surface but from deeper below the surface (the lower the frequency, the deeper) [Prigent and Rossow, 1999]. As a consequence the microwave radiation is not directly related to the skin surface temperature (see Prigent and Rossow [1999] for a detailed explanation), and Ts cannot be retrieved with the same accuracy. The same arguments hold for the errors in emissivity. All the parameters being tightly related, for a given pixel the water vapor errors are also rather large in inundated regions and in sandy areas.
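The situation-dependent error bars of this section can be sketched as follows (assuming NumPy): the per-sample Jacobians G(x) below are random toy stand-ins, and only the standard deviations, the square roots of the diagonal of C0(x), are kept, as in the maps of Figure 2.

```python
import numpy as np

rng = np.random.default_rng(3)
W, M, n_samples = 6, 3, 1000

# Fixed toy quantities: inverse Hessian and intrinsic noise covariance.
S = rng.normal(size=(W, W))
H_inv = np.linalg.inv(S @ S.T + np.eye(W))
T = rng.normal(size=(M, M))
C_in = T @ T.T + np.eye(M)

# Per-situation Jacobians G(x): toy stand-ins for the x-dependent term.
stds = np.empty((n_samples, M))
for i in range(n_samples):
    G = rng.normal(size=(W, M))
    C0_x = C_in + G.T @ H_inv @ G        # equation (11), evaluated at x
    stds[i] = np.sqrt(np.diag(C0_x))     # per-output error bars for this x

mean_std = stds.mean(axis=0)             # cf. the monthly means of Figure 2

# Error bars can never drop below the intrinsic noise floor, since the
# inversion term G^T H^-1 G only adds nonnegative variance.
assert np.all(stds >= np.sqrt(np.diag(C_in)) - 1e-12)
```

Averaging `stds` per pixel and per month is exactly the aggregation behind the standard deviation maps of Figure 2.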
4.2. Marginalization of the Error Probability

[41] The marginalization of the total error PDF consists in conditioning part of it by integrating over some of the error variables and analyzing only the remaining few. Conditioning of the error probability is a good compromise between analyzing all the variables at the same time with the eigenvalue decomposition (where each mode is only part of the variability) and observing the error bars for only one variable. In this section the total PDF of the output errors is projected onto only two variables, to obtain a two-dimensional PDF of errors. This allows us to quantify the spread of errors in the same way as a histogram does for a one-dimensional measurement. It also gives a measure of the correlation of errors between the two considered variables.

Figure 2. Standard deviation of error maps for (a) surface skin temperature Ts, (b) column-integrated water vapor WV, (c) microwave emissivity at 19 GHz, vertical polarization, and (d) microwave emissivity at 19 GHz, horizontal polarization.

Figure 3. Two-dimensional marginalization of the network output error PDF for surface skin temperature Ts, integrated water vapor WV, and the 19 GHz emissivity for horizontal polarization Em19H. The top two graphs are for deserts, and the bottom two are for tropical forests. Contour lines represent the equal-probability ellipsoids, with levels of 80, 60, 40, and 20% of the maximum of the PDF, from the center to the outside.

[42] In Figure 3 such a two-dimensional marginalization of the error PDF is presented for Ts paired with WV or Em 19 GHz. Because the total PDF of errors is Gaussian, the contour plots of the marginalized error probability are equal-probability ellipsoids. There is no bias, and as a consequence the ellipsoids are centered on zero. In this figure the data samples are separated for deserts and tropical forests. First, for both surfaces, Ts errors are anticorrelated with emissivities, as already discussed. The probability ellipsoids for Ts versus WV are almost symmetric around 0 in Ts errors, meaning that errors in Ts and WV are poorly correlated. Second, the errors are more important for desert surfaces than for tropical forests, confirming the results obtained in Figure 2.

[43] It should be noted that the error estimates would be considerably improved if outliers were excluded, such as coast-contaminated or wetland pixels. However, rather than filtering difficult retrievals, we prefer to perform the retrieval for all situations, as long as the error estimate is specified.

4.3. Outlier Detection

[44] What is the behavior of the neural retrieval when the situation is particularly difficult, like when the first guess is far from the actual solution? In principle, the nonlinearity of the neural network allows it to put different weights on the observations and first guess information depending on the situation. For example, if the first guesses are better in tropical cases than in polar cases, the neural network will have inferred this behavior during the learning stage and then will give less emphasis to the first guess when a polar situation is to be inverted. This assumes, once again, that the training data set is correctly sampled. To understand the behavior of the uncertainty estimates better, a good strategy is to introduce artificial errors for each source of information and to analyze the resulting impact on the network outputs. The goal of this section is to validate our uncertainty estimate by analyzing extreme cases; we do not investigate here physical error structures.

[45] In Figure 4 the retrieval STD error change index is presented to show the effect of perturbing the mean inputs or the mean FGs (first guesses) by an artificial error. The impact of these artificial errors is measured in terms of percentage of the regular STD retrieval error as estimated in section 4.1. For example, an impact index of 120% means that the regular STD retrieval error estimate increases by 20% when the input is perturbed. The impact indices can be compared for each of the nine network outputs. These results are obtained by averaging over the 20,000 samples in B.
be considerably improved if outliers were excluded such example an impact index of 120 means that the regular. as coast contaminated or wetland pixels However rather STD retrieval error estimate increases by 20 when the. than filtering difficult retrievals we prefer to perform the input is perturbed The impact indices can be compared for. D10304 AIRES ET AL NEURAL NETWORK UNCERTAINTIES 2 D10304. Figure 4 Estimated STD error change index for an artificial perturbation a of the mean input b of. the mean first guess input c of individual first guess negative changes and d of individual first. guess positive changes see detailed explanation in the text Statistics are performed over 20 000 samples. each of the nine network outputs These results are obtained with larger impact for positive perturbations but we ob. by averaging over the 20 000 samples in B serve also that errors are larger than when all the inputs are. 46 Figure 4a presents the error impacts when all perturbed in Figure 4a This suggests that the error estimate. 17 network inputs are changed by a factor ranging from is able to detect inconsistencies between observations and. 5 to 5 Obviously this will introduce incoherent first guess inputs. situations since the complex nonlinear relationships between 48 In Figures 4c and 4d the first guess input variables. vertical horizontal brightness temperatures and first guesses are perturbed individually with respectively negative and. will not be respected As expected the error increases positive amplitude of 5 For negative perturbations the. monotically with the absolute value of the perturbation biggest impact is produced by the Ts first guess perturba. However the impact is not uniform among the output tion it is noticeable that the Ts error impact is similar for the. variables For WV which is retrieved with a rather low retrieval of Em19H and for its own retrieval For other. 
accuracy changes in the inputs do not have a large influence variables the impacts have lower levels with almost no. The impact on the emissivities is larger for horizontal impact from the WV first guess The WV first guess is. polarizations than for vertical horizontal polarization emis associated with large error 40 and as a consequence the. sivities are much more variable than the vertical ones and as NN gives little importance to this first guess For positive. a consequence emissivities for vertical polarization have individual perturbations in Figure 4d the results are similar. rather similar values in outputs whatever the situation and do to the negative errors The magnitude of the positive. not depend that much on the inputs It can also be noted that changes as compared to the negative ones are related again. positive perturbations have a slightly stronger impact than to the distribution of the variables in the training data set. negative ones This is to be related to the distribution of the see Aires et al 2001 Figure 3 If the distribution is not. variables in the training data base For the emissivities for symmetric around a mode value depending on the shape of. instance the distribution has a steep cut off for unit emis the distribution increasing or decreasing the value can be. sivity above which the emissivities are not physical On the more or less realistic. contrary a large range of emissivities exists in the training 49 In Figure 5 incoherencies have been introduced. data base at lower values see Aires et al 2001 Figure 3 between the vertical and horizontal polarizations in the. As a consequence decreasing the emissivity first guess will brightness temperatures TB observations and in the first. still be physically realistic whereas increasing it will not be guess emissivities Ems by increasing or decreasing one. 47 Figure 4b is the same except that the changes are keeping the other polarization constant In Figure 5a we. 
made only for the first guess inputs We note a similar increased and decreased artificially by 5 the horizontal TB. behavior nonuniform impact among output variables and and in Figure 5b the same has been done for vertical. D10304 AIRES ET AL NEURAL NETWORK UNCERTAINTIES 2 D10304. Figure 5 Estimated STD error change index for an artificial perturbation a of horizontal polarization. brightness temperatures b of vertical polarization brightness temperatures c of horizontal polarization. first guess emissivities and d of vertical polarization first guess emissivities Statistics are performed. over 20 000 samples from B, polarizations Figures 5c and 5d are similar for first guess ments situations not included in the training data set or. emissivities instead of TB Several comments can be made uncertainties of the neural network on the possible retriev. First the impact is larger for observations than for first als Our a posteriori probability distributions for the neural. guess errors which suggests that observations are more network retrieval define confidence intervals on the re. important for the retrieval the first guess being used mostly trieved quantities that allow the detection of such situations. as an additional constraint Second these polarization 51 Since outlier detection can concerns individual per. inconsistencies have a bigger impact than changes of the tubations in one of the measurements another experience. means in Figure 4 For example the NN might emphasize was done In Figure 6 the retrieval STD error change index. the difference of polarization for the retrieval and then these is presented when each of the inputs are individually. inconsistencies would have a very strong impact This changed to an extreme value The FG surface skin temper. shows that the NN using complex nonlinear multivariate ature is set to 250 and 350 K The error estimate increases. 
relationships is sensitive to inconsistencies among the respectively by about 25 and 90 This unusual error. inputs It is encouraging to see that our error estimates are estimates should allow the detection of such individual. able to detect such situations Lastly the relative impact of outliers The same behavior is observed for brightness. the positive and negative changes can be explained again by temperature measurements or for microwave emissivity. the distribution of the variables in the learning data base first guesses. For the emissivities whatever the polarization and the 52 It could be argued that a limitation of our retrieval. frequency the histograms are not symmetric having a broad uncertainty estimates comes from the fact that our technique. tail toward lower values and an abrupt end for the higher is based on statistics over a data set B This could mean that. values as a consequence when artificially increasing the the error estimate is only valid when we are inside the. emissivities unrealistic values are attained which is not the variability spanned by B On the contrary it has been shown. case when decreasing the emissivities See Aires et al that the local quadratic approximation approach increases. 2001 for a complete description of the distributions of the accuracy of error estimates in sparsely sampled data. the learning data base and the histograms of the inputs space domains see e g MacKay 1992. 50 The results shown in Figures 4 and 5 are consistent. with a coherent physical behavior confirming that the new. tools developed in this study and its companion papers can. 5 Conclusion and Perspectives, be used to diagnose difficult retrieval situations such as 53 This paper describes a technique to estimate the.
might be caused by bad first guesses inconsistent measure uncertainties of neural network retrievals and provides a. D10304 AIRES ET AL NEURAL NETWORK UNCERTAINTIES 2 D10304. Figure 6 Estimated STD error change index for an individual perturbation of the network inputs. rigorous description of the sources of uncertainty The tools 56 Many new algorithmic developments can be pursued. are very generic and can be used for different linear or and we provided a few ideas For example the network. nonlinear regression models A fully multivariate formula output uncertainties can easily be used for a novelty. tion is introduced Its generality will allow future develop detection i e data that has not been used to train the. ments like the iterative re estimation strategy or the fully network or fault detection i e data that are corrupted by. Bayesian estimation of the hyperparameters It gives errors like instrument related problems Our determination. insights into the neural technique that is often considered of error characteristics can also be used with adaptative. with suspicion because its mechanisms are rarely or clearly learning algorithms i e learning when a small additional. explicited data set is provided after the main learning of the network. 54 Together with the introduction of first guess infor has been done. mation first described in the work of Aires et al 2001 57 We mentioned that the NN Jacobians Aires et al. error specification makes the neural network approach even 2004 1999 can be used to express the various sources of. closer to more traditional inversion techniques like varia uncertainty with even more detail using Rodgers 1990. tional assimilation Ide et al 1997 and iterative methods in approach Another technical development would be the. general Furthermore quantities obtained from NN retriev optimization of the hyperparameters as described in. 
als can now be combined with forecast model in a varia section 3 4 using an iterative re estimation strategy or. tional assimilation scheme since the error covariances evidence measure in a Bayesian framework Near 1996. matrices can be estimated These covariance matrices are Nabney 2002. not constant they are situation dependent This makes the 58 Applications of these new tools and concepts are. scheme even better since it is possible now to assimilate numerous This approach can first be used for the inversion. only inversions of good quality low uncertainty estimates of satellite observations from temperature humidity sound. Bad situations can be discarded from the assimilation ing instruments The technique described in the work of Aires. or even better can be used as an extreme detection et al 2002a will be used to assess the quality and the. scheme that would for example signal the need for an difficulties in the retrieval of atmospheric profiles such as. increased number of simulations in an ensemble forecast temperature water vapor or ozone It would be very inter. All these new developments establish the neural network esting to quantify the uncertainties for each atmospheric. technique as a serious candidate for remote sensing in layer In that sense this will give an overview of the actual. operational schemes compared to the more classical vertical resolution that can be expected with the next gener. approaches Twomey 1977 ation instruments like IASI Infrared Atmospheric Sounding. 55 Our method provides a framework for the character Interferometer or AIRS Atmospheric Infrared Sounder. ization the analysis and the interpretation of the various 59 We would like to test how beneficial these uncertainty. sources of uncertainty in any neural network based retrieval estimates would be when inverted satellite measurements are. scheme This makes possible improvements in the inversion assimilated instead of raw brightness temperatures Another. 
schemes Any fault that can be detected can be corrected application concerns the analysis of climate systems Aires. Lack of data in the observation domains errors of the model and Rossow 2003 In this modeling of dynamical systems. in some specific situations or detection of extreme events the prediction uncertainty might be used to detect complex. This should benefit a large community of neural network situations where the attractor can diverge toward various. users in meteorology climatology basins of attraction. D10304 AIRES ET AL NEURAL NETWORK UNCERTAINTIES 2 D10304. Notation vapor cloud liquid water path surface temperature and emissivities over. land from satellite microwave observations J Geophys Res 106 D14. y vector of physical variables to retrieve outputs of 14 887 14 907. the NN Aires F W B Rossow N A Scott and A Chedin 2002a Remote. sensing from the infrared atmospheric sounding interferometer instru. M dimension of y number of outputs in the NN ment 2 Simultaneous retrieval of temperature water vapor and ozone. t target vector of physical variables in data set B atmospheric profiles J Geophys Res 107 D22 4620 doi 10 1029. x observations vector inputs of the NN 2001JD001591. Aires F A Che din N A Scott and W B Rossow 2002b A regularized. H SSM I instrumental noise noise on inputs x of the neural network approach for retrieval of atmospheric and surface tem. NN peratures with the IASI instrument J Appl Meteorol 41 144 159. Ev generic error symbol for variable v Aires F C Prigent and W B Rossow 2004 Neural network uncertainty. assessment using Bayesian statistics with application to remote sensing. Pv generic probability measure for variable v 3 Network Jacobians J Geophys Res 109 D10305 doi 10 1029. C0 A 1 0 covariance matrix of total error on 2003JD004175. retrieved physical variables y Bates D M and D G Watts 1988 Nonlinear Regression Analysis and. Cin A 1 in covariance matrix of intrinsic noise on. 
Its Applications John Wiley Hoboken N J, Berger J O 1985 Statistical Decision Theory and Bayesian Analysis. physical variables y equivalent to 1 b in traditional Springer Verlag New York. Bayesian formulation Bishop C 1996 Neural Networks for Pattern Recognition 482 pp. Cr A 1 r covariance matrix for weight regulariza,Clarendon Press Oxford UK. Gull S F 1988 Bayesian inductive inference and maximum entropy in. tion equivalent to 1 a in traditional Bayesian Maximum Entropy and Bayesian Methods in Science and Engineering. formulation vol 1 Foundations edited by G J Erickson and C R Smith pp 53. H rjw rjw ED w the Hessian matrix of the 74 Kluwer Acad Norwell Mass. Ide K P Courtier M Ghil and A C Lorenc 1997 Unified notation for. log likelihood data assimilation Operational sequential and variational J Meteorol. G rjfw w g gw,8 Soc J 75 181 189, CM the covariance of the errors due to instrument noise Kalnay E 2002 Atmospheric Modeling Data Assimilation and Predict. E hET Ei covariance matrix of the measurement ability 364 pp Cambridge Univ Press New York. Koroliouk V N Portenko A Skorokhod and A Tourbine 1983 Aide. errors Me moire de The orie des Probabilite s et de Statistique Mathe matique. F covariance matrix of the radiative transfer model 581 pp Edition Mir Moscow. errors Le Cun Y J S Denker and S A Solla 1990 Optimal brain damage in. Advances in Neural Information Processing Systems vol 2 edited by. Cb the covariance matrix of the forward model D S Touretzky pp 598 605 Morgan Kaufmann Burlington Mass. parameter errors MacKay D J C 1992 A practical Bayesian framework for back propa. x is the contribution function,gation networks Neural Comput 4 3 448 472. x y Nabney I T 2002 Netlab Algorithms for Pattern Recognition Springer. Ab b is the sensitivity of observations x with Verlag New York. respect to b the parameters of the radiative transfer Near R M 1996 Bayesian Learning for Neural Networks Springer. 
model Verlag New York, Prigent C and W B Rossow 1999 Retrieval of surface and atmospheric.
L matrix whose columns are the error patterns lk parameters over land from SSM I Potential and limitations Q J R. T transposition operator Meteorol Soc 125 2379 2400. h iB expectation operator Prigent C F Aires and W B Rossow 2003a Land surface skin tem. peratures from a combined analysis of microwave and infrared satellite. gw neural network model or transfer function for our observations for an all weather evaluation of the differences between air. application and skin temperatures J Geophys Res 108 D10 4310 doi 10 1029. w wi i 1 W the vector of the network 2002JD002301, Prigent C F Aires and W B Rossow 2003b Retrieval of surface and. weights atmospheric geophysical variables over snow and ice from satellite. W dimension of w microwave observations J Appl Meteorol 42 368 380. B learning database that includes outputs D Rivals I and L Personnaz 2000 Construction of confidence intervals for. D target or network output database neural networks based on least squares estimation Neural Network 13. N number of samples in D and B Rivals I and L Personnaz 2003 MLPs mono layer polynomials and. ED w data term of the quality criterion multi layer perceptrons for nonlinear modeling J Machine Learning. Res 3 1383 1398, Rodgers C D 1990 Characterizatioon and error analysis of profiles. 60 Acknowledgments We would like to thank Ian T Nabney for retrieved from remote sounding measurements J Geophys Res 95. providing the Netlab toolbox from which some of the routines have been 5587 5595. used in this work Filipe Aires would like to thank Andrew Gelman for very Saltieri A K Chan and E M Scott 2000 Sensitivity Analysis John. interesting discussion about modern Bayesian statistics This work was Wiley Hoboken N J. partly supported by NASA Radiation Sciences and Hydrology Programs Twomey S 1977 Introduction to the Mathematics of Inversion in Remote. Sensing and Indirect Measurements Elsevier Sci New York. References Wright W A G Ramage D Cornford and I T Nabney 2000 Neural. 
network modelling with input uncertainty Theory and application. Aires F 2004 Neural network uncertainty assessment using Bayesian J VLSI Signal Process 26 169 188. statistics with application to remote sensing 1 Network weights. J Geophys Res 109 D10303 doi 10 1029 2003JD004173, Aires F and W B Rossow 2003 Inferring instantaneous multivariate. and nonlinear sensitivities for the analysis of feedback processes in a. dynamical system The Lorenz model case study Q J R Meteorol. Soc 129 239 275 F Aires Department of Applied Physics and Applied Mathematics. Aires F M Schmitt N A Scott and A Che din 1999 The weight Columbia University NASA Goddard Institute for Space Studies 2880. smoothing regularisation for MLP for resolving the input contribution s Broadway New York NY 10025 USA faires giss nasa gov. errors in functional interpolations IEEE Trans Neural Networks 10 C Prigent CNRS LERMA Observatoire de Paris 61 av de. 1502 1510 l Observatoire Paris F 75014 France catherine prigent obspm fr. Aires F C Prigent W B Rossow and M Rothstein 2001 A new neural W B Rossow NASA Goddard Institute for Space Studies 2880. network approach including first guess for retrieval of atmospheric water Broadway New York NY 10025 USA wrossow giss nasa gov.
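The STD error change index used throughout section 4 can be sketched in a few lines. The following is an illustrative reconstruction, not the authors' code: the function predicted_std stands in for the situation-dependent per-output error STD that the paper derives from the covariance matrix C0 of equation (10), and the sample set, constants, and threshold are assumptions chosen only to make the example self-contained and runnable.

```python
import numpy as np

# Stand-in for the network's situation-dependent uncertainty estimate:
# the per-output STD of the predictive distribution for an input x.
# In the paper this would come from the diagonal of the covariance
# matrix C0 (equation (10)); here it is a synthetic smooth function.
def predicted_std(x):
    return 0.5 + 0.1 * np.abs(x).sum() + 0.05 * x ** 2

def impact_index(inputs, perturb):
    """STD error change index, in percent, averaged over the sample set.

    An index of 120 means that the estimated retrieval STD increases
    by 20% when the perturbation is applied to the inputs.
    """
    ref = np.mean([predicted_std(x) for x in inputs], axis=0)
    per = np.mean([predicted_std(perturb(x)) for x in inputs], axis=0)
    return 100.0 * per / ref

rng = np.random.default_rng(0)
samples = rng.normal(size=(1000, 4))  # stand-in for the 20,000 samples of B

# Perturb all inputs by +5% at once, as in Figure 4a.
index_all = impact_index(samples, lambda x: 1.05 * x)

# Individual extreme value, as in Figure 6: one input set far outside
# the training variability should inflate the estimated STD noticeably.
extreme = samples[0].copy()
extreme[0] = 50.0
mean_std = np.mean([predicted_std(x).mean() for x in samples])
outlier_flag = bool(predicted_std(extreme).mean() > 3.0 * mean_std)
```

In this toy setting the index exceeds 100 for every output under a +5% perturbation, and the extreme input is flagged, mirroring the qualitative behavior of Figures 4 and 6; with a real network one would replace predicted_std by the STD derived from the estimated predictive covariance.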
