Shapley regressions: A framework for statistical inference on machine learning models

Andreas Joseph
Bank of England and King's College London

March 11, 2019

Abstract

Machine learning models often excel in the accuracy of their predictions but are opaque due to their non-linear and non-parametric structure. This makes statistical inference challenging and disqualifies them from many applications where model interpretability is crucial. This paper proposes the Shapley regression framework as an approach for statistical inference on non-linear or non-parametric models. Inference is performed based on the Shapley value decomposition of a model, a pay-off concept from cooperative game theory. I show that universal approximators from machine learning are estimation consistent and introduce hypothesis tests for individual variable contributions, model bias and parametric functional forms. The inference properties of state-of-the-art machine learning models, like artificial neural networks, support vector machines and random forests, are investigated using numerical simulations and real-world data. The proposed framework is unique in the sense that it is identical to the conventional case of statistical inference on a linear model if the model is linear in parameters. This makes it a well-motivated extension to more general models and strengthens the case for the use of machine learning to inform decisions.

JEL codes: C45, C52, C71, E47.

Keywords: machine learning, statistical inference, Shapley values, numerical simulations, macroeconomics, time series.
Disclaimer: The views expressed in this work are not necessarily those of the Bank of England or one of its committees. This paper has also been published as Bank of England Staff Working Paper 784. Copyright Bank of England 2019. All errors are mine. Contact: Bank of England, Advanced Analytics Division, Threadneedle Street, London EC2R 8AH, UK, and Data Analytics for Finance and Macro (DAFM) Research Centre, King's College London, Strand, London WC2R 2LS, UK. Email: andreas.joseph@bankofengland.co.uk.

Acknowledgement: Many thanks to Sinem Hacioglu, George Kapetanios, Mingli Chen, Milan Nedeljkovic, David Bholat, David Bradnum, Arthur Turrell, Paul Robinson, Ryland Thomas, Evans Munro, Marcus Buckmann, Miao Kang, Luisa Pires, Al Firrell, Aidan Saggers, Sian Besley, Mike Joyce, Jialin Chen and the participants of the (EC)^2 conference "Big Data Econometrics with Applications" (Rome, 13-14 Dec 2018), who provided crucial feedback and support at various stages of the project.

Open source: All code and data used in this paper, as well as supplementary results, are available at github.com/Bank-of-England/Shapley_regressions.

1 Introduction

Model families from machine learning, like support vector machines, tree ensembles and artificial neural networks, often excel in the accuracy of their predictions (Fernandez-Delgado et al., 2014), but are opaque due to their complex structure. More generally, many models make a trade-off between simplicity and accuracy.[1] Accuracy provides confidence that a model's predictions are close to actual outcomes, while simplicity facilitates understanding and communication. On a technical level, this usually boils down to a statistical inference analysis, e.g. the estimation of a coefficient associated with a variable in the model and its confidence levels with respect to a hypothesis, mostly the null. This approach is largely limited to linear parametric models or generalised linear models (Greene, 2017).

On the other hand, machine learning models are mostly non-parametric, built around producing accurate predictions (Friedman et al., 2009). For example, artificial neural networks, which are driving current advances in artificial intelligence in the form of deep learning (Goodfellow et al., 2016), have long been known to have universal approximator properties (Portnoy, 1988).[2] They can approximate almost any unknown function given enough training data. However, this directly leads to the "black box" critique of machine learning models, because it is not straightforward to understand a model's input-output relations or to perform a statistical inference analysis on them. This causes not only practical obstacles for their application, but also ethical and safety concerns more generally, which are increasingly reflected in legal and regulatory frameworks (European Union, 2016).

Despite these important concerns, machine learning models could provide substantial benefits in the context of prediction policy problems (Kleinberg et al., 2015). These are situations where the precise prediction of outcomes is important to inform decisions.[3] Examples include the forecasting of economic developments (Garcia et al., 2017), modelling the soundness of financial institutions (Chakraborty and Joseph, 2017), consumer credit scoring (Fuster et al., 2017), policy targeting based on uncertain outcomes (Andini et al., 2017), the prediction of extreme weather events in the face of climate change (Racah et al., 2016), medical image analysis and diagnosis (Litjens et al., 2017), or aiding expert judgement (Kleinberg et al., 2018). Institutional transparency is an additional aspect from a public policy point of view, such as the decision processes of central banks, regulators and governments (Bernanke, 2010).

[1] There is also an active area of research into simple but accurate models, e.g. via the use of decision heuristics or fast-and-frugal trees; see, for example, Aikman et al. (2014) and Simsek and Buckmann (2015).
[2] This property also applies to other non-parametric models often used in machine learning; see, e.g., Scornet et al. (2014) and Christmann and Steinwart (2008).
[3] As soon as we need to consider the change in outcome due to any action taken as a response to a prediction, we enter the area of a causal inference or mixed policy problem.
On the one hand, decision makers need to understand the driving factors of the quantitative models they rely on, and, on the other hand, also be able to communicate them clearly. Again, the opaqueness of machine learning models hinders their application with regard to both points.

Finally, the need for machine learning models is likely to be aggravated by the current proliferation of large and granular data sources. For instance, data from social media, smart phone usage, ubiquitous sensors or the internet of things may allow for the modelling of human behaviour, or the dynamics of autonomous machines in complex environments, on an unprecedented level. Such capabilities may provide large benefits for technological advancement or societal development more generally. Again, a detailed understanding of the deployed models will be needed to fully utilise this potential.
Two approaches to address the interpretability issue of machine learning models[4] are variable attributions via the decomposition of individual predictions (local attribution) and importance scores for the model as a whole (global attribution). A well-motivated local decomposition is provided by model Shapley values (Strumbelj and Kononenko, 2010; Lundberg and Lee, 2017), a pay-off concept from cooperative game theory (Shapley, 1953; Young, 1985). It maps the marginal contribution coming from a variable within a set of variables to individual model predictions. However, model decomposition is only one part of model interpretability. An equally important part is statistical inference in the form of hypothesis testing to assess the confidence we can have in specific model outputs.

This paper proposes a general statistical inference framework for non-parametric models based on the Shapley decomposition of a model, namely Shapley regressions. This framework transfers the model inference problem into a locally linear space. This simultaneously opens the toolbox of econometrics, or parametric statistics more generally, to machine learning, and vice versa. Model inference consists of three steps. First, model calibration and fitting (training). Second, model testing and Shapley value decomposition on a hold-out dataset. Finally, inference based on a surrogate regression analysis using the Shapley decomposition as its inputs. For the known case of a linear model, this approach reduces to the standard least squares case.[5] In this sense, Shapley regressions can be seen as a natural extension of regression-based inference to the general non-linear model. The main distinction is that inference is often only valid on a local level, i.e. within a region of the input space, due to the potential non-linearity of the model plane. A consequence of this is that the concept of a regression coefficient, as a standard way of measuring and communicating effects, is not directly applicable. I propose a generalised coefficient concept suited for the non-linear case which is close to its linear parent. It allows for similar assessment and communication of modelling results.

[4] I only discuss supervised learning in this paper. However, the proposed methodology can be applied more generally in situations where a model delivers a score which needs to be evaluated based on its inputs.
[5] Shapley values have been used in linear regression analysis before, to address collinearity issues (Lipovetsky and Conklin, 2001). I do not see scope for confusion with the current application.
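In code, the three steps map onto standard tooling. The following is a minimal sketch, assuming the Python shap package (Lundberg and Lee, 2017), scikit-learn and statsmodels, and a feature matrix X and target y that are already loaded; the model choice and settings are illustrative, not the paper's exact setup.

```python
import shap                                   # Shapley value decompositions
import statsmodels.api as sm
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Step 1: calibrate and fit the model (training)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
model = RandomForestRegressor(n_estimators=500).fit(X_train, y_train)

# Step 2: decompose hold-out predictions into Shapley components,
# using the training set as the background sample
explainer = shap.TreeExplainer(model, data=X_train)
S = explainer.shap_values(X_test)             # n_test x m Shapley components

# Step 3: surrogate regression of the target on the Shapley components
surrogate = sm.OLS(y_test, sm.add_constant(S)).fit()
print(surrogate.summary())                    # basis for hypothesis tests
```

The surrogate regression in the last step is the Shapley regression introduced formally in Section 3.4 below.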
values and introduce robust component estimates The validity conditions for inference within. the Shapley regression framework are stated Particularly valid asymptotic inference depends. on sample splitting for model training and testing which is a common procedure when building. a machine learning system Section 5 considers applications First empirical estimation prop. erties of commonly used machine learning models like artificial neural networks NN support. vector machines SVM and random forests RF are investigated using numerical simulations. Second the Shapley regression framework is applied to modelling long run macroeconomic time. series for the UK and US Machine learning models are mostly more accurate than either regu. larised biased and unbiased linear benchmark models Inference from the Shapley regression. framework is robust against model choice and richer than that of benchmark models pointing. to the importance of non linearities in modelling these data generating processes Differences. in results are in line with analytical model properties and can be used for model selection. The main drawback of using the Shapley regressions framework is the computational cost of. calculating Shapley value decompositions Depending on the application this can be addressed. via appropriate approximations or sampling procedures Section 6 concludes. An inference recipe for machine learning models is summarised in Box 1 in the Appendix to. gether with figures tables and proofs of theoretical results The code and data for the numerical. and empirical analyses alongside supplementary results are available on Github com Bank of. England Shapley regressions,2 Literature, Approaches to interpretable machine learning come from different directions General issues. around model interpretability technical approaches from within machine learning research and. approaches from econometrics and statistics I will primarily focus on the latter two. The highest level of discussion relates to reasons why models should be interpretable and well. communicated despite good comparative numerical performance Especially in the context. of informing decisions these are intertwined ethical safety privacy and increasingly legal con. cerns about the application of opaque models Crawford 2013 European Union 2016 Fuster. et al 2017 Lipton 2016 discusses desirables properties of interpretable research in general. trust causality transferability informativeness and models we use transparency e g via. local decomposability and interpretability e g via visualisations and relatedness He argues. that a complex machine learning model does not need to be less interpretable than a simpler. linear model if the latter operates on a more complex space This is in line with Miller 2017. who provides a comprehensive discussion of explainable artificial intelligence often referred to. as XAI from a social science perspective One take away message is that humans prefer simple. explanations i e those citing fewer causes and explaining more general events are generally. preferred though they may be biased Shallow tree models from machine learning or derived. fast and frugal trees may thus offer accurate models while also providing satisfactory trans. parency Aikman et al 2014 S ims ek and Buckmann 2015. Approaches in computer science have focused on model decomposition by means of variables. attribution techniques That is scores of importance are given to each input variable for single. 
Gini importance for tree-based classifiers is an example of a model score. It is a measure of how much a variable contributes to the optimisation of the objective function (Kazemitabar et al., 2017; Friedman et al., 2009). Local attributions decompose individual predictions, assigning scores to each input variable. Here, one approach is to construct approximate surrogate models which allow for model decomposition. Examples are LIME[6] (Ribeiro et al., 2016), DeepLIFT[7] (Shrikumar et al., 2017) and Shapley values (Strumbelj and Kononenko, 2010). Lundberg and Lee (2017) demonstrate that Shapley values offer a unified framework of previous attribution schemes with appealing properties. These are also the reason for their use in the current paper.

[6] Local Interpretable Model-agnostic Explanations.
[7] Deep Learning Important FeaTures (for NN).

The literature on inference using machine learning models from an econometrics point of view is just at its beginning, and is also the main area this paper speaks to. I distinguish three approaches. First, one can construct a correspondence between an econometric and a machine learning model where possible. Mullainathan and Spiess (2017) present the simple but intriguing idea to treat a not-too-deep tree model as a regression model with multiple interaction terms, one per leaf node. As with the tree model, overfitting is an emerging issue. This can be addressed via regularisation and the estimation of unbiased coefficients on the regularised model, corresponding to a pruned tree when shrinking coefficients to zero.

The second approach is double or debiased machine learning (Chernozhukov et al., 2018). It deals with the issue of parameter regularisation bias using machine learning, e.g. when estimating a partially linear model in the presence of a high-dimensional nuisance parameter. This bias is avoided via the construction of orthogonal score functions for the estimation of a low-dimensional target parameter. The procedure is model independent and allows for well-defined inference on causal parameters. The main difference to the current paper is that I do not allow parameters of interest to be part of the model optimisation stage, but rather recover those from an a posteriori decomposition, which may or may not involve a particular parametric form.

A third approach has been to use a priori modified models which have well-defined statistical properties, e.g. for the estimation of treatment effects. Wager and Athey (2018) introduce a type of RF for the estimation of heterogeneous treatment effects. The idea is based on the notion that small enough leaf nodes provide uncorrelated sub-samples, as though they had come from a randomised experiment. Intuitively, trees in a forest act as a form of matching algorithm which is more flexible than conventional techniques due to the adaptive nature of tree models. For the construction of these causal forests, they introduce the concept of honest trees as a modification of the original algorithm. These now have an asymptotically Gaussian and centred sampling distribution. The idea of using specific characteristics of machine learning models to improve on existing techniques is again intriguing. The present paper is complementary to this approach: RF from honest trees are still open to the black-box critique, which can be addressed by the Shapley regression framework.[8]

[8] The same applies with respect to Chernozhukov et al. (2018), meaning the current paper is not a substitute for, but a complement to, preceding work.

3 The Shapley regression framework

3.1 Notation and definitions
This paper considers the common case where f(x|\beta): D \subseteq R^m \to R^p is the data-generating process (DGP) of interest with domain D. We only consider the case p = 1; the extension to p > 1 is straightforward. The data are x \in R^{n \times m}, with m being the number of features (or independent variables) and n the number of observations. Features are assumed to be independent from each other, while observations need not be (column-wise independence). Consequences of this restriction, and ways to address it if too stringent, will be discussed.

The vector \beta \in R^{m+1} describes the parameterisation of the DGP, such as the set of coefficients of a linear model, with \beta_0 being the intercept. The parameters \beta represent the effects we are interested in studying. The DGP f is assumed to be piecewise continuous and differentiable on finite sub-domains of D, and to have finite moments, i.e. E_D[|f|^d] < \infty with d \ge 1. Regions within D are labelled \Omega.

The non-parametric model is f̂(x|θ̂): D \subseteq R^m \to R, with θ̂ \in R^q, where q \gg m is allowed. It represents our machine learning models of interest, such as NN, SVM or RF. In these cases, θ̂ represents the network weights, support vector coefficients and split points, respectively. The model parameters θ̂ are slightly different to their usage in semi-parametric statistics, where θ often describes a high-dimensional nuisance parameter which may be present or not. The model f̂ is assumed to have finite moments, but no other regularity conditions are imposed. The linear model is parameterised by β̂.

The index convention used is that i, j = 1, ..., n refer to individual observations and k, l = 1, ..., m to feature dimensions. No index refers to the whole dataset x \in R^{n \times m}. An index c = 1, ..., C refers to components of linear decompositions of either a DGP or a model, e.g. Φ̂ = \sum_{c=1}^C Φ̂_c refers to the Shapley decomposition of a model (see below). Superscripts S refer to Shapley-related quantities, which will be clear from the context. Estimated quantities are hatted, except where dropped for simplicity.

3.2 The linear model as a guiding principle

Statistical inference can be local or global. The linear model f(x_i|\beta) = x_i\beta = \sum_{k=0}^m x_{ik}\beta_k is special in the sense that it provides local and global inference at the same time.
The coefficients \beta describe local effects via the sum of products of variable components and coefficients. At the same time, the coefficient vector \beta determines the orientation of the global model plane, with constant slope in each direction of the input space. As long as the number of covariates in a model is modest, the linear model is widely accepted to provide good inference properties and is the workhorse of econometric analysis.

The linear model belongs to the class of additive variable attributions. For an observation x_i \in R^m, we define the model decomposition as

    f(x_i) = \sum_{k=0}^m \Phi_k(x_i) = \beta_0 + \sum_{k=1}^m x_{ik}\beta_k ,    (1)

where \Phi_0 = \beta_0 is the intercept. The standard approach to test for the importance of a certain variable is to test against the null hypothesis H_0^k: \beta_k = 0. The goal of this paper is to arrive at a similar hypothesis test valid for more general models f̂.

3.3 Shapley values

A more general class of additive attributions is given by model Shapley values Ŝ, a pay-off concept from cooperative game theory (Shapley, 1953). Making the analogy between players of a multi-player game cooperating to generate a pay-off, and variables x_k within a model cooperating to generate predictions f̂(x), the marginal contribution from variable k is defined in the form of its Shapley value (Strumbelj and Kononenko, 2010)

    Ŝ_k(f̂|x) = \sum_{x' \subseteq C(x) \setminus \{k\}} \frac{|x'|!\,(m - |x'| - 1)!}{m!} \left[ f̂(x' \cup \{x_k\}) - f̂(x') \right] ,    (2)

where C(x) \setminus \{k\} is the set of all possible coalitions of m - 1 model variables when excluding the k-th variable, and |x'| denotes the number of included variables. Eq. (2) is the weighted sum of marginal contributions of variable k, accounting for the number of possible coalitions for a certain |x'|.[9]

[9] For example, assuming we have three players (variables) A, B, C, the Shapley value of player C would be Ŝ_C(f̂) = 1/3 [f̂(A,B,C) - f̂(A,B)] + 1/6 [f̂(A,C) - f̂(A)] + 1/6 [f̂(B,C) - f̂(B)] + 1/3 f̂(C).
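To make Eq. (2) concrete, here is a minimal pure-Python sketch that computes exact Shapley values by enumerating all coalitions; the pay-off table is a hypothetical stand-in for the conditional-expectation model evaluations discussed below.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values of Eq. (2) by enumerating all coalitions.
    players: list of variable names; v: pay-off function over frozensets."""
    m = len(players)
    phi = {}
    for k in players:
        others = [p for p in players if p != k]
        total = 0.0
        for size in range(m):
            for coal in combinations(others, size):
                coal = frozenset(coal)
                weight = factorial(len(coal)) * factorial(m - len(coal) - 1) / factorial(m)
                total += weight * (v(coal | {k}) - v(coal))  # marginal contribution of k
        phi[k] = total
    return phi

# Hypothetical pay-offs v(S): the model evaluated with only the variables in S present
payoff = {frozenset(): 0.0, frozenset("A"): 1.0, frozenset("B"): 2.0,
          frozenset("C"): 4.0, frozenset("AB"): 4.0, frozenset("AC"): 6.0,
          frozenset("BC"): 7.0, frozenset("ABC"): 10.0}
phi = shapley_values(list("ABC"), payoff.__getitem__)

# Attributions sum to v(all) - v(empty), i.e. Property 1 (efficiency) below
assert abs(sum(phi.values()) - 10.0) < 1e-12
```

For m = 3, the weights reduce to the 1/3, 1/6, 1/6, 1/3 pattern of the three-player example in footnote 9. The 2^m coalition enumeration is also the computational bottleneck discussed later in this section.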
Intuitively, the above definition of a Shapley value is similar to the regression anatomy of a coefficient \beta_k, i.e. the bivariate slope coefficient after partialling out all other regressors in a multivariate model (Angrist and Pischke, 2008). This will be formalised below.

Shapley values are the unique class of additive value attributions with the following properties (Shapley, 1953; Young, 1985; Strumbelj and Kononenko, 2010).

Property 1 (Efficiency). The attribution model \Phi matches the original model f̂ at x_i:

    \Phi(x_i) \equiv Ŝ_0 + \sum_{k=1}^m Ŝ_k(x_i) = f̂(x_i) .    (3)

In a modelling context, this property is called local accuracy: a model's Shapley decomposition always sums to the predicted value. The intercept Ŝ_0 is the expected, or average, model value.

Property 2 (Missingness/null player). If a variable is missing from a model, no attribution is given to it, i.e. Ŝ_k = 0 (dummy player).

Property 3 (Symmetry). If j and k are two variables which are equivalent, such that

    f̂(x' \cup \{x_j\}) = f̂(x' \cup \{x_k\})    (4)

for all possible x' not containing j or k, then Ŝ_j = Ŝ_k.

Property 4 (Strong monotonicity). Variable attributions do not decrease if an input's contribution to a model increases or stays the same, regardless of other variables in the model. That is, for any two models f̂ and f̂', if

    f̂(x') - f̂(x' \setminus \{k\}) \ge f̂'(x') - f̂'(x' \setminus \{k\})    (5)
for all possible x', then Ŝ_k(f̂|x) \ge Ŝ_k(f̂'|x), where x' \setminus \{k\} indicates the set of variables excluding k. In the context of variable attribution, this property is also called attribution consistency. It is an innovation relative to previous approaches, such as the Gini importance of decision trees (Lundberg et al., 2018).

Property 5 (Linearity). For any two independent models f̂ and f̂', i.e. where the outcome of one does not depend on the inputs or outcome of the other, the joint Shapley decomposition for a variable k can be written as

    Ŝ_k(a(f̂ + f̂')) = a\,Ŝ_k(f̂) + a\,Ŝ_k(f̂')    (6)

for any real number a. A consequence of these properties is the following proposition.[10]

Proposition 3.1. The Shapley decomposition Ŝ of a model f̂ linear in parameters, f̂(x) = x\betâ, is the model itself. The proof is given in the Appendix.

Hence, the Shapley value decomposition of the linear model is well known.

Regarding the computation of model Shapley values (2), most models cannot handle missing variables when evaluating variable coalitions. If missing from a coalition, the contribution of a variable is integrated out via conditional expectations relative to a representative background sample. Particularly, we evaluate E_{x \notin C}[f̂(x)|x_C], where C is the set of non-missing variables in a coalition. For this to be exact, one has to assume feature independence to avoid model evaluations at unreasonable inputs. This can be a strong assumption for many applications. I will demonstrate a way to quantify errors made based on this assumption in Section 5.2.4.

The computation of Shapley decompositions is challenging due to the exponential complexity of (2). Two approaches have been proposed in the machine learning literature which preserve the properties of Shapley values: Shapley sampling values (Strumbelj and Kononenko, 2010) and Shapley additive explanations (SHAP; Lundberg and Lee, 2017). The latter provides an improvement on the former and will be the basis for the calculation of Shapley decompositions in this paper. The background dataset is taken to be the training set of a model, which contains the information the model parameters θ̂ are based on from the optimisation process. The calculation of model Shapley values is probably the biggest drawback in their usage. Appropriate approximations or sampling procedures may be used and tested, depending on the situation.[11]

[10] This corresponds to "linear SHAP" in Lundberg and Lee (2017).
[11] For high-dimensional data, such as images or text, it is often more practical and intuitive to work with lower-dimensional representations, such as super-pixels, objects or topics, respectively.

3.4 Shapley regressions

Having a well-defined measure for variable attributions, we next turn to hypothesis testing, e.g. to assess the significance of individual variable contributions. For this, one can reformulate an inference problem in terms of a model's Shapley decomposition. That is, one estimates the Shapley regression

    y_i = Ŝ_i \beta^S + \epsilon_i \equiv \sum_{k=0}^m Ŝ_k(f̂|x_i)\,\beta_k^S + \epsilon_i ,    (7)

where k = 0 corresponds to the intercept and \epsilon_i \sim N(0, \sigma^2). The surrogate coefficients \beta_k^S are tested against the (one-sided) null hypothesis

    H_0^k(\Omega): \beta_k^S \le 0 .    (8)

The key difference to the linear case is the regional dependence on \Omega, i.e. only local statements about the significance of variable contributions can be made. This is related to the potential non-linearity of a model, whose hyperplane in the input-target space may be curved, compared to that of the linear model (1).
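A minimal sketch of estimating Eq. (7) and testing the null (8), assuming statsmodels and scipy; S is an n x m matrix of hold-out Shapley values as computed above.

```python
import statsmodels.api as sm
from scipy import stats

def shapley_regression(y, S):
    """Estimate y = S beta^S + eps (Eq. 7) and test H0: beta_k^S <= 0 (Eq. 8)."""
    res = sm.OLS(y, sm.add_constant(S)).fit()
    # one-sided p-values: small values reject H0 in favour of beta_k^S > 0
    p_one_sided = stats.t.sf(res.tvalues, df=res.df_resid)
    return res.params, p_one_sided
```

Under (8), only a significantly positive alignment counts as evidence for a variable; negative estimates are never significant, matching the rejection of negative coefficients discussed below.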
The following proposition provides further justification for the use of Shapley regressions for inference on machine learning models.
Proposition 3.2. The Shapley regression problem of Eq. (7) for a model f̂ linear in parameters is identical to the least squares problem related to f̂(x) = x\betâ, i.e. \beta^S = 1. The proof is given in the Appendix.

Proposition 3.2 provides practical guidelines and intuition regarding the coefficients \beta_k^S. Geometrically, they describe the alignment of the model hyperplane spanned by the Shapley decomposition and the target variable, in the same way as the coefficients of a linear model in the original input space. Notionally, this is not different from a variable transformation. One expects coefficient values of unity, i.e. \beta^S = 1, if the machine learning model generalises well.[12] Deviations from unity are caused by the best-fit hyperplane being tilted in certain directions, and provide insight about the generalisation properties of the model. Values greater than unity indicate that f̂ underestimates the effect of a variable; values smaller than one indicate the opposite. Particularly, statistical significance will drop as \beta_k^S approaches zero, as there is no clear alignment between Shapley components Ŝ_k and the target y. We reject negative coefficients, as they are opposed to the alignments of attributed effects Ŝ. These can occur when f̂ is not a good fit itself.

Having derived a test against the null hypothesis, it is not yet clear how to communicate inference results. The coefficients \beta^S are only partially informative, as they do not quantify the components of Ŝ, but rather their alignment with the target, independent of their actual magnitude. I propose the following generalised coefficient.

[12] A formal derivation of this statement is given in the next section.

3.4.1 Shapley share coefficients

The Shapley share coefficient (SSC) of variable x_k in the Shapley regression framework is defined as

    Γ̂_k^S(f̂|\Omega) \equiv sign(\beta_k^{lin}) \left( \frac{\langle |Ŝ_k| \rangle}{\sum_{l=1}^m \langle |Ŝ_l| \rangle} \right)_\Omega^{(*)}    (9)

    \overset{linear}{=} sign(\beta_k) \left( \frac{|\beta_k \langle x_k - \langle x_k \rangle \rangle|}{\sum_{l=1}^m |\beta_l \langle x_l - \langle x_l \rangle \rangle|} \right)_\Omega^{(*)} ,    (10)
where \langle \cdot \rangle stands for the average over x_k in \Omega_k \subseteq R. The SSC Γ̂_k^S(f̂|\Omega) is a summary statistic for the contribution of x_k to the model over a region \Omega \subseteq R^m.

It consists of three parts. The first is the sign, which is the sign of the corresponding linear model coefficient. The motivation for this is to indicate the alignment of a variable with the target. The second part is the coefficient size. It is defined as the fraction of absolute variable attribution allotted to x_k across the range of x considered. The sum of absolute values of SSC is one by construction.[13] It measures how much of the model output is explained by x_k. The last component, (*), is used to indicate the significance level of Shapley attributions from x_k against the null hypothesis (8), and thus the confidence one can have in information derived from that variable.

Eq. (10) provides the explicit form for the linear model. The main difference to the conventional case is a normalising factor accounting for localised properties of non-linear models. Given the definition over a range \Omega, it is important to also interpret SSC in this context. For example, contributions may vary over the input space, such that Γ̂_k^S takes on different values at different points or times.

More generally, a coefficient is a constant factor multiplying some quantity of interest. This is a concept from linear models which does not directly translate to the non-linear case. Eq. (9) is constructed in such a way as to provide comparable information and structure. A key property of this generalisation, and a further difference to the linear case, is that (9) does not make assumptions about the functional form of the DGP; hence it may be called a non-parametric coefficient.

3.4.2 SSC standard errors

Given the conditions we required from f, the classical central limit theorem applies to the sampling distribution of Shapley values Ŝ(f̂), tending to a multivariate normal distribution. This can be used to construct standard errors and confidence intervals for E[Ŝ_k]. However, the information derived from this may be hard to interpret, given the lack of a scale in the components Ŝ_k. Not so for the SSC (9), which are normalised.

Let \gamma_k \equiv |Γ̂_k^S(f̂|\Omega)| \in [0, 1] be the absolute value of the k-th SSC. The upper bounds on the variance of \gamma_k and on its sampling standard error of the mean are given by[14]

    var(\gamma_k) \le \gamma_k (1 - \gamma_k) ,  \quad  se(\bar{\gamma}_k) \le \sqrt{\frac{\gamma_k (1 - \gamma_k)}{n}} .    (11)

The sampling distribution of \bar{\gamma}_k will also approach a Gaussian with increasing sample size. Thus, se(\bar{\gamma}_k) provides a well-defined measure of the variability of Γ̂^S within \Omega.

[13] The normalisation is not needed in binary classification problems, where the model output is a probability. Here, a Shapley contribution relative to a base rate can be interpreted as the expected change in probability due to that variable.
[14] One will generally be interested in the expected explanatory fraction \gamma_k of a variable, while the sign of the SSC is fixed. Accounting for the sign, the bound on the RHS of (11) needs to be multiplied by four.
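The following is a minimal numpy sketch of Eqs. (9) and (11), assuming an n x m matrix S of hold-out Shapley values and linear benchmark coefficients supplying the signs; both inputs are placeholders for whatever model and benchmark the reader uses.

```python
import numpy as np

def shapley_share_coefficients(S, beta_lin):
    """Shapley share coefficients of Eq. (9): signed shares of average
    absolute attribution. S: n x m Shapley matrix (no intercept column);
    beta_lin: m coefficients of a linear benchmark providing the signs."""
    mean_abs = np.abs(S).mean(axis=0)
    gamma = mean_abs / mean_abs.sum()      # absolute shares, sum to one
    return np.sign(beta_lin) * gamma

def ssc_standard_error(gamma, n):
    """Upper bound on the sampling standard error of the mean |SSC|, Eq. (11)."""
    return np.sqrt(gamma * (1.0 - gamma) / n)
```

The significance stars (*) of Eq. (9) would then be attached from the one-sided tests of the Shapley regression sketched above.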
We have now assembled tools for statistical inference on machine learning models regarding the direction, size, significance and variability of variable contributions. Next, I provide the theoretical underpinning of the proposed framework.

4 Machine learning estimator properties

Focusing on regression problems,[15] it is common to minimise the mean squared error (MSE) between a target y from a DGP f(x|\beta) and a model f̂(x|θ̂) over a dataset x. The expected MSE can be decomposed as

    E_x[(y - f̂)^2] = (E_x[f̂] - f)^2 + E_x[(f̂ - E_x[f̂])^2] + \sigma_\epsilon^2 \equiv bias^2(θ̂) + variance(θ̂) + \sigma_\epsilon^2 ,    (12)

where \sigma_\epsilon^2 is the irreducible error component of the DGP, corresponding to the variance of y. Eq. (12) distinguishes between external model parameters θ̂ and internal parameters \beta of the DGP. This separation is important because machine learning models are often subject to regularisation as part of cross-validation procedures, model calibration and the training process. This directly affects the model parameters θ̂, if present, when minimising (12). Thus, if \beta were explicitly part of the training process, its values would be biased, as was investigated in Chernozhukov et al. (2018). It is at the heart of machine learning to generalise from (y, x) by means of θ̂. This generalisation can be made explicit, i.e. by recovering \beta, using Shapley values and regressions.

[15] A regression model in machine learning refers to fitting a continuous target or dependent variable. Problems which describe categorical variables, e.g. a binary target, are called classification problems. All results presented here can be applied in the classification case.

4.1 Estimator consistency

Statistical inference on machine learning models requires two steps: first, the control of bias and variance according to (12), and second, the extraction of and inference on \beta. Regarding the former, most non-parametric estimators for regression problems are consistent in the sense that the squared error tends towards \sigma_\epsilon^2 as the training data size tends to infinity. This property can be called error consistency, that is,

    plim_{n \to \infty} \left( y - f̂(x|θ̂) \right) = 0 ,    (13)
i.e. the expected divergence of f̂ from the true value y converges towards zero in probability as the sample size increases, assuming \sigma_\epsilon = 0. Eq. (13) defines the universal approximator property of machine learning models. It is ultimately based on the consistency of non-parametric regressions according to Stone (1977).[16] That is, universally consistent machine learning models can be interpreted as generating local weight distributions which mimic the DGP.

This does not necessarily imply estimator consistency,[17] i.e. that

    plim_{n \to \infty} \left( \betâ_k - \beta_k(x) \right) = 0 ,    (14)

that is, that universal approximators learn the correct parameterisation of a DGP.

In many applications of interest, f can be locally approximated by a polynomial regression. For a polynomial DGP, the following result holds.

Theorem 4.1 (polynomial consistency of machine learning estimators). Let f be a DGP of the form f(x|\beta) = \sum_{k=0}^{m_d} \beta_k p_k^d(x) \equiv P^d(x)\beta, where p^d(x) is a polynomial of order d of the input features on a subspace \Omega \subseteq D \subseteq R^m. If, for each x' \in \Omega, a model f̂ is error consistent, then f̂ is also estimator consistent in the sense of (14), as long as f̂ does not explicitly depend on \beta. The proof is given in the Appendix.

Theorem 4.1 can be used to make a more general statement about non-linear parameter dependencies.

Corollary 1 (universal consistency). Let f(x|\beta) be a DGP on D \subseteq R^m and f̂ a model not involving \beta. If f can be approximated by a polynomial f_{p'} arbitrarily closely and f̂ is error consistent, then f̂ is estimator consistent for any such f. Particularly, the effect of \beta is locally approximated by f_{p'} arbitrarily precisely. The proof is given in the Appendix.

Corollary 1 tells us that an error consistent model will learn most functional forms of interest, and their true parameters, provided sufficient data.[18] This property can be called implicit estimation consistency, where we do not allow parameters of interest to enter the estimation stage. The Shapley decomposition can now be used to make the functional form explicit and to test parameterisations of the DGP.

[16] Particularly Proposition 5 on page 609.
[17] The term consistency carries three different meanings in this paper, namely consistency of model variable attributions (e.g. Shapley values), error consistency for universal approximators in machine learning, and estimator consistency with respect to \beta (see also Zhao and Yu, 2006; Munro, 2018).
[18] An intuitive illustration of how an SVM with a radial kernel can approximate almost any function is given in the Appendix.
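Theorem 4.1 suggests a direct numerical check: fit an error consistent learner on growing samples from a known DGP and verify that the Shapley regression coefficients of Eq. (7) drift towards unity. A minimal sketch, assuming the shap, scikit-learn and statsmodels packages; the DGP and hyper-parameters below are illustrative, not those of the paper's simulations.

```python
import numpy as np
import shap
import statsmodels.api as sm
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
for n in (500, 2000, 8000):                         # growing sample sizes
    X = rng.normal(size=(n, 3))
    y = X[:, 0] - 2 * X[:, 1] + 0.5 * X[:, 2]**2 + 0.1 * rng.normal(size=n)
    split = n // 2                                   # sample splitting
    model = RandomForestRegressor(n_estimators=300, random_state=0)
    model.fit(X[:split], y[:split])
    background = X[:split][:100]                     # subsample to keep SHAP tractable
    S = shap.TreeExplainer(model, data=background).shap_values(X[split:])
    beta_S = sm.OLS(y[split:], sm.add_constant(S)).fit().params[1:]
    print(n, np.round(beta_S, 2))                    # expect drift towards 1
```

Coefficients settling below or above one for a given variable flag under- or over-fitting of its effect, in line with the interpretation of \beta_k^S given in Section 3.4.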
4.2 Estimator bias

When has a model sufficiently converged for well-informed inference, e.g. judged by its Shapley share coefficients from Eq. (9)? Before addressing this question, let us connect model Shapley values to parametric functional forms which have a finite decomposition.

Lemma 1 (model decomposition). There exists a decomposition \Phi(\beta|x) = \sum_{c=1}^C \Phi_c(\beta|x) = f̂(x) if the equation \Phi(\beta|x) = Ŝ is solvable for each x_k, with k = 1, ..., m. The proof is simple, as \Phi(\beta|x) = Ŝ can be used to construct \Phi.

A decomposition \Phi can be called an additive functional form representation of f̂, with the Shapley decomposition Ŝ being the trivial representation. That is, \Phi is a parameterisation of f̂ for which the following result holds.

Theorem 4.2 (composition bias). Let f be a DGP and \Phi(\beta|x) = \sum_{c=1}^C \Phi_c(\beta|x) = f(x) the true local decomposition of f. Let f̂ be an error consistent model according to Theorem 4.1, with a local decomposition \Phî(x) = \sum_{c=1}^C \Phî_c(x) = f̂(x), e.g. its Shapley decomposition (2).
Applying the Shapley regression (7), \Phî is unbiased with respect to \Phi if and only if \beta_c^S = 1 for all c = 1, ..., C. Particularly, there exists a minimal n_u for which \betâ_c \approx 1, for all c = 1, ..., C, at a chosen confidence level. The proof is given in the Appendix.

Theorem 4.2 implies that \betâ^S \to 1, for either Ŝ or \Phî, as the sample size grows. Having Ŝ, the mapping Ŝ \to \Phî can be used to test the functional form of f. Corollary 1 extends this to local approximations of any form, i.e. to those to which Lemma 1 does not apply but for which a local decomposition can be formulated. For example, universal approximators will learn regression discontinuities (Imbens and Lemieux, 2008) as a result of treatment when given enough data. The Shapley regression framework can then be used to construct approximate parametric functional forms around a discontinuity and test the limits of their validity.

For a linear model, \beta_c = 1 is nothing else than the unbiasedness of coefficients if the model is well specified. This can be seen from Proposition 3.2, and shows again that Shapley regressions reduce to the standard case in this situation.

For a general non-linear model, unbiasedness can only be assessed if \beta_c^S = 1 for all c = 1, ..., C, due to the accuracy condition (3) required from each decomposition. The Shapley regression (7) tests linear alignment of \Phî with the dependent variable, while the level of individual components \Phî_c may shift, until \betâ_c^S = 1 for all c, for sample sizes smaller than n_u. Consistency implies that such a shift happens towards the true level \Phi_c.
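Theorem 4.2 suggests a practical bias check: regress the target on the components of a candidate decomposition (e.g. grouped Shapley values) and test whether all coefficients equal one. A minimal statsmodels sketch, where the grouping of hold-out Shapley columns into components is the user's hypothesis about the functional form.

```python
import numpy as np
import statsmodels.api as sm

def composition_bias_test(y, components):
    """Wald test of beta_c^S = 1 for all components (Theorem 4.2).
    components: n x C matrix, each column one candidate Phi_c."""
    res = sm.OLS(y, sm.add_constant(components)).fit()
    C = components.shape[1]
    R = np.hstack([np.zeros((C, 1)), np.eye(C)])   # select slopes, drop intercept
    wald = res.wald_test((R, np.ones(C)), scalar=True)
    return res.params[1:], float(wald.pvalue)      # estimates and joint p-value
```

A high p-value is consistent with an unbiased decomposition at the chosen confidence level; with the trivial Shapley decomposition, this reduces to checking \betâ_k^S = 1 variable by variable.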

