
P. Ruckdeschel Optimally Robust Kalman Filtering Berichte des Fraunhofer ITWM, Nr. 185 (2010)

© Fraunhofer-Institut für Techno- und Wirtschaftsmathematik ITWM 2010
ISSN 1434-9973
Bericht 185 (2010)

All rights reserved. Without the express written permission of the publisher, it is not permitted to reproduce this book or parts of it in any form by photocopy, microfilm, or any other means, or to translate it into a language usable by machines, in particular data processing systems. The same applies to the right of public reproduction.

Trade names are used without any guarantee that they may be freely used.

The publications in the Fraunhofer ITWM report series can be obtained from:

Fraunhofer-Institut für Techno- und Wirtschaftsmathematik ITWM
Fraunhofer-Platz 1
67663 Kaiserslautern
Germany

Phone: +49 (0)631/31600-0
Fax: +49 (0)631/31600-1099
E-mail: info@itwm.fraunhofer.de
Internet: www.itwm.fraunhofer.de

Preface

The field of activity of the Fraunhofer-Institut für Techno- und Wirtschaftsmathematik ITWM comprises application-oriented basic research, applied research, as well as consulting and customer-specific solutions in all areas that are relevant to techno- and business mathematics.

The series »Berichte des Fraunhofer ITWM« is intended to present the institute's work continuously to an interested public in industry, business, and science. Through the close cooperation with the Department of Mathematics at the University of Kaiserslautern, as well as through numerous collaborations with international institutions and universities in the fields of education and research, there is a large potential for research reports. The series includes outstanding diploma and project theses and dissertations as well as research reports by institute staff and institute guests on current questions of techno- and business mathematics.

Moreover, the series offers a forum for reporting on the numerous cooperation projects of the institute with partners from industry and business. Reporting here means documenting the transfer of current results from mathematical research and development work into industrial applications and software products, and vice versa, since problems arising in practice generate new and interesting mathematical questions.

Prof. Dr. Dieter Prätzel-Wolters
Director of the Institute

Kaiserslautern, June 2001

Optimally Robust Kalman Filtering Peter Ruckdeschel May 6, 2010

Abstract

We present some optimality results for robust Kalman filtering. To this end, we introduce the general setup of state space models, which is not limited to a Euclidean or time-discrete framework. We pose the problem of state reconstruction and recall the existing classical algorithms in this context. We then extend the ideal-model setup to allow for outliers, which in this context may be system-endogenous or -exogenous, inducing the somewhat conflicting goals of tracking and attenuation.

In quite a general framework, we solve the corresponding minimax MSE problems for both types of outliers separately, resulting in saddle-points consisting of an optimally robust procedure and a corresponding least favorable outlier situation. Still insisting on recursivity, we obtain an operational solution, the rLS filter and variants of it.

Exactly robust-optimal filters would require knowledge of certain hard-to-compute conditional means in the ideal model; things would be much easier if these conditional means were linear. Hence, it is important to quantify the deviation of the exact conditional mean from linearity. We obtain a somewhat surprising characterization of linearity for the conditional expectation in this setting.

Combining both optimal filter types (for the system-endogenous and the system-exogenous situation), we come up with a delayed hybrid filter which is able to treat both types of outliers simultaneously.

Keywords: robustness, Kalman filter, innovation outlier, additive outlier

1 Introduction

State space models are an extremely flexible model class for dynamic phenomena, and even more so if we understand them to also comprise discrete state spaces as used in Hidden Markov Models. Their applications range from the Engineering Sciences, with aeronautics, electrical engineering, and speech recognition, over automatic monitoring/surveillance systems with important applications in intensive care medicine, to Genetics, with applications in gene sequencing and evolutionary biology, to Environmetrics and Geo-Statistics, with applications e.g. in hydrology, and on to econometrics and finance, with applications in the prediction of stock prices, option pricing, and portfolio optimization. A survey on applications in econometrics is given in Harvey (1987); for the other domains, a short search on the web will produce an abundance of references. A comprehensive overview of the mathematical methods used in this subject may be found in Chen (1996).

Historically, after pioneering work by Kolmogorov (1941a,b) and Wiener (1949), which was still limited to stationary situations, Kalman (1960) and Kalman and Bucy (1961) achieved a breakthrough in two seminal papers, finding recursive, orthogonally optimal procedures which also cover non-stationary situations and are now known as the Kalman filter (in the time-discrete setting) and the Kalman-Bucy filter (in the continuous-time setting).

1.1 Review of the literature on Robust Kalman filtering

Early in the history of Robust Statistics, people became aware of the robustness problem inherent in Kalman filtering, with first (non-verified) hits in a quick search for “robust Kalman filter” on scholar.google.com dating as early as 1962 and 1967; the former even predates the seminal Huber (1964) paper, often referred to as the birthday of Robust Statistics. In the meantime, there is an ever-growing amount of literature on this topic; Kassam and Poor (1985) had already compiled as many as 209 references on the subject in 1985. Excellent surveys are given in Ershov and Lipster (1978), Kassam and Poor (1985), Stockinger and Dutter (1987), Martin and Raftery (1987), Schick and Mitter (1994), and Künsch (2001).

On the other hand, the mere notion of robustness itself is not understood unanimously in the literature. The notion that we will use in this paper focuses qualitatively on bounded risk on neighborhoods about an ideal model as specified in subsection 2.2, which in Problems (3.16) and (3.17) will be made quantitative by optimizing corresponding risks.

We also emphasize that, working with “small” neighborhoods, the minimax formulation of Problem (3.16) will not result in overly pessimistic procedures; or, to take up a formulation by C. Rogers, contrary to other minimax settings, you will leave the house, even in the presence of ubiquitous dangers, simply because you only look at “realistic” dangers lying “close” to your intended way.

The litmus test for our notion of robustness in this context will be whether a corresponding filter is bounded in the observations, as otherwise the respective risk will be unbounded on an arbitrarily small neighborhood.

This qualitative notion of robustness should be compared to “Qualitative Robustness” as introduced by Hampel (1968), which refers to equicontinuity of the distributions of the procedure in the weak topology with respect to the sample size. Our notion is also related, but not identical, to a positive breakdown point for the procedure on this neighborhood: not identical, because there is no asymptotics involved and the sample size is 1! Hence, if we defined the breakdown point as the infimal radius r such that the procedure becomes unbounded on the respective neighborhood, our procedures would attain breakdown points arbitrarily close to 1, which is not in the spirit of Hampel's original definition; confer Hampel (1968).

In the sequel, we present some of the existing approaches (and distinguish them from ours), and review certain ideas which we exemplify with corresponding references.

Control Theory has found its own way to robustness, somewhat different from the notion used in statistics; instead of formulating deviations from distributional assumptions, this approach only allows for bounded controls (cf. H∞/H2) in order to cope with an incompletely specified transfer function. Survey articles are Başar and Bernhard (1991) and Rotea and Khargonekar (1995).

Other authors rather understand robustness as stability w.r.t. disturbances in the parameters, cf. Chen and Patton (1996). Judged from our perspective of robustness, this is awkward: for instance, only changing the parameters of a normal distribution will not lead us out of the class of linear filters; hence, w.r.t. the unboundedness of linear filters, the robustness problem persists. In general, parametric neighborhoods are simply too small to lead to robust procedures.

Early approaches considered hard rejection schemes, cf. Meyr and Spies (1984), which however, from the point of view of Theorem 3.3, are clearly suboptimal.

A large stream of articles replaces normality assumptions by corresponding fat-tailed distributions, notably t-distributions, cf. Meinhold and Singpurwalla (1989); this stream also ranges over Bayesian approaches such as West (1981, 1984, 1985) and covers posterior-mode approaches by Fahrmeir and Kaufmann (1991) and Fahrmeir and Künstler (1999). The replacement of the ideal/central distribution could be seen as somewhat heuristic, replacing only one distribution (the Gaussian one) by another one. Still, the resulting filters are highly robust, as they yield bounded (even redescending) filters. Theorem 3.3 indicates, however, that these distributions might lead to overly pessimistic procedures if the majority of the data is nearly normally distributed; the argument of course also applies if the majority stems from another non-t central distribution.

Another set of papers, starting with Alspach and Sorenson (1972), works with mixtures, notably of normal distributions, in this case giving the so-called Gaussian sum filters. Originally designed to cover non-Gaussian resp. nonlinear situations, this idea has also been applied to tackle robustness issues in Ershov (1978), Ershov and Lipster (1978), Kitagawa (1987), and Peña and Guttman (1988).

As one may easily show in the case of Gaussian mixtures, however, the resulting filters are not bounded, hence not robust in our sense.

The analogy of the state space model to linear regression models, as noted by Duncan and Horn (1972), has led to approaches where people apply robust regression techniques to the filtering problem; confer Boncelet and Dickinson (1983, 1987), Boncelet (1985), and Cipra and Romera (1991). The same approach led to the rIC and mIC filters initiated by H. Rieder and worked out in Ruckdeschel (2001, ch. 3, 4). Admittedly, the asymptotics under which the corresponding robust regression estimators are derived is not available in our context; nevertheless, these procedures compete well with other robustification approaches, compare Ruckdeschel (2001, ch. 5).

Although not bound to the structure of an SSM, the application of non-parametric median-type filters has a long success story, in particular for signal extraction, starting with the 3R smoother of Tukey (1977), a running median, and much improved upon by the Dortmund group, using several variants of repeated medians; confer Fried et al. (2006), Fried et al. (2007), and Schettlinger et al. (2006), in particular with applications in intensive care medicine, confer Fried et al. (2000). These filters, however, do not use the state space model character of the data and have certain weaknesses in higher dimensions, where corresponding medians are more difficult to define and even harder to implement if one wants to go beyond coordinate-wise application of the repeated medians; see Fried et al. (2002), though.

With ever faster computers and with the refined sampling techniques meanwhile available, the use of many filters running in parallel has become increasingly attractive. Some approaches in this setting do not use sampling but try to adaptively select the “optimal” filter in each time step t among a set of Nt filters considered at this time; confer, e.g., Pupeikis (1998). As to the operability of these filters, particular care must be spent on Nt; confer in this respect the filters proposed by Schick (1989) and Birmiwal and Shen (1993). Sampling techniques in our context are very promising, as they allow one to assess not only single aspects like the posterior mean or posterior mode of our filters but also the whole posterior distribution. Some of these techniques proceed non-recursively, using Markov Chain Monte Carlo or the Gibbs sampler as in Carlin et al. (1992) and in Carter and Kohn (1994, 1996), while the Particle Filter approach is recursive; in particular the Particle Filter, compare Frühwirth-Schnatter (1994), Godsill and Rayner (1998), Hürzeler and Künsch (1998), Hürzeler (1998), and Künsch (2005), seems promising to get hold of the exact ideal posterior mean needed in Theorem 3.3. [MORE COMMENTS]

Nearest to our approach are several articles concerned with minimax robustness in various specifications. We do not discuss parametric minimax approaches here; references may be found in Ruckdeschel (2001, Sec. 1.5).

In the frequency domain, there are papers by Kassam and Lim (1977), Franke and Poor (1984), and Franke (1985). One disadvantage of this approach is that one has to impose a uniform bound on the variance as a bound for the corresponding mass of the spectral measures in a neighborhood. According to the theory of Wiener and Kolmogorov, the optimal filters found in this context are bound to be linear, hence not robust in our sense.

In the time domain, the filter by Masreliez and Martin (1977), later termed the ACM filter in Martin (1979), appeals to a minimax robustness which uses the asymptotic variance and hence builds up on Huber (1964, 1981).¹

This is somewhat problematic, as the asymptotics in this non-stationary setting will never “kick in”. We will instead use the SO-approach already used by Birmiwal and Shen (1993) and Birmiwal and Papantoni-Kazakos (1994), who obtain results similar to ours, although in a more restricted setting, and who, when passing back from the “one-step solution” to the dynamic model setting, proceed differently.

1.2 Organization of the rest of the paper

In section 2 we present the general setting, introducing the necessary notation. Passing from the most simple, linear, time-discrete Euclidean state space model over to more general Hidden Markov Models and Dynamic Bayesian Models, we also introduce a continuous-time setup as it is relevant for Mathematical Finance, and finally even allow for user-specified controls. All these increasingly more complicated models, presented in subsection 2.1, are covered by the optimality results we present, as long as the mean squared error makes for a reasonable risk. In subsection 2.2, we then present different types of outlier models relevant for this setting and discuss their implications. After an introductory example in subsection 2.3 introducing our reference model, we finally review the classical Kalman filter with its optimality among all linear filters in subsection 2.4, as this (recursive) property will be the starting point for our robustification.

This robustification, the rLS filter, is introduced in section 3. After its definition in subsection 3.1, extending a corresponding result from Ruckdeschel (2001, ch. 8), we preliminarily drop all the dynamics of our model in subsection 3.2 and reduce it to a “Bayesian” type model. In this setting, we are able to show our central result, Theorem 3.3, which yields minimax-optimal solutions on SO neighborhoods in this quite general framework. Translating this result back into our dynamic model context is crucial and follows in subsection 3.3. In this setting, we disprove normality of our filter in Proposition 3.5 and characterize linearity of the corresponding ideal conditional mean in Proposition 3.7. With these results, optimality of our rLS filter seems out of reach. Extending the SO neighborhoods a little, however, as done in subsection 3.4, we nevertheless obtain a certain optimality for the rLS in Theorem 3.11 and Proposition 3.12. Finally, as to efficiency in computational aspects, we briefly mention stationarity properties of the rLS in subsection 3.5.

Sections 4 and 5 contain recent results extending the setup of Ruckdeschel (2001, ch. 8) to the IO situation and to situations where both IOs and AOs are present. The key idea is to specialize our “Bayesian” model from subsection 3.2 to the additive model Y = X + ε and to use the symmetry of X and ε present in this model: we achieve a translation of the optimality result of Theorem 3.3 to a situation with system-endogenous outliers where tracking is the main goal. Section 5 then presents a delayed hybrid filter which switches between AO- and IO-robust behavior according to the history, of window length w, of the discrepancies between predicted and realized observations, hence giving a filter that is simultaneously AO- and IO-robust.

Section 6 illustrates our findings with simulations in which we evaluate the classical Kalman filter and the rLS variants rLS.AO from section 3, rLS.IO from section 4, and rLS.IOAO from section 5, together with the competitors ACM from Masreliez and Martin (1977) and hybrPRMH from Fried and Schettlinger (2008), resp. Fried et al. (2006).

¹ The latter reference compiles some generalizations of the former, which were already available to Martin and Masreliez.

Section 7 sketches open ends and starting points for further research, and section 8 describes the state of affairs as to an implementation of our proposals in an R package. The proofs of the assertions made in sections 3–4 are compiled in section 9. Finally, in section 10 we summarize the findings of this article.

2 General setup

2.1 Ideal model

To fix ideas, let us start with some definitions and assumptions. We are working in the context of state space models (SSMs) as can be found in many textbooks; confer Anderson and Moore (1979), Harvey (1991), Hamilton (1993), and Durbin and Koopman (2001).

Time-discrete, linear Euclidean setup: The most prominent setting in this context is the linear, time-discrete, Euclidean setup, where the unobservable $p$-dimensional state $X_t$ evolves according to a possibly time-inhomogeneous VAR(1) model with innovations $v_t$ and transition matrices $F_t$,

\[ X_t = F_t X_{t-1} + v_t \qquad (2.1) \]

The statistician observes a $q$-dimensional linear transformation $Y_t$ of $X_t$, where we incur an additional observation error $\varepsilon_t$,

\[ Y_t = Z_t X_t + \varepsilon_t \qquad (2.2) \]

In the ideal model we work in a Gaussian context, that is, we assume

\[ v_t \stackrel{\text{indep.}}{\sim} N_p(0, Q_t), \qquad (2.3) \]
\[ \varepsilon_t \stackrel{\text{indep.}}{\sim} N_q(0, V_t), \qquad (2.4) \]
\[ X_0 \sim N_p(a_0, Q_0), \qquad (2.5) \]
\[ \{v_t\},\ \{\varepsilon_t\},\ X_0 \ \text{independent as processes.} \qquad (2.6) \]

As usual, the normality assumptions may be relaxed to working only with specified first and second moments, if we restrict ourselves to linear unbiased procedures as in the Gauss-Markov setting. For this paper, we assume the hyper-parameters $F_t$, $Z_t$, $Q_t$, $V_t$, $a_0$ to be known.
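To make the ideal model (2.1)–(2.6) concrete, here is a minimal R sketch (R being the language of the package implementation mentioned in section 8, though the snippet below is purely illustrative and independent of that package). Assuming time-constant hyper-parameters and a two-dimensional local-linear-trend state, it simulates one path of states and observations and reconstructs the states with the classical Kalman filter recalled in subsection 2.4; all variable names are ours and not taken from any package.

set.seed(1)

## dimensions and horizon
p <- 2; q <- 1; TT <- 100

## time-constant hyper-parameters (illustrative local-linear-trend model)
Fmat <- matrix(c(1, 0, 1, 1), p, p)   # transition matrix F
Zmat <- matrix(c(1, 0), q, p)         # observation matrix Z
Q    <- diag(c(0.1, 0.01))            # innovation covariance Q
V    <- matrix(1, q, q)               # observation error covariance V
a0   <- c(0, 0); Q0 <- diag(p)        # X_0 ~ N_p(a0, Q0), cf. (2.5)

## helper: one draw from N(mu, Sigma)
rmvn <- function(mu, Sigma) drop(mu + t(chol(Sigma)) %*% rnorm(length(mu)))

## simulate states (2.1) and observations (2.2) in the ideal model
X <- matrix(NA_real_, p, TT); Y <- matrix(NA_real_, q, TT)
x <- rmvn(a0, Q0)
for (t in 1:TT) {
  x      <- drop(Fmat %*% x) + rmvn(rep(0, p), Q)   # X_t = F X_{t-1} + v_t
  X[, t] <- x
  Y[, t] <- drop(Zmat %*% x) + rmvn(rep(0, q), V)   # Y_t = Z X_t + eps_t
}

## classical Kalman filter (prediction / correction), cf. subsection 2.4
xf <- a0; Sf <- Q0
Xf <- matrix(NA_real_, p, TT)
for (t in 1:TT) {
  xp <- drop(Fmat %*% xf)                                       # state prediction
  Sp <- Fmat %*% Sf %*% t(Fmat) + Q                             # prediction error covariance
  K  <- Sp %*% t(Zmat) %*% solve(Zmat %*% Sp %*% t(Zmat) + V)   # Kalman gain
  xf <- xp + drop(K %*% (Y[, t] - Zmat %*% xp))                 # correction by the observation
  Sf <- Sp - K %*% Zmat %*% Sp
  Xf[, t] <- xf
}

mean((X - Xf)^2)   # empirical MSE of the state reconstruction

Note that the correction step is linear in the observation Y_t, so a single gross error in Y_t can shift the reconstruction by an arbitrary amount; this is the unboundedness of linear filters discussed in the introduction and the starting point for the robustification in section 3.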