...this year's Mary Jean Harrold Distinguished Lecturer in the School of Computer Science. It's really an honor for me to be telling you a little bit about what this lecture is, and Vivek will introduce Mary Lou Soffa, but we have just the perfect person giving this lecture. Many of you know that Mary Jean was a tremendous colleague. She passed away almost six years ago, but she put so much into, and I think really still contributes to, the atmosphere in the school. She was a consummate academic of the highest caliber — by one measure she was the third most prolific software engineering researcher — but she also contributed with her heart and soul to diversity, to service, to every aspect of the academic community, and I really think that wholehearted spirit she embodied lives on. I was looking at just a few things to tell you about her: she was an ACM and IEEE Fellow; she won many awards for mentoring, including the Presidential Award for Excellence in STEM Mentoring, which she accepted on behalf of CRA-W; she was on the boards of the CRA and CRA-W, and at one point co-chair of CRA-W, the women's arm. But beyond all this she really changed the culture more broadly at Georgia Tech, and certainly in the School of Computer Science. She was an ADVANCE Professor, and ADVANCE in the School of Computer Science founded this lecture to memorialize her spirit and to bring in somebody who really is a giant in service, research, and diversity. I can't think of anyone better than Mary Lou Soffa, who was Mary Jean's advisor — they even look alike — and of the honors that I just mentioned, Mary Lou holds many as well. So I'm going to turn this over to Vivek to give you more details. Thank you.

Well, it's a real privilege for us to have Mary Lou visit us today. Dana already gave half the bio, in a way. I started my career in programming languages, compilers, and software engineering, and every time I tackled a new problem I would run into Mary Lou's work, and I really admired her taste in algorithms and problem solving. Let's see, many other things to add: of course she's a distinguished faculty member at the University of Virginia, she's been department head there, and she was at the University of Pittsburgh for a number of years. But there's this discussion we have when raising children about nature versus nurture; you may have the same discussion about PhD students, about the department and the advisor, and you can see in Mary Lou's case just how much impact the advisor can have. She has had, I believe, 32 PhD students, about half
of whom are women, and all of them successful. I'll just try to remember a few — I can't name them all — but Rajiv Gupta was one I remember meeting very early in my career, with all the work on the program dependence graph; Ras Bodik, who is a leader in programming languages; more recently, in the systems area, Jason Mars; and Evelyn Duesterwald, who was a colleague of mine at IBM; and I'm sure there are many, many more. So her impact has been tremendous, and rather than take more time away from the talk, please join me in welcoming Mary Lou Soffa.

So I'm very proud, and really feel privileged, to be able to present this Mary Jean Harrold Distinguished Lecture. As Dana said, Mary Jean was my PhD student, and she was also my dearest friend for a number of years. Now, I know you've had other speakers for this Mary Jean Harrold Distinguished Lecture, and they've been esteemed researchers and esteemed administrators, but their research areas were not Mary Jean's area, and so more than likely they were not able to talk about her contributions. So today I want to talk about the evolution of software testing and, along the way, about Mary Jean's contributions.

The first thing I'm going to do is present what I conceive of as four fundamental concepts in software testing that have developed over the years. I call them fundamental concepts because they were developed to solve the problems, the challenges, that we had in the past in software engineering, and I believe they give us a nice framework for trying to solve the problems we have today in software testing. I'm going to do this through a history lens, and along this history I will talk about Mary Jean's contributions to these fundamental concepts. At the end I will also talk about how these concepts can be used on the challenges we have today in testing — machine learning, autonomous cars, cloud computing, all kinds of challenges in software testing — and I'll get to that at the end.

You more than likely already know what software testing is, but to remind you: it's an investigation — an investigation into a software product — and the goal is to give the stakeholders some information about the quality of the application, of the software: is it robust, does it meet the requirements. The software industry uses testing as its primary way of ensuring that software behaves as we would like it to behave and has the quality we want. But software testing is expensive, and in fact recent studies have shown that up to 80 percent of the cost of developing a piece of software
comes from testing.

All right, so we've been developing software for sixty years — it doesn't seem that long to some of us, but it has been sixty years — and along the way there have been changes in our software, and all of these changes have created disruptive shifts in software testing. New programming languages came about, and we had to test those; software development environments came, and we had to be able to test in those, and in different applications. So the challenges to testing all along these sixty years have been about ensuring that the quality of the software is what we expect and that the cost is reasonable in terms of the software development being done.

Now, there was a time in these sixty years when we didn't have software testing, and this was way back at the beginning of the 60s, when programming languages were just being developed and the first commercial high-level language, Fortran, was developed by IBM. During the 60s and the 70s, testing was not done or even envisioned for software; people were only concerned with debugging the software. There wasn't a separate phase for testing as we know it, and so we were concerned with finding bugs through print statements, through break statements and debugging tools, and through memory dumps. Does anybody remember looking at memory dumps to find bugs? Yes? Okay, that's what it was.

Then around the mid-seventies we started talking about the things that can go wrong, and in the air was the idea that there should be a separate verification and testing activity. In 1979 Glenford Myers presented the first work on separating debugging from testing, and he did that in a book called The Art of Software Testing.

All right, testing continued to evolve and became a critical component of software development in the 80s and 90s, as it is today, and over 35 years of research in software testing we've come up with new components, with different types of testing, and with techniques for testing. This slide presents just some of the work and some of the topics in testing, and if you put it all into a funnel and see what comes out, I believe — this is my conjecture — you find four fundamental concepts. These concepts are coverage criteria; regression testing; input generation together with test case prioritization and minimization; and test oracles. So for the rest of the talk I'm going to present these concepts, which I call fundamental concepts: tell you what they are, explain Mary Jean's contributions to them, and then talk
about, as I said, how we can use these fundamental concepts in the applications that we have today.

All right, so let's talk about the first fundamental, which is test coverage. What is test coverage? It's a metric that we use in software testing, and it provides some kind of measure of the amount of testing performed by a set of test cases. There are lots of different types of coverage: it can be code coverage, it can be data coverage, procedural, synchronization, scenario coverage, and so on; the application basically determines what kind of coverage you use.

In the late 70s and early 80s — at that time we were using languages like C, C++, and Ada — the notion of coverage was based on paths and statements, on control flow. But programs were getting larger, there were more complex control structures and data structures, and the testers and the programmers were finding it hard to do comprehensive path testing. They also recognized that it's not just the control that we want to test but the data — the flow of the data: how does a value flow from its definition to a use. And so, in the early eighties, what emerged was data flow testing. It was a family of criteria — all-definitions, all-uses, definition-use pairs, and so on — and the technique was: given a control flow graph and a criterion, find the definitions and the uses and generate the test requirements from them. This is the first time data flow testing was introduced, and it was done at the unit level.

It actually came out of compilers, and my research areas, besides software engineering, are programming languages, so this is dear to my heart, and I want to tell you a story about it. In the 1950s and early 60s a team at IBM developed Fortran, and they knew they had to develop a compiler to translate this high-level language into machine language; at that point in time people were programming in assembly language. The compiler team went to work and developed the compiler, the users tried it, and they decided it would not work — there were not going to be any high-level languages — because the compiled code was much worse than what they could write in assembly language. So supposedly there was a meeting at IBM, and somebody said: programmers won't use it, we only have one choice, which is to say that high-level languages
are not going to be the future and we might as well forget about it; the only people who are going to program are these experts who can write assembly language. And then somebody else said: why don't we try to improve the generated code? And with that they developed optimizations, they applied optimizations in the compiler, and lo and behold the code that was generated was essentially equivalent to what you would code by hand. This was the first time control flow and data flow analysis was developed. So data flow came from programming languages and compiler people in the 60s, and in the 80s it was accepted into software engineering.

Okay, but more challenges kept coming, because this was done at the unit level, and the procedural languages and the applications we were writing were getting more complex: they had more procedures, functions were being used, there were large programs with lots of modules, and the data flow testing that we had was only for the unit. People came to recognize that integration testing is necessary — that we need to test the interactions across procedure calls and procedure returns — and the challenge was that the technique had to be context sensitive: you have to make sure you return to the point of the call, and you have to track names across procedure calls, from actuals to formals and from formals back to actuals. And at that point we had languages, unlike Fortran, that had pointers, so we had to deal with pointers as well.

At that point Mary Jean and her colleagues developed interprocedural data flow testing, which extended data flow testing across procedure calls, so we could then do integration testing using data flow. We looked at the variables at call and return sites, the actual parameters, and the formal parameters. I don't have a lot of time to go through the details, but here's an example program where a procedure takes two parameters and passes them on to another procedure, and you can see that all of these names have to change and have to be mapped back. So Mary Jean and her colleagues developed the interprocedural flow graph, and with this graph we identified the locations of definitions and uses across procedures, and it was used to guide the selection of tests. This again is the example we had, with the names associated, and through these graphs the new data flow was propagated, which meant we could then determine whether a value defined in one procedure was used in another procedure, and so on.
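To make the definition-use idea behind these criteria concrete, here is a minimal sketch in Python; the function, the labels, and the tests are all invented for illustration, and they show the kind of requirements an all-uses style criterion would ask a tester to cover.

    # A tiny example of data flow (def-use) coverage requirements.
    # Variable `total` is DEFINED at d1 and d2 and USED later; an
    # "all-uses" criterion asks for tests exercising def-use pairs such
    # as (d1, u1) and (d2, u1), when a definition-clear path exists.

    def price_with_discount(amount, is_member):
        total = amount              # d1: definition of total
        if is_member:
            total = amount * 0.9    # d2: redefinition kills d1 on this path
        if total > 100:             # u1: predicate use of total
            total -= 5              # uses and redefines total
        return total                # c-use of total

    # Two tests that together exercise u1 from both definitions:
    assert price_with_discount(200, False) == 195   # covers pair (d1, u1)
    assert price_with_discount(200, True) == 175    # covers pair (d2, u1)

The same idea is what the interprocedural work generalizes: the definitions and uses live in different procedures, so the names must be mapped across actuals and formals before the pairs can be identified.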
Okay, so challenges kept coming, and in the nineties we had a new language shift, right? We had Java. Now, Java was not the first object-oriented language — what was the first? Simula, and then Smalltalk — but Java was essentially commercialized in the 90s. In an object-oriented program you have intra-method testing, inter-method testing — methods calling each other within a class — intra-class testing, and inter-class testing, and it was the inter-class interactions that were causing problems. The other levels could be tested normally, but for inter-class testing you have public methods that are called from outside the class, and they can be called in any order, so there is no single sequential trace you can walk through to see how the flow goes — you can't use control flow alone, you have to consider the possible calling sequences. So again Mary Jean and her colleagues were the first to develop comprehensive data flow testing for object-oriented languages: they considered the interactions of all possible sequences of public method calls and essentially developed a class control flow graph.

All right, but challenges kept coming, and here the challenge was programs growing. In the 90s programs were growing in size and complexity, and changes were constantly being made: because the applications were so large and were used for such a long time, there were bug fixes, there were attempts to add functionality, enhancements, configurations. And any time changes were made, all the test cases that had previously been used to test the application had to be rerun — they were retesting everything, running everything. They put these test cases into a test suite, and these retests were taking all day and sometimes weeks, and it was just too much. So the next fundamental concept came along to solve this problem, and it was regression testing. In regression testing we perform analysis on modified software, and we want to make sure that the modified software behaves correctly and does not adversely impact the original functionality. The changes could be either to the code or to the design, and what regression testing has to do is, first, identify the changed code, check how it impacts execution, and identify the test cases that test the changes and their impact; and second, if you've added new functionality, you may also need to add new test cases.
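Here is a minimal sketch of the selection idea just described, assuming we already have per-test coverage information from running the suite on the old version. Everything, including the names, is illustrative rather than the published algorithm; the little example mirrors the one on the next slide, where a change at statement S5 means only T2 and T3 need to rerun.

    # Select the regression tests whose old coverage touches changed code.
    # coverage: test name -> set of statements that test executed on P
    # changed:  statements modified when P became P'

    def select_regression_tests(coverage, changed):
        """Return tests that executed at least one changed statement."""
        return [t for t, covered in coverage.items() if covered & changed]

    coverage = {
        "T1": {"S1", "S2", "S4"},
        "T2": {"S1", "S3", "S5"},
        "T3": {"S2", "S5", "S6"},
    }
    changed = {"S5"}                                     # statement S5 was modified

    print(select_regression_tests(coverage, changed))    # ['T2', 'T3']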
So here's a very brief description of the problem: you have a program P, you change it into program P′, and you had a test suite T that you used to test P. You take that test suite and, instead of rerunning everything, you select from it just those tests that need to be rerun on P′. And here's an example: here's a control flow graph, the program changed at statement S5, and we can see that only test 2 and test 3 reach that statement, so the only tests we need to run on the changed program are T2 and T3. They did a study and showed that in fact the number of test cases you need to run after the code is changed is reduced considerably, depending on the program. The selectivity depends on where the changes were made, whether they are structural changes, what the coverage is, and how often you do regression testing. There was lots of research on this topic — people here did more research on it — and in fact it became the basis of the impact analysis people use today.

All right, another challenge. I talked about the test suite, the fact that test cases were put into the test suite and then rerun, and regression test selection helped eliminate some of the tests that had to be rerun. But I was at a conference one time and somebody told me: the real problem is the test suite itself, it's getting way too big. Even with regression testing you've done nothing about the size of the test suite, only about which tests get executed. And nobody at that point was willing to throw any test case away — you talked to people in companies and they would say, oh, we can't throw anything away. But what was happening with these test suites is that there were obsolete test cases, which tested something that no longer existed, and test cases that were redundant. So Mary Jean and her colleagues developed something called test suite minimization, and the problem was: given a set of requirements and a set of test cases, select the minimum number of test cases needed to meet all the requirements. This was the first paper to explore the challenge of test suite reduction. Now, if you think about this problem — given a set of requirements and a set of test cases, find the minimum — it really converts to the hitting set problem in complexity theory, so finding the minimum is NP-hard. So what we did was to come up with a heuristic, of course, to solve this problem.
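As a rough illustration of this style of heuristic, here is a minimal greedy sketch of test suite minimization. It is a reconstruction of the general idea — first keep the essential tests, then greedily pick the test satisfying the most remaining requirements — not the exact published algorithm, and all the names are made up.

    # Greedy test suite minimization.
    # satisfies: test name -> set of requirements that test satisfies

    def minimize(satisfies, requirements):
        remaining = set(requirements)
        selected = []

        # 1. Essential tests: a requirement satisfied by exactly one test
        #    forces that test into the reduced suite.
        for req in requirements:
            satisfying = [t for t in satisfies if req in satisfies[t]]
            if len(satisfying) == 1 and satisfying[0] not in selected:
                selected.append(satisfying[0])
        for t in selected:
            remaining -= satisfies[t]

        # 2. Greedy step: repeatedly pick the test that satisfies the most
        #    still-unsatisfied requirements.
        while remaining:
            best = max(satisfies, key=lambda t: len(satisfies[t] & remaining))
            if not satisfies[best] & remaining:
                break                      # nothing left can help
            selected.append(best)
            remaining -= satisfies[best]
        return selected

    satisfies = {
        "T1": {"R1"},
        "T2": {"R1", "R3"},
        "T3": {"R3", "R4"},
        "T4": {"R4"},
        "T5": {"R1", "R2"},       # the only test that satisfies R2
    }
    print(minimize(satisfies, {"R1", "R2", "R3", "R4"}))   # ['T5', 'T3']

A similar greedy ordering — pick the test with the most additional coverage first — is also one simple way to get the prioritization discussed a little later.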
I'm not going to go through all the details of the heuristic, but essentially it says: first take the test cases that are the only ones satisfying some requirement, and from then on keep selecting the test case that satisfies the maximum number of remaining requirements. Here's an example with these requirements and these test cases: you can see that T5 is the only test case that satisfies requirement 2, so we pick it, and it also satisfies requirement 1; then you look at the rest, see which requirements T1 covers, and keep iterating until everything is satisfied. We did experimental studies and found that if you do this during front-end development you can reduce the test suite by 18 to 60 percent, and after changes have been made you can reduce it even further.

All right, so what do we have now? We have a technique that says: make a change, and these are the test cases you should run. We have a technique that says: you can reduce the size of your test suite. And now we have prioritization. What this means is that when you do regression testing you're going to get a list of test cases that have to be run, and there may be a best order to run them in, so prioritization gives you that better order. For example, there has been work where companies say you only have six hours to run tests, so we prioritize by how many tests can complete in six hours; and there has been work on prioritizing by fault detection — I think Alex did work on fault detection, statement coverage, and so on.

All right, so one of the problems that remains today: I've been very glib — here's a criterion, a definition and a use, here's a node — but how do you generate a test case that actually goes through that definition and reaches that use, or reaches that node? It's very difficult, and there has been lots of work since 2000 on this; it's still a very popular area because we haven't really solved this problem very well. And again, Mary Jean and her colleagues, way back in 1999, used genetic algorithms to generate test cases.
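Since the talk only gestures at how search-based generation works, here is a minimal, hypothetical sketch in the spirit of genetic test-data generation: it evolves integer inputs toward covering one hard-to-reach branch using a simple branch-distance fitness. It is not the 1999 algorithm itself, just the general idea, and all names and parameters are invented.

    import random

    # Function under test with a branch that random inputs rarely reach.
    def under_test(x, y):
        if x * 2 == y + 100:          # target branch we want a test to cover
            return "target"
        return "other"

    # Branch distance: 0 when the target condition holds, larger otherwise.
    def fitness(ind):
        x, y = ind
        return abs(x * 2 - (y + 100))

    def mutate(ind):
        x, y = ind
        return (x + random.randint(-10, 10), y + random.randint(-10, 10))

    def crossover(a, b):
        return (a[0], b[1])

    def generate_test(pop_size=50, generations=200):
        population = [(random.randint(-1000, 1000), random.randint(-1000, 1000))
                      for _ in range(pop_size)]
        for _ in range(generations):
            population.sort(key=fitness)
            if fitness(population[0]) == 0:
                return population[0]              # covering input found
            parents = population[:pop_size // 2]  # keep the fitter half
            children = [mutate(crossover(random.choice(parents),
                                         random.choice(parents)))
                        for _ in range(pop_size - len(parents))]
            population = parents + children
        return population[0]                      # best effort

    x, y = generate_test()
    print((x, y), under_test(x, y))               # usually reaches "target"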
All right, so now we move to another challenge, and this challenge is: how do we know the results of our test cases are right? Typically, if we have a formal specification, we can use it, but how do you generate oracles in general? Most of the oracles we've built are ad hoc — somebody figures out the answer by hand — and sometimes we use domain knowledge about the application that just says we expect this kind of answer. Now, of all the fundamentals, oracles were the least researched, but today they're becoming more important and more work is being done, and the reason is: when a machine learning algorithm spits out an answer, we don't know whether it's right, and how do you know, when a car is driving itself, that everything is going well? I'll talk more about that shortly.

So this is the fourth fundamental, test oracles. This is the one fundamental that Mary Jean did not work on, and that's not surprising, because almost none of us worked on test oracles at that point. An oracle is a partial function from a sequence of stimuli — the input — to an observable response — the output — and typically an oracle has data plus some kind of procedure for computing the right answer. But again, there was not much research, so there's not much to say until just recently, and it's changing today because of the applications.

All right, so that's kind of the history: what challenges we've had and how these fundamental concepts were developed to meet those challenges. Now let's go to today's testing challenges. I said at the beginning that software testing is disrupted by new programming languages — and have we had any new programming languages recently? No. It used to be that a new major language was developed every decade: in the 60s we had Fortran, in the 70s we had C, in the 80s we had C++, in the 90s we had Java, and in the 2000s Python — which is an old language; I taught a course on web languages, and Python was started back around 1990, so we're using it, but it's not new. There have been some scripting languages developed, but they're not really major. Okay, so we don't have to worry right now about programming languages, though you can argue with me about that.

What about applications? Well, we have cloud computing, we have machine learning, we have autonomous computing, and we have machine learning components inside larger systems: how do you test end to end? I'm going to talk about the first three. The last one — machine learning components — is the issue that you have a number of components in a piece of software, like a mapping application, and one of them at the end is a machine learning model; how do you test end to end when one of the stages is a learned model? That's a big question right now, but it's not one I'm going to address.

All right, so what I want to talk about is how we design and implement testing for these new applications, and again my conjecture is that we should use the fundamentals we've developed over the years to guide us in how to test these systems.
Now, when you put an application in the cloud, you want to know how long it's going to take to run and whether it's going to be responsive, and its behavior is going to be determined by the traffic conditions, by the load balancing, by how many applications are running on a virtual machine. There's stress testing, meaning a lot of requests banging on an application — how do we know we can handle it? In testing cloud applications, the concerns have mostly been performance and security. For performance, I want to know how my program performs in the cloud. The challenges are multi-tenancy — we don't know what applications are running alongside ours on a virtual machine; there could be none, there could be twenty — and scheduling: the cloud services do the scheduling and they don't tell us how, so we don't know. And then there's the question of what coverage even means here.

I'm going to talk about some work we've recently done at Virginia on performance regression testing. Think about it: if you make a change in the cloud environment, how do you retest your application? If you make a change in the hardware, or the service provider makes a change, how do you test it? There really hasn't been much work done on regression testing for the cloud except for load — in other words, changing the size of the VMs or the number of VMs. Another issue is that we don't have an accepted benchmark that everybody uses to figure out whether your answers are right.

All right, so we started a research project at Virginia, and the setting is: as a user, I want to port my application into the cloud, but I don't know how much it's going to cost. I don't know how many VMs I need — the more VMs I use, the higher the cost, and the larger the VMs, the higher the cost — I don't know any of this. And the cloud does have elasticity, which means the system can automatically change the number of VMs or the size of the VMs, but that affects how much you're going to pay. So what we wanted to do was to predict for the user, given an application, how many VMs they need in order to get the response time they require.

Now the first problem is: what's the coverage? We don't have code coverage here. The coverage you could think about is the days of the week, or the time of day, or the location of the server. If you're running an application and you want to make sure you've tested it thoroughly, you have to make sure you've covered the different days of the week and times of day.
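To show what "coverage" can mean when there is no code to cover, here is a tiny, hypothetical sketch that treats day-of-week and time-of-day slots as the coverage requirements for a cloud performance test; the slot granularity and names are arbitrary choices for illustration.

    from itertools import product

    # Coverage requirements: every (day, time-slot) combination should be
    # exercised at least once before we trust the performance measurements.
    DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
    SLOTS = ["night", "morning", "afternoon", "evening"]
    requirements = set(product(DAYS, SLOTS))

    def coverage(runs):
        """runs: list of (day, slot) pairs in which the application was measured."""
        covered = set(runs) & requirements
        return len(covered) / len(requirements), requirements - covered

    runs = [("Mon", "morning"), ("Mon", "evening"), ("Sat", "night")]
    ratio, missing = coverage(runs)
    print(f"{ratio:.0%} covered, {len(missing)} slots still untested")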
We could also consider the location of the server, but let's just consider time of day and day of the week. So the challenge, in order to solve that first problem, was: when do we stop running the application, so that we can determine how it behaved, how many VMs it needed, and what it cost? There was no prior work on this. A difficulty in getting accurate performance measurements is that if you run an application at one point you get one distribution of response times, and if you run it again you get another distribution, and they can be very different — so which one do you use? What we wanted to do was determine when the application stabilizes: when can we stop running it and be confident that we can now examine it? And again, we have to run it long enough that we're sure we're seeing the effects of multi-tenancy and different kinds of scheduling, but we also have to keep the cost down — I can test a cloud application by just running it over and over, but that gets expensive. So we developed a tool that determines the stopping condition using statistics and stops the test once the results are statistically stable. These results are based on the day of the week, but we've also done it based on the time of day and on holidays; we want high accuracy and, of course, low testing cost.

Here's how it works: we put the application on the cloud, execute it for one week and generate the distribution, execute it for another week and combine the two distributions, then update the distribution by combining week three, and we check whether D1 and D2 are stable. This uses a statistical technique called bootstrapping, and we apply these statistical tests to find out whether we have stability; you can see here that at this point we can stop, because both distributions are stable. So this work shows that we know how to define stopping conditions; the accuracy is about 95 percent, and we ran it against ground truth to make sure.
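Here is a rough sketch of the kind of bootstrap-based stopping check described above: it resamples two windows of response-time measurements and stops when their confidence intervals for the mean agree. The thresholds, window handling, and data are invented for illustration, not the tool's actual parameters.

    import random
    import statistics

    def bootstrap_ci(sample, reps=1000, alpha=0.05):
        """Percentile bootstrap confidence interval for the mean."""
        means = sorted(statistics.mean(random.choices(sample, k=len(sample)))
                       for _ in range(reps))
        lo = means[int(reps * alpha / 2)]
        hi = means[int(reps * (1 - alpha / 2)) - 1]
        return lo, hi

    def is_stable(week1, week2, tolerance=0.05):
        """Stop testing when the two windows' CIs agree within `tolerance`."""
        lo1, hi1 = bootstrap_ci(week1)
        lo2, hi2 = bootstrap_ci(week2)
        center = statistics.mean(week1 + week2)
        return (abs(lo1 - lo2) <= tolerance * center and
                abs(hi1 - hi2) <= tolerance * center)

    # Hypothetical response times (ms) from two consecutive measurement windows.
    week1 = [random.gauss(120, 10) for _ in range(200)]
    week2 = [random.gauss(121, 10) for _ in range(200)]
    print("stable enough to stop?", is_stable(week1, week2))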
Now let me move on to machine learning and autonomous computing — I may not have time for both, but I want to talk about machine learning. There's a lot of interest now in testing machine learning because there's so much of it. So how do you test machine learning? People have come up with papers, but nothing has really been accepted; mostly people just depend on the model's accuracy.

So let's look at machine learning in terms of the fundamentals: coverage, regression testing, test selection, and oracles. Now, this is a different kind of software from what we've been talking about: everything I've covered throughout this history has been code based, and machine learning is not code based, it's data based — the code is just matrix multiplication, multiplying arrays. So how do you test it?

For coverage, what we're asking is: is there a systematic way to test machine learning using some kind of coverage? The problem with coverage for machine learning is that sometimes the techniques are too simple and sometimes they are computationally infeasible. People have developed testing criteria — I think just in the past year there was one called DeepTest — where the idea was that when a neuron fires above a threshold, that node counts as covered, so they wanted to cover all the nodes in the layers. That turned out to be too simple: people could get everything covered without very many test cases. So we're still working on this problem, and people have thought about using sequences of nodes instead of single nodes, and there's a recent paper about borrowing a thing from software testing, MC/DC, which tests combinations of decisions. Again, there's more work to do — so how do we do coverage?

Another problem in machine learning is regression testing. We know that in machine learning we train with data, and the training generates a model which we then use for inference. Now the problem is that more data keeps coming in, so what do you do? You can't keep retraining, but your model may not be up to date with the data that's coming in. So how do we know when to stop and say: you need to retrain your model? We don't know. People have talked about it, and this is what I've seen in papers: you save the new data, you test the accuracy of what you're getting, and if it's not good, you retrain — and that's about as much as we know, so we have to work on that. You watch the accuracy of your model degrade, and you start training new models; we have PhD students working on this.
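Here is a minimal sketch of the "watch the accuracy and retrain" idea just described, with a made-up threshold and window size; it assumes we eventually get true labels for the new data so accuracy can actually be measured.

    from collections import deque

    class RetrainMonitor:
        """Track accuracy on recently labeled data; flag when it degrades."""

        def __init__(self, baseline_accuracy, window=500, drop=0.05):
            self.baseline = baseline_accuracy    # accuracy at training time
            self.recent = deque(maxlen=window)   # 1 = correct, 0 = wrong
            self.drop = drop

        def observe(self, predicted, actual):
            self.recent.append(1 if predicted == actual else 0)

        def should_retrain(self):
            if len(self.recent) < self.recent.maxlen:
                return False                     # not enough evidence yet
            current = sum(self.recent) / len(self.recent)
            return current < self.baseline - self.drop

    # Usage: feed each new (prediction, ground truth) pair to the monitor and
    # retrain the model when should_retrain() becomes True.
    monitor = RetrainMonitor(baseline_accuracy=0.95)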
Related to that is oracles. If you run the model on data similar to the data it was trained on, how do you know the answers are okay? We don't have an oracle that tells us whether the output is right, and we've done experiments showing that this is a real problem. Are the answers right? The accuracy may be 98 percent on one run and much lower on another — is that okay? We don't know, and that's part of the problem with using machine learning in critical systems: we don't know for sure what the answer is.

Okay, there has been some work on metamorphic oracles, where they try to show that if an input is similar to the training data, the output should be similar. There's also an interesting paper that just appeared called "surprise adequacy," and what they do is look at your answers, and if you get a surprise — and it's a really good paper, I'm not making fun of it, we're building on that paper — if you're surprised relative to the training data, you know something may not be right and you have to do something else. I think it's good work; I just don't like the word "surprise," because it doesn't give you any technical information.

All right, so let me say a little bit about testing autonomous vehicles. Again we have coverage: if you have an autonomous vehicle and you're running it, what's the coverage? How many scenarios do you need, and who determines them? We just don't know. Regression testing: you have your test runs, and then you change the car's software or sensors — what do you do? Test cases: how do you get scenarios, and how do you judge the driving in a scenario? That last part is what we've looked at. The idea is that you can simulate an autonomous vehicle and get a log of its performance, and then we look at that log in terms of traffic laws, safety recommendations, and passenger comfort, and see if we can give the log a score based on those things. We can't get any oracles from companies — they're not going to give us theirs — but we looked at some: one from Udacity, one from CARLA, and some other places. What we've done in this work is develop a programming language in which we can encode all of these rules, and then create oracles. If you look at that picture, that's what we ran, and what it shows, simply, is that the oracles rate the cars all over the place: some oracles rated a run as good, some rated it not so good. So we really need to keep working on this — it's just the beginning for us — and one of the questions we have is about conflicts: among all of these rules, are there any conflicts? We hope to be able to answer that.
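As a flavor of what a rule-based oracle over a driving log might look like, here is a tiny hypothetical sketch: each rule inspects one log entry, and the score is simply the fraction of entries that violate no rule. The rules, thresholds, and log fields are invented, not the ones from the actual study.

    # Each log entry is a dict of measurements taken at one time step.
    log = [
        {"speed_kmh": 48, "speed_limit": 50, "decel_ms2": 1.0, "gap_m": 30},
        {"speed_kmh": 63, "speed_limit": 50, "decel_ms2": 2.0, "gap_m": 18},
        {"speed_kmh": 40, "speed_limit": 50, "decel_ms2": 5.5, "gap_m": 6},
    ]

    RULES = {
        "traffic law: obey speed limit":
            lambda e: e["speed_kmh"] <= e["speed_limit"],
        "comfort: braking under 3 m/s^2":
            lambda e: e["decel_ms2"] <= 3.0,
        "safety: keep at least 10 m gap":
            lambda e: e["gap_m"] >= 10,
    }

    def score(log):
        violations = []
        for i, entry in enumerate(log):
            for name, ok in RULES.items():
                if not ok(entry):
                    violations.append((i, name))
        clean = len(log) - len({i for i, _ in violations})
        return clean / len(log), violations

    s, violations = score(log)
    print(f"score {s:.2f}")          # 0.33: only the first entry is clean
    for step, rule in violations:
        print(f"step {step}: violated {rule}")

Different rule sets naturally give different scores for the same log, which is exactly the disagreement between oracles that the study observed.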
All right, now here's an example of using the fundamentals I've talked about. It's been in the news, and you can judge it for yourselves: we have a piece of software, and we want to ask how the company handled these phases. What application are we going to look at? The Boeing 737 MAX.

We know that the 737 MAX is part of the whole series of Boeing 737s, which have been among the most reliable airplanes ever built. And Boeing essentially kept saying, for all of these planes: they have essentially the same hardware and software, so we don't have to re-certify the plane, and we don't have to test much, because we've already done it. The Boeing 737 MAX was developed to save fuel costs, and Boeing was in competition with Airbus, which was also developing a fuel-efficient plane at the time. So what could Boeing do? If you make your engines larger you get more fuel efficiency, so they made their engines larger, but the engine was so large it couldn't fit where it usually fits under the wing, because there wasn't enough room. So what did they do? They mounted it higher and positioned it more toward the front of the plane. By moving that engine they shifted the center of gravity and increased the tendency for the nose to pitch up — and if the nose keeps going up, the plane is going to stall. To prevent it from stalling, they installed a new software package, and this package was MCAS.

Now let's think about how they tested this, using the fundamentals. Coverage: what features did they test? They may have tested the features, but there is reporting saying they didn't test the interactions of the features enough — they didn't test MCAS together with the engine and the other parts; they didn't test the interactions. Regression testing: what they assumed, which they've always done, is that the plane is the same, so they didn't have to worry too much about testing, because it's the same thing. And Boeing at that point started laying off senior engineers, because they said, we don't need you, our products are mature; they laid off a lot of engineers and outsourced the programming and the testing to workers earning around nine dollars an hour. And now, for the update, in order to do regression testing, they're recruiting pilots to test the software changes, and the FAA chief is going to test the software changes.
Now, is this what we would expect in testing a critical system? Again, test cases: they don't talk about test cases; they're recruiting pilots to generate these test cases and relying on the FAA chief to do it. Oracles: crashes are not good oracles, and that's the only oracle they have. And perhaps using inexperienced software developers and software testers is not a good idea — all of us would agree that testers want to make more than nine dollars an hour.

Okay, so in conclusion: testing for these new applications differs from anything we've seen before, which was code-based testing. Testing the next generation of software is a challenge, and a critical one, and as I've thought about it, these fundamental concepts in software testing can help guide us to the appropriate kinds of testing. That's how I think about the direction.

[Audience question, asking why the talk did not touch on formal methods and verification.]

No, I didn't touch on it, because I don't do formal methods, and Mary Jean didn't do formal methods. Formal methods are good, and there has been a lot of progress in formal methods, symbolic evaluation, and that kind of thing, but they're still not powerful enough to handle our complex systems, so we cannot use verification for a lot of the complex software. As I said at the beginning, industry still relies on testing for verification and quality. We've been talking about verifying code for a long time, and we're working on it, but I don't think it's at the point where we can use it for real critical systems.

Any other questions or comments?

[Audience member:] I was visiting some high-tech companies, and at one whose app is on all our phones, they said that even if you had perfect testing that pointed out each and every bug as a true positive, not a false positive, they would not have the resources to fix all the bugs. So there's also the follow-on challenge — I don't know what your thoughts are on how to prioritize fixes and help reduce the cost of fixes.

Okay, so that's another area. We all know that you find an error in a piece of software and they say, send it to me, and you send them the error and they don't fix it because they don't have time. There are a lot of errors in all kinds of software, and there isn't time to fix them all. Certainly they have some prioritization and they fix the ones they think matter most, but I do think that's another area we haven't paid much attention to: how do you determine the priorities? Maybe Alex knows, but I haven't read very much on how we do that, so I think it's a very good question.
And I invite all of you to come out — we have a reception, we have refreshments. Thank you.