Video: Closing the Agentic Confidence Gap: Quality Governance for AI-Accelerated Teams | Duration: 3744s | Summary: Closing the Agentic Confidence Gap: Quality Governance for AI-Accelerated Teams | Chapters: Welcome and Introduction (6.4s), Webinar Overview (181.025s), UI Coverage Explained (482s), AI Risks and Benefits (603.285s), AI in Cypress Testing (880.56s), UI Coverage Evolution (1124.425s), Coverage Policy Workflow (1508.75s), Cloud MCP Integration (1870.215s), Data Integration Strategy (2241.52s), Human-in-the-Loop Oversight (2387.18s), MCP Efficiency Impact (2556.855s), Key Takeaways Recap (2695.4s), Q&A Session Opens (2843.755s), Q&A Transition (3008.045s), Fueling Agents Content (3018.19s), LLM Task Limitations (3201.02s), cy.prompt Deep Dive (3262.895s), Future Roadmap & Wrap-Up (3428.51s), Closing and Next Steps (3644.64s)
Transcript for "Closing the Agentic Confidence Gap: Quality Governance for AI-Accelerated Teams":
Alright, folks, it's time to kick off this webinar. Hello and welcome. Thank you all for taking the time to join us today for "Closing the Agentic Confidence Gap: Quality Governance for AI-Accelerated Teams." AI is changing how fast your team ships. Agentic workflows are pushing more code through your pipeline than ever before, and as that pace accelerates, it becomes harder to keep an eye on shifts in UI coverage. Tests pass, builds ship, and coverage shifts in ways that aren't always immediately visible. Unfortunately, the gaps this creates tend to surface when you can least afford it. Today, we're going to show you how to get ahead of that: how to maintain visibility into what your tests are actually exercising as your codebase quickly evolves, so those UI coverage shifts don't catch you off guard. Before we get started, a couple of quick housekeeping notes to keep this session running smoothly. The Q&A panel is open, and you can drop your questions in there as they come up. We'll do our best to address them throughout the session, so no need to hold them until the end. Speaking of questions, we've reserved some time at the end of the session for a live Q&A. We'll cover both pre-submitted questions not addressed during the presentation and any that come up as we go. If your question doesn't get answered during the session, keep an eye out for our follow-up blog post, where we'll address all of the major question themes. If you're looking for resources on any of the topics mentioned today, check out the Docs tab. There you'll find resources that let you dive deeper into the subject matter we'll be talking about. This session is being recorded, and you'll receive the recording via email this afternoon in case you want to revisit any content or have to step away.
We'll also be notifying the winners of those free UI coverage reports via email tomorrow, so that's another thing to keep an eye out for. Finally, you'll notice a button at the top of your screen that says "Start a free trial." It will be available throughout the session. If any of the workflows Mark or Emily walk through would be valuable for your team, use it to book a time with us and get started right away on integrating UI coverage into your workflows. Alright. With that, I'm excited to bring on today's presenters, Mark Noonan and Emily Wisniewski. Mark and Emily work directly with teams navigating the challenge of maintaining quality as development velocity increases. They've seen firsthand how coverage visibility breaks down as agentic workflows accelerate, and they've spent a lot of time helping teams build the frameworks to get ahead of that. I'm really looking forward to what they have to share today, so let's get into it. Please join me in welcoming Mark and Emily to the stage. Hey, everybody. Thanks so much, Jenna, for the introduction. I'm going to go ahead and get us started with some screen sharing. This webinar will cover a lot of ground. Emily and I are both product managers here at Cypress. We have slightly different areas, but both are adjacent to AI and to the way people are changing how they develop, especially how they set up quality gates. My intention is that we have some slides to support the conversation and maybe a little demo content, but also quite a bit of back and forth as we get into it. We'll have three main sections.
The first covers the challenges and opportunities related to QA in AI development cycles, and how this has affected Cypress testing, particularly in the past year, as people adopt all kinds of different patterns while still wanting to keep control over quality and accessibility, which are the areas I work in most. Then we'll talk about where Cypress fits in: the governance piece that helps you keep your tests under control, and the new Cloud MCP integrations Emily has been working on, which give you test run information and, soon, coverage information, plus accessibility details from your runs. At the end, we'll take some of the questions that were submitted in advance and some that get asked live. To get started, I want to back up for a second rather than assume everybody here is already a Cypress user or a Cypress Cloud user and knows all the pieces. There are really three sections to the system we'll talk about, and they flow through the UI coverage part of Cypress. I'll show you those three sections in order so everybody has a similar starting point. First, when you write Cypress tests, you use the Cypress app to run them locally, and you can do that with or without AI assistance, either your own or tools like cy.prompt that Emily is going to talk about later. That's the open source part of Cypress, and it's how a lot of you are doing your work today.
When you record your tests to Cypress Cloud, you hand over some of the execution responsibility so Cypress can organize your tests efficiently: parallelizing them and scheduling them in the order that makes the most sense based on what previously failed, things like that. If you're using cy.prompt, this is also the process that adds the self-healing part of Cypress. If you want test steps that can repair themselves, for example when locators change, recording to Cypress Cloud is how you bring that into CI. Recording produces a set of functional test results, which on the surface just tell you whether everything passed or failed. But those results actually drive a whole other set of things in the middle, which is the core of the Cypress Cloud experience. Test Replay is probably the star of the show. Whenever I ask someone what their favorite piece is or what they use most, it's the debugging: time-traveling through your test execution and looking at the network logs, the console, and everything else. Test Replay replaces what used to be a video- or screenshot-based process, with far less ability to inspect your page, by letting you look at the details of every command exactly the way you would locally in Cypress. Then there's Analytics and Branch Review for comparing runs and looking at trends over time. There are integrations with GitHub, Slack, and similar tools. And at the bottom is what's really new: the Cloud MCP, which supports your agentic workflows and the feedback loops around them. All of this is core Cloud functionality. The products I'm responsible for as a PM are the application quality products; that's our umbrella term for them. They build on the Test Replay data, and we've talked about them in other webinars.
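Mark only says that Cypress Cloud schedules specs "based on what previously failed"; the talk doesn't describe the actual algorithm. As a rough illustration, here is a minimal TypeScript sketch assuming a simple heuristic: previously failed specs first, then longest specs first, each assigned to whichever parallel machine currently has the least estimated work. `SpecHistory`, `prioritizeSpecs`, and `assignToMachines` are hypothetical names, not Cypress APIs, and the spec durations are made up.

```typescript
// Hypothetical per-spec history; not a Cypress Cloud data structure.
interface SpecHistory {
  spec: string;
  failedLastRun: boolean;
  avgDurationMs: number;
}

// Assumed heuristic: previously failed specs first, then longest first.
function prioritizeSpecs(specs: SpecHistory[]): SpecHistory[] {
  return [...specs].sort((a, b) => {
    if (a.failedLastRun !== b.failedLastRun) {
      return a.failedLastRun ? -1 : 1;
    }
    return b.avgDurationMs - a.avgDurationMs;
  });
}

// Greedy balancing: each spec, in priority order, goes to the machine
// with the smallest total estimated duration so far.
function assignToMachines(specs: SpecHistory[], machines: number): string[][] {
  const buckets: string[][] = Array.from({ length: machines }, () => []);
  const load: number[] = new Array(machines).fill(0);
  for (const s of prioritizeSpecs(specs)) {
    let target = 0;
    for (let i = 1; i < machines; i++) {
      if (load[i] < load[target]) target = i;
    }
    buckets[target].push(s.spec);
    load[target] += s.avgDurationMs;
  }
  return buckets;
}

const history: SpecHistory[] = [
  { spec: "login.cy.ts", failedLastRun: true, avgDurationMs: 100 },
  { spec: "checkout.cy.ts", failedLastRun: false, avgDurationMs: 500 },
  { spec: "todo.cy.ts", failedLastRun: true, avgDurationMs: 300 },
];

console.log(prioritizeSpecs(history).map((s) => s.spec));
console.log(assignToMachines(history, 2));
```

Running previously failed specs first surfaces likely regressions early in the run; longest-first greedy assignment is a common load-balancing heuristic, chosen here only for illustration.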
We're not going to rehash all of those implementations, but they build on the Test Replay data, which captures every step of every test and all of the DOM states of your application. That lets you set up something like a quality gate around accessibility, or debug an issue based on which page it's on, information that isn't really part of your tests but is derived from them. There's also API access to this data, so you can build downstream automation workflows as well. All of this feeds back into the core set of analytics. If your Cloud account has UI coverage or Cypress accessibility, you can see trends for them, compare them across runs, see them in your GitHub comments, and access them through the Cypress Cloud MCP, so you can start to integrate them in these other ways. That's what I wanted to clarify at the start about where UI coverage fits in, since we'll come back to it later. As for what UI coverage actually is, the heat map experience is at its core. We'll look at it in more detail later, but it's worth stopping now to set it up as a subject. When you have a set of tests recorded to Cypress Cloud and we talk about coverage, which gets to one of the early questions that came in, we're asking: do your tests use your application in a certain way? Here, there are three different tests. They each interact with different parts of this page, and that produces a coverage score: based on the events that happened in the tests, we can tell you what your application looked like and how it was used. That's pretty relevant if you have a lot of fast-moving changes in both your application and your tests, because looking at UI coverage this way gives you a good sense of how the interface is adapting to what's happening.
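The talk describes the coverage score only loosely: it's derived from which parts of the page the tests interacted with. A minimal sketch, assuming the score is simply the fraction of interactive elements touched by at least one test (Cypress's real calculation may treat elements differently), mirroring the three-tests-one-page example. `TestInteractions`, `computeUiCoverage`, and the selectors are hypothetical.

```typescript
// Hypothetical shape: the elements each recorded test interacted with.
interface TestInteractions {
  test: string;
  interacted: string[]; // element identifiers, e.g. CSS selectors
}

interface CoverageResult {
  score: number; // share of interactive elements used by at least one test (0 to 1)
  untested: string[];
}

// Assumed scoring rule: an element counts as covered if any test interacted with it.
function computeUiCoverage(elements: string[], tests: TestInteractions[]): CoverageResult {
  const touched = new Set<string>();
  for (const t of tests) {
    for (const el of t.interacted) touched.add(el);
  }
  const untested = elements.filter((el) => !touched.has(el));
  const score = elements.length === 0 ? 0 : (elements.length - untested.length) / elements.length;
  return { score, untested };
}

const pageElements = ["#new-todo", ".toggle", ".destroy", "a[href='#/active']"];
const result = computeUiCoverage(pageElements, [
  { test: "adds a todo", interacted: ["#new-todo"] },
  { test: "completes a todo", interacted: ["#new-todo", ".toggle"] },
  { test: "deletes a todo", interacted: [".destroy"] },
]);
console.log(result.score, result.untested);
```

Nothing here inspects which lines of application code executed, which is what separates this kind of measurement from code coverage.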
It's worth bringing up because this is very different from code coverage, which looks at executed lines of code; that was one of the things that came up in the pre-submitted questions. Code coverage is about how many logic paths exist and are executed as a side effect of your tests, which is very much internal to the application's logic. UI coverage is external, on the shell of the application: how is it being driven? At this point, I'd like to bring in Emily to chat through some of the AI topics. Emily is the PM for a lot of the agentic workflow work at Cypress. While I talk to people who are interested in coverage and accessibility and in controlling those things, Emily speaks to a lot of people who are more on the edge of what new things we can do and how we can push forward with AI. So, Emily, why don't you come along and tell us a bit about what you and the team have been working on? Yeah. Hey, Mark. Thanks for the introduction, and thanks for inviting me to this webinar. I'm super excited to be here and to talk about AI. You and I have talked about it in other meetings, and I could probably talk about it the entire time and not give you any room to speak, so I'll do my best to keep it short. As Mark said, I'm a product manager here at Cypress, working on fitting Cypress into agentic workflows seamlessly and closing the gaps we have today, so that you can have a very hands-off experience and fit into this new ecosystem that's developing. So, excited to be here, and thanks for the introduction. Okay. Let's move forward into the conversation about these agentic workflows. I've got a couple of topics I wanted to ask you about, and maybe we can start with risks and benefits.
I know we've all seen a lot of high-profile outages and reliability issues this year, some of which are connected to moving really fast with AI-driven workflows and struggling to maintain quality. I'm curious what you're seeing on the risks and benefits side. Yeah, this is a really great question, Mark. If you want to move to the next slide. I think there's no dispute that AI is hugely beneficial. You have knowledge at your fingertips that you never expected. Teams are moving quicker. We're able to prototype and figure out details that might have taken weeks or months before. Velocity is through the roof. I'm personally seeing a lot of blurred lines when it comes to roles, because AI gives you access to information so you can do things you maybe weren't as familiar with or expert in. And then there's the velocity. I wouldn't say teams are doing more with less; they're doing more with the same amount, with more information and context built in and layered in. It's really cool to see, and I think there's a lot of improvement. I'd say at this point your imagination is the only limit on what you can build with it. But as you mentioned, as great and as powerful as AI is, there are a lot of risks, and we're starting to see and feel them in the industry. We know AI isn't always correct. It's going to have problems. It can be biased. It might latch on to a particular word or sentence you called out because you thought it was important and start taking you down a path that leads nowhere. I think the biggest risk is that it can be very, very confident in the responses it gives you.
And if you're not as familiar with the area, and this is where the blurred roles come in, it's difficult to navigate that. What we're seeing is huge volumes of code being generated, which puts a lot of pressure on reviews to make sure it's right. It also puts a lot of pressure on your tests, your quality gates, and the confidence you need in place to lean on AI so that you can move faster and put more time into those things. It's a balancing act, and it all comes back to how we build confidence and trust. I wouldn't say this is a new problem in development and testing; it's always been there. But AI is making it worse, and it's becoming a bottleneck: if you can build things quicker, verification is what slows down getting something out the door. Historically, there are three options you might see come up when you're talking about new feature development. To do everything right, you implement and then spend your time testing. The second option is that it would take so long to get the feature buttoned up to your quality standards that it's just not viable, so it never gets picked up, which is a detriment to the product. The third is: let's move forward and build it, and maybe you don't keep your quality standards as high as they theoretically should be. You're not writing the tests, you're not doing full coverage, or you don't have the time to spend on it. Then bugs leak into production, which is also not a good experience.
So, I don't know, it's a really cool time to see where this is heading. Yeah, I think we're seeing the same thing from the perspectives of different customers as well. I put this question down here; it's one that was submitted ahead of the webinar: "Everyone is talking about AI, but no one is explaining in simple terms how we can use AI for quality assurance." I think that's a good observation from this person, and it also reflects a frustration with vague instructions like "use AI and it will be better" or "use AI and people will ship faster." We've seen a lot of patterns in how this is happening with Cypress testing, so let's move on to what the landscape looks like for Cypress tests and AI. Hopefully, throughout the webinar, we can give this person at least some explanation of how we think AI plays into the testing landscape. Yeah, this is a good question. What I'm observing right now internally at Cypress is where people are writing their tests, and there are a few different layers. We see people writing tests using Studio: they interact with their application, it records those tests, and it creates AI assertions alongside them if they're using Studio AI. We have people writing traditional tests: they go in manually, understand what their application is doing, write the corresponding test code, and run it. Some people are moving a step further. You can click next here. They're moving a step further, and you'll see people writing natural language alongside their traditional test code, which gives them a bit of the best of both worlds.
They keep control of what they're testing, but in areas where they're unsure how to test something, or where it's a bit unstable because of the selectors, they pull in cy.prompt and have that inline. And then we see a lot of teams trying to reach toward having AI do all of it. They want to generate their tests from requirements, or they know the flows that are important to them, and they hand it over to an AI tool like Cursor or Claude. They're using tools like AI skills. Cypress has AI skills that can help with this, where you can generate those tests, run them, and have that full experience. So we're seeing quite a variety in where tests are written and how. Cypress has always been an environment where you can pick and choose what works best for you. But what does this mean for QA and testing? As code is generated, whether test code or application code, it can be really difficult to understand where the gaps and pain points are. So it gets back to governance, and I think this is where UI coverage really has an opportunity to shine, because it gives you a visual representation of where your gaps are within your browser application. Code coverage is really strong at what it does, and, honestly, AI is really good at understanding changes in your files and mapping them out across your file system. But as everyone who develops for the web knows, every browser has nuances.
When you start composing your components and putting all the pieces together, you get responsive design behavior, or a request comes in, the data loads, and you see something different from what was expected. That's really hard to map out through static code analysis, and I think this is where we still see the industry struggling with how to do it right. Browser testing isn't easy; that's why Cypress exists, to help you do it right. So what UI coverage adds to help close those gaps is something I find really interesting. Yeah, that's what I'll talk about next. I've definitely been surprised on occasion when doing some work, or seeing work generated by AI, that has a little twist: an important, meaningful twist, something extra that didn't need to be there, or a change that wasn't requested but took place anyway. Keeping up with and tracking all of those things run to run, in an active way where you try to manually review every test run or every build of your application, would be fairly heavy. UI coverage, as we talked about earlier, has that heat map aspect, and I know some people here have already seen a lot of what UI coverage does. It's intended to be passive while you write your tests, and then you go in and look at it when you need to. It supports the visibility and monitoring that help when things are changing really quickly, for any reason, AI or not. If the pace of development and of pull requests being merged is faster than your ability to absorb those changes and see what's happening, the bottleneck moves from the development phase into the verification phase.
With UI coverage, a few things have evolved over the year. Rather than do a full tour, which we can do in separate videos and dedicated calls, let's briefly talk about what's new in the last year, and then I'll drill into one specific situation. First, test generation was announced and shipped a little over a year ago: within the UI coverage heat map interface, you can generate a test for something that's missing coverage, and we'll look at that soon. Another important addition this year was tag-based configuration. We work with a lot of teams looking at coverage or accessibility scores where there are 40 or 50 teams in one Cypress Cloud project, or multiple projects with different things going on but still lots of teams per project. We set up this config profile setting because, for these types of metrics, your opinion about what belongs in the report is a really important part of making it make sense. Say I'm a developer on a team and I work on a particular flow that cuts across a complicated UI, maybe in a monorepo where different areas of the DOM are owned by multiple teams. The signal I want from something I built is: did I have an impact on the area I'm responsible for? It's definitely also good to know if you had unintended side effects in somebody else's area. But a lot of the time you're interested in a certain subset of the data, and in a particular perspective on what the elements mean and how things are supposed to relate. One person's 89% coverage of a certain workflow or test isn't going to be meaningful across the board. So UI coverage comes with lots of configuration to let you adjust the signal-to-noise ratio.
You can ignore elements, and some of the new changes also let you include additional commands beyond the traditional Cypress ones like click. You can also turn on tracking of assertions in coverage, and bring elements in and out of scope, group them, and name them in different ways. I bring all this up because the first thing people often think about coverage is, "I would never want 100% coverage, and I have a lot of elements." That does come up in our early conversations as people ramp up, and it becomes a case of wanting a perspective on your application that makes sense for your current testing goals and your understanding of what's in and what's out, which might be quite complicated for you. So the ability to be tag-based and flexible is really useful. Then there's the new policy work I'll show, running on our staging server; I think this will be the first time it's shown, and it's going to be available in the next couple of days. Emily will also speak a bit about Cloud MCP and how it relates to UI coverage. So let's take a situation UI coverage helps with and dive in. These were two of the questions about coverage submitted ahead of time, and they frame why I chose to show this slice rather than everything in a full demo. The first: could you answer whether your last sprint increased or decreased coverage of your most critical user flows? The second: when an agent ships a change tonight that removes coverage from your most critical workflow, will you know before your users do? This gets to a really common situation: your coverage is probably complicated. If you're looking at something like this, you might already have hundreds or thousands of tests.
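Mark describes configuration that lets you ignore elements and group or name repeated ones so the score reflects what your team cares about. The sketch below models the effect on a score, assuming a made-up config shape with regex-based `ignore` and `groups` fields. Cypress's actual UI coverage configuration uses its own format and option names, so `CoverageConfig`, `scopeKey`, and `scopedScore` are hypothetical and only illustrate the idea.

```typescript
// Hypothetical config shape; not Cypress's real configuration format.
interface CoverageConfig {
  ignore: RegExp[]; // elements matching any pattern are dropped from the report
  groups: { name: string; pattern: RegExp }[]; // matching elements collapse into one named entry
}

// Map a raw element identifier to its reported key, or null if ignored.
function scopeKey(el: string, config: CoverageConfig): string | null {
  if (config.ignore.some((re) => re.test(el))) return null;
  const group = config.groups.find((g) => g.pattern.test(el));
  return group ? group.name : el;
}

// Score = covered scoped entries / all scoped entries.
function scopedScore(elements: string[], interacted: string[], config: CoverageConfig): number {
  const all = new Set<string>();
  for (const el of elements) {
    const key = scopeKey(el, config);
    if (key !== null) all.add(key);
  }
  const hit = new Set<string>();
  for (const el of interacted) {
    const key = scopeKey(el, config);
    if (key !== null && all.has(key)) hit.add(key);
  }
  return all.size === 0 ? 0 : hit.size / all.size;
}

const elements = [
  "li:nth-child(1) .toggle",
  "li:nth-child(2) .toggle",
  "li:nth-child(3) .toggle",
  "#new-todo",
  "footer .info a",
];
const interacted = ["li:nth-child(1) .toggle"];

const raw = scopedScore(elements, interacted, { ignore: [], groups: [] });
const scoped = scopedScore(elements, interacted, {
  ignore: [/^footer /],
  groups: [{ name: "todo checkbox", pattern: /\.toggle$/ }],
});
console.log(raw, scoped); // 0.2 0.5
```

Grouping the three repeated checkboxes means clicking any one of them counts, and ignoring the footer link removes noise, so the same test goes from 20% to 50%. Whether that grouping is appropriate depends on your testing goals, which is the point about having a perspective that fits your team.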
A tool that just says, "Here is your coverage; behold all of your elements and pages and thousands of things," has a use case for planning and big-picture understanding, but that kind of information isn't as useful as what's changing from day to day. So we'll look at an example that addresses this, where we have two runs and some sense of where we've gone out of tolerance. The classic example is Branch Review. Here's a branch where we want to merge something from a feature branch into develop. All the runs are passing, but the UI coverage score has gone down. This is a case where we can ask why the score changed and dive right into the details. There are a few possible reasons it changed, and the way to find out is to look at the branch comparison between these two. I'm going to switch over here and show this example a little differently. Here I'm on a run in Cypress Cloud, and we've got a UI coverage report. It has all the usual stuff: test results and everything. The UI coverage score here is 28%. What's new is that we have a way to tell whether 28% is reasonable for this project. Even just from the overview, I can see this new little shield that represents my UI coverage policy. So from the Cloud itself, we can start to get a sense of what has gone wrong. My minimum score here is 29%; the rule for this example project is that, no matter what, the score is supposed to be at least 29%. If I click here, we can see more detail about what exactly is going on, and then we can go over to the UI coverage report to see the content of this run. That's interesting, right? We know this run is out of tolerance, and we know why: it has 28% where it should have at least 29%.
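The policy in the demo is a simple minimum score: 29% for this project, which the 28% run fails. A minimal sketch of that check, plus the view-specific minimums mentioned later in the talk, assuming scores are fractions between 0 and 1. `CoveragePolicy`, `evaluatePolicy`, and the `/checkout` view are hypothetical; this is not how Cypress Cloud implements or stores policies.

```typescript
// Hypothetical policy shape; scores are fractions between 0 and 1.
interface CoveragePolicy {
  minScore: number; // project-wide minimum, e.g. 0.29 for 29%
  viewMinimums?: Record<string, number>; // optional per-view (page) minimums
}

interface RunCoverage {
  score: number;
  viewScores: Record<string, number>;
}

interface PolicyResult {
  passed: boolean;
  violations: string[];
}

function pct(x: number): string {
  return `${Math.round(x * 100)}%`;
}

function evaluatePolicy(run: RunCoverage, policy: CoveragePolicy): PolicyResult {
  const violations: string[] = [];
  if (run.score < policy.minScore) {
    violations.push(`score ${pct(run.score)} is below the minimum of ${pct(policy.minScore)}`);
  }
  for (const [view, min] of Object.entries(policy.viewMinimums ?? {})) {
    const actual: number | undefined = run.viewScores[view];
    if (actual === undefined) {
      violations.push(`view ${view} was not visited (minimum ${pct(min)})`);
    } else if (actual < min) {
      violations.push(`view ${view} at ${pct(actual)} is below ${pct(min)}`);
    }
  }
  return { passed: violations.length === 0, violations };
}

const demo = evaluatePolicy({ score: 0.28, viewScores: {} }, { minScore: 0.29 });
console.log(demo.passed, demo.violations); // false [ 'score 28% is below the minimum of 29%' ]
```

Returning a list of violations rather than a bare boolean keeps the "why" next to the "out of tolerance" signal, which is what the shield's detail view surfaces.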
But we don't have a good reference point for what exactly changed. On my runs page, I have the other runs. The little shield isn't shown here yet, but it will be. We can see that the other runs have different coverage scores, and these ones are passing the 29% threshold we're blocking on. From here, I can say, let me just compare to the previous run. Now run 29471 compared with 29470 shows really only one change: there's a new untested element, and because of how the numbers work out, that caused a 1% or 2% drop in the score. It has a custom name, and we can see it's the checkbox for to-do items, which is already a really good hint about what's responsible. We can dive into it, which we will in a second. But we can also see that another change between these two runs is an increase in pending tests, which means some tests were explicitly skipped in the run. So even at this level, we can see there's been a change, which exact element changed, and a likely explanation. It's not guaranteed that the skipped tests are the cause, but we can take a peek, see that this didn't run and this did, and dive all the way in to find the test that's no longer there. So that's really interesting. Now let's click into the element itself, especially because, without configuration, you might not have a nice, readable name like this; you might have a CSS selector or something. It turns out what's failing is that we're missing the interaction that used to be present on this checkbox. This represents the scenario the person asked about: we left in a .skip or deleted a test, so our pipeline is passing, but instead of adding a test to the suite, we modified something.
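The run comparison boils down to a diff of two coverage snapshots. A minimal sketch, assuming each run exposes lists of tested and untested elements plus a pending-test count; `RunSnapshot`, `compareRuns`, and those fields are hypothetical, not the shape of Cypress Cloud's data. The run IDs echo the demo, but the element lists and pending counts are invented for illustration.

```typescript
// Hypothetical snapshot of one run's UI coverage.
interface RunSnapshot {
  id: string;
  tested: string[];
  untested: string[];
  pendingTests: number; // tests explicitly skipped, e.g. with .skip
}

interface RunDiff {
  newlyUntested: string[]; // untested in head but not in base: lost coverage or new elements
  newlyTested: string[];
  pendingDelta: number;
}

function compareRuns(base: RunSnapshot, head: RunSnapshot): RunDiff {
  const baseUntested = new Set(base.untested);
  const baseTested = new Set(base.tested);
  return {
    newlyUntested: head.untested.filter((el) => !baseUntested.has(el)),
    newlyTested: head.tested.filter((el) => !baseTested.has(el)),
    pendingDelta: head.pendingTests - base.pendingTests,
  };
}

const base: RunSnapshot = {
  id: "29470",
  tested: ["#new-todo", "todo checkbox"],
  untested: ["clear completed"],
  pendingTests: 0,
};
const head: RunSnapshot = {
  id: "29471",
  tested: ["#new-todo"],
  untested: ["clear completed", "todo checkbox"],
  pendingTests: 2,
};
console.log(compareRuns(base, head));
```

A rise in `pendingDelta` alongside a newly untested element is the same hint read from the UI in the demo: skipped tests are a likely, but not guaranteed, explanation.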
The agent might have done all kinds of different things. Here, we can see what's missing. In this case, the likely explanation matches the tests that were skipped and what they were called. But we can also make a new test for this if we want to. I can choose a spec that renders this particular type of checkbox. And in Cypress Cloud, from the same place that's telling you to cover this because it's outside your policy, we can generate a test that can go into a spec. You can see it didn't add a visit or anything; it fast-forwarded, because there's already a beforeEach in the spec that does a visit, so it knows it doesn't need to repeat it. Then we can grab this, pull it down locally, maybe hand it to the LLM and say, "Here's a test for something you didn't test yet. Explain yourself. Figure out which user flow this belongs to," or whatever else you need to do with it. It's basically a fast-forward that puts your Cypress app at the moment in time where this interaction would happen. I think it handled this okay: it picked one of the three checkboxes, since they were a group of repeated elements, grabbed the first one, and clicked it. That's probably a pretty decent starting point for a working test. So that's the flow we can follow all the way in here. And in case it wasn't clear, this isn't an image or anything; it's the DOM, which you can inspect, debug, and print things from. It's the snapshot from the Test Replay where this was rendered. So it all drills right back to the interaction if it existed, or, if it didn't, to the rest of the page. That workflow is the main thing I wanted to show here, as a single slice of how you might use UI coverage to investigate unexpected changes.
That doesn't mean every change in coverage will trigger your policy. You can also set more complex policies. If I go back a couple of runs, I think we'll find one with a bit more detail. There's an accessibility policy as well, and sometimes there's a view-specific policy, where a single page has a score requirement. A few more flexible options will be documented soon. If you're watching this and thinking you want to try it out, now is a good time to hit the button to set up a trial with us. I'm very interested in feedback on this and on workflows you want to build on it. It will flow into all the other areas, like GitHub comments, and eventually the MCP, the Results API, and everything else, so you can react programmatically to being out of tolerance. That's the main thing I wanted to show about how UI coverage works. Emily, we can flip over and talk a bit about Cloud MCP. I've really just shown the manual review and debugging workflow, and you can tell us what people are doing with agents and the MCP. Yeah, Mark, thanks so much. So, Cloud MCP: what is it? Actually, let me step back and ask what exactly MCP is. MCP is a mechanism for your agents to connect to an external source, pull data in, and get additional context so they know what to do with it. An analogy I saw on YouTube at one point that resonates really well, especially for less technical users, is that it's like a USB plug for your agent that connects it directly to the Cloud.
We recently launched Cloud MCP, which gives your agent access to the information you're reporting to the run: whether your tests passed or failed, flaky test information, and other details. Mark just mentioned our accessibility webinar, and we recently released accessibility MCP tools. Coming in the next week or so are UI coverage MCP tools. Everything Mark just demoed was about policies and how UI coverage can help put quality gates in place. Cloud MCP gives you the opportunity to take that a step further. You have a run and you have your UI coverage report. Now what? What do you do with it? You can explore it and do everything Mark just demoed, but we're in a world where we want to move faster and be a bit more hands off. If you enable the MCP, you can pull that report data down and interact with it yourself through the agent, or stay hands off and let the agent act on it. Mark, if you want to hit play on this demo: I prerecorded it because live demos never go well, and I didn't want you to be at the mercy of watching me flip through all of this. Here I'm showing a prerecorded run, and now I'm moving over to my agent. I'm saying, "Hey, go pull my latest UI coverage report for this run." Your agent sees that the Cloud MCP tool set is available, finds the latest run for this project, and then reaches for the UI coverage results. We offer the ability to pull the scores, the views, the elements, untested elements, and untested links, all at your fingertips.
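MCP messages are JSON-RPC 2.0 requests, and a client invokes a server's tool with the standard `tools/call` method after discovering the available tools through `tools/list`. The sketch below builds the sequence of requests an agent might send for "pull my latest UI coverage report." The tool names (`list_runs`, `get_ui_coverage`) and their argument keys are hypothetical placeholders, not the actual Cypress Cloud MCP tool set; check the tool list your server advertises.

```typescript
// Minimal JSON-RPC 2.0 request shape, as used by MCP.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1;

function request(method: string, params?: Record<string, unknown>): JsonRpcRequest {
  const msg: JsonRpcRequest = { jsonrpc: "2.0", id: nextId++, method };
  if (params !== undefined) {
    msg.params = params;
  }
  return msg;
}

// "tools/call" is MCP's standard method for invoking a tool by name.
function callTool(name: string, args: Record<string, unknown>): JsonRpcRequest {
  return request("tools/call", { name, arguments: args });
}

// HYPOTHETICAL tool names and argument keys, for illustration only.
function latestCoverageRequests(projectId: string, runNumber: number): JsonRpcRequest[] {
  return [
    request("tools/list"), // discover which tools the server exposes
    callTool("list_runs", { projectId, limit: 1 }), // find the latest run
    callTool("get_ui_coverage", {
      projectId,
      runNumber,
      include: ["scores", "views", "untestedElements", "untestedLinks"],
    }),
  ];
}

console.log(JSON.stringify(latestCoverageRequests("my-project", 10), null, 2));
```

In a real exchange the run number would come from the response to the run-listing call rather than being passed in up front; it is hard-coded here to keep the sketch short.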
Where this really shines in an agentic context is that you can see it pulled the same 10%, from the same run Mark just showed. It's a little fast, but you can see it on the rewatch if you're interested. Here I'm going to ask, "I have an untested element. Let's go add coverage for it." This is a very contrived example because I curated it to get a predictable result. But when you have agents taking your requirements and making feature updates, the agent will say, "I see these things happened. I saw this test ran, and maybe there are gaps." This is where branch review and those policies come in: you can say, "Okay, I see there's a problem. Let's pull it back and add these test cases." The agent will then generate the tests, and you'll be able to run them. In this example, it simply visited my to-do app. It's in the Cypress example project, so it's very simple. Everyone knows what a to-do app is: you add an item, you remove it. In this context, it visited the app and said, "I never actually added an item, so let's generate a test that adds a new item." This is all being run by Cursor, using, I think, Sonnet as my model here. Then I re-recorded the run to get the full experience. In this view, I'm moving into the second run: I went from run nine to run 10, and it's running. You can see that another test was added; before I had one, and now I have two. When I move back over to UI coverage, it's processing and loading, and now it's at 20%. But I'm not super confident, because I have a lot going on. I'm switching between things and looking around.
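The run-to-run check in the demo (run nine at 10%, run 10 at 20%) boils down to comparing two snapshots. This sketch assumes a hypothetical snapshot shape with unique element identifiers; it is not the UI coverage data model, and the real UI coverage score may be calculated differently from this simple tested-over-total ratio.

```typescript
// Hypothetical snapshot shape; NOT the UI coverage data model.
interface CoverageSnapshot {
  runNumber: number;
  testedElements: string[]; // unique identifiers of elements the tests exercised
  totalElements: number; // all interactive elements detected in the app
}

interface CoverageDelta {
  before: number;
  after: number;
  pointChange: number;
  newlyTested: string[];
  noLongerTested: string[];
}

// Simple tested-over-total percentage; multiply first so 2 of 10 is exactly 20.
function score(s: CoverageSnapshot): number {
  return s.totalElements === 0 ? 0 : (s.testedElements.length * 100) / s.totalElements;
}

function compareRuns(prev: CoverageSnapshot, curr: CoverageSnapshot): CoverageDelta {
  const before = score(prev);
  const after = score(curr);
  return {
    before,
    after,
    pointChange: after - before,
    newlyTested: curr.testedElements.filter((e) => prev.testedElements.indexOf(e) === -1),
    noLongerTested: prev.testedElements.filter((e) => curr.testedElements.indexOf(e) === -1),
  };
}

// Mirrors the demo: a to-do app where the second run adds an "add item" test.
const run9: CoverageSnapshot = { runNumber: 9, testedElements: ["todo-list"], totalElements: 10 };
const run10: CoverageSnapshot = {
  runNumber: 10,
  testedElements: ["todo-list", "new-todo-input"],
  totalElements: 10,
};
const delta = compareRuns(run9, run10);
console.log(`${delta.before}% -> ${delta.after}% (${delta.pointChange} points)`, delta.newlyTested);
```

Listing `noLongerTested` alongside the headline change matters: an overall score can hold steady or even rise while specific elements quietly drop out of coverage.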
So I'm going to ask, "Did my coverage actually increase?" From my perspective, when I'm doing a lot of development and a run kicks off in CI and records to the cloud, I'm usually not sitting there watching the results. I come back when I know it's complete. So: did we actually increase coverage? It said yes, by 10 points, because I told it specifically what to test, and we can confirm that. I go to the branch review and ask, "What was actually solved?" and you can see we added the test coverage here. This is purposely a bit more hands-on than what we want to set users up for, for demo purposes, because there are a lot of moving parts. But it's the easiest way to explain and see how this connects to what's already at your fingertips and how you can pull it back down. When we talk with customers about UI coverage, one thing people want to do is connect data from other systems and bring it into the cloud, so they can correlate these tests with meaningful workflows their users actually perform. So, MCP paired with other tools: one example someone gave us was Google Analytics data. Their Google Analytics data helps them understand which pages are most critical based on web traffic from real users. You can pull those two datasets together with your agent to trace and ensure that you have adequate coverage in those places, using the policies Mark just talked about. Then pair that with your agent and say, "Look, these policies are in place, and we need adequate coverage here. Let's find all the missing gaps, get them covered, and then review it."
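Here is a rough sketch of the Google Analytics idea: join page-view counts with per-view coverage scores and list the busiest pages that fall below a threshold. Both input shapes, the `findCriticalGaps` function, and the thresholds are assumptions for illustration; in practice an agent would pull the analytics data through its own integration and the coverage data through the MCP.

```typescript
// Hypothetical input shapes for illustration.
interface PageTraffic {
  path: string;
  pageViews: number; // e.g. exported from an analytics tool
}

interface ViewScore {
  path: string;
  score: number; // UI coverage score for that view, 0-100
}

interface Gap {
  path: string;
  pageViews: number;
  score: number;
}

// High-traffic pages whose coverage is below the threshold, busiest first.
function findCriticalGaps(
  traffic: PageTraffic[],
  coverage: ViewScore[],
  minPageViews: number,
  threshold: number,
): Gap[] {
  const scoreByPath: Record<string, number> = {};
  for (const c of coverage) {
    scoreByPath[c.path] = c.score;
  }
  return traffic
    .filter((t) => t.pageViews >= minPageViews)
    .map((t) => ({
      path: t.path,
      pageViews: t.pageViews,
      // A page with no coverage entry was never visited by the tests: treat it as 0.
      score: Object.prototype.hasOwnProperty.call(scoreByPath, t.path) ? scoreByPath[t.path] : 0,
    }))
    .filter((g) => g.score < threshold)
    .sort((a, b) => b.pageViews - a.pageViews);
}

const gaps = findCriticalGaps(
  [
    { path: "/", pageViews: 12000 },
    { path: "/checkout", pageViews: 5000 },
    { path: "/cart", pageViews: 3000 },
    { path: "/about", pageViews: 50 },
  ],
  [
    { path: "/", score: 95 },
    { path: "/checkout", score: 40 },
  ],
  1000,
  80,
);
console.log(gaps);
```

Pages with real traffic but no coverage entry get a score of 0, on the assumption that a view your tests never visited is the largest gap of all.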
When you look at reports on testing from the last few years, I typically see trends of developers spending about 40 to 50% of their time on testing. And not just developers: QA teams spend a lot of time on the testing portion too. Even just identifying test cases and understanding what you need to do to have confidence in what you're building is a lot. Layer that with other data points from real metrics, layer it with what UI coverage is showing, and hand it off to LLMs that have context about your application. You can add in the requirements and product details that are really a lot for a person to mentally load, balance, and reason about well and quickly. Of course, we can do it, and we have been doing it, but it takes time and real thought. So when we can pass some of this cognitive load to an agent, and we're the ones reviewing, understanding, mapping, and cross-checking whether it's doing what we expect, it's really cool to see. Is there anything else, Mark, that you wanted me to talk about, or any other thoughts or comments?

It got me thinking about a couple of things. It wasn't long ago that we released the accessibility MCP, and people jumped on it really quickly. One customer we talked to used it to produce a nightly informational review of their accessibility issues, not necessarily to fully hand off the feedback cycle. With accessibility, fixing things with a hands-off LLM approach isn't that strong for every kind of issue anyway, so you need to be able to step in and review.
Maybe "human in the loop" is a good phrase for that idea: there's traceability and ownership back to a decision a person made about what was acceptable. Where you have at least that in your process, you can see the tests as really defining what correct behavior is. The whole idea of a spec file is that we have a reference point for what's correct. If the code changes but the behavior stays the same, we're doing a lot to prevent at least known wrong behavior and known types of bugs. When it comes to how the MCP is used and how all the pieces fit together, the app quality products themselves try to give that visibility: they put their hand up and say, "Okay, we're out of tolerance now. This is when you need to pay attention, because your input and decisions about what's acceptable are required." One of the scariest things for people is conditions silently changing around your application, or around your definition of correct behavior, without anybody deciding it, because we were so hands off and so out of the loop on the details. That's the risky point people want to avoid, and it's a case of balancing. Can I read every line of everything and still go really fast? Can I manually test everything and still keep up with the pressure on delivery speed? The answer is probably no. But you are still responsible for the actions your LLM takes. So having that through line and that observability is really, really useful.

Yeah, you're right, Mark, it is really useful. I really narrowed in on closing coverage gaps and how you can map those out with UI coverage. But Cloud MCP unlocks more.
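One way to encode the "hand up when out of tolerance" idea is a gate that lets automated changes through while coverage stays within an allowed drop, and otherwise demands a recorded decision from a named person. This is a minimal, hypothetical sketch (`gate`, `Approval`, and the tolerance rule are illustrative), not a feature of Cypress Cloud.

```typescript
// Hypothetical gate for illustration; not a Cypress Cloud feature.
interface Approval {
  approver: string; // the person accountable for accepting the change
  reason: string;
  timestamp: string; // ISO 8601
}

type GateDecision =
  | { status: "auto-approved" }
  | { status: "needs-human-review"; drop: number }
  | { status: "human-approved"; drop: number; approval: Approval };

function gate(
  previousScore: number,
  currentScore: number,
  maxAllowedDrop: number,
  approval?: Approval,
): GateDecision {
  const drop = previousScore - currentScore; // negative when coverage went up
  if (drop <= maxAllowedDrop) {
    return { status: "auto-approved" };
  }
  // Out of tolerance: never pass silently; require an accountable decision.
  if (approval === undefined || approval.approver.trim() === "" || approval.reason.trim() === "") {
    return { status: "needs-human-review", drop };
  }
  return { status: "human-approved", drop, approval };
}

console.log(gate(50, 48, 5)); // small drop: within tolerance
console.log(gate(50, 40, 5)); // larger drop: a person has to decide
```

Storing who accepted the drop and why is what provides traceability back to a human decision; nothing out of tolerance passes without one.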
It gives you the flexibility to use your data in the ways you need. You talked about nightly reporting; there's also triage, understanding context, and issue correlation. It opens up so many opportunities to confirm that this is doing what we need it to do and to have confidence there, or to spot that a particular portion is a problem. And even beyond the Cypress MCP: I use Claude all the time. If I have an idea for the engineers I'm working with, I want to capture it, and being able to use Cursor or Claude to say, "Go create this GitHub issue and assign it to this person," staying hands off and mostly in one tool for my work, is super powerful. It really helps with context switching and lets me focus on problems I maybe didn't have as much time for, because I was so focused on wiring up details. Those details are very important: that's how you track, how you follow the process, and how you make progress. But that's not the exciting part of the work. Those things are there to keep us in our bounds, so how can we lean on tools to handle them and really focus on what we want? I'm very excited to see where tools like the MCP lead us in the industry this year. I see adoption just skyrocketing. I said earlier that testing can take up to 40 to 50% of the iteration cycle for a new feature to go out, and I want to see what those numbers look like at the end of this year.
Hopefully they'll be a lot shorter. That would be the goal here. So I'm interested to see.

Yeah, absolutely. I know we're going to get to questions in a minute. Before we dive in, I want to briefly summarize what I think are the takeaways from this event. Big picture: understanding what's happening in your tests quickly and easily is more important than ever, because there's a higher volume of changes and probably a larger scale of changes. So there's more urgency for the pipeline to be as good a quality gate as it can be. We talked about how Cypress Cloud itself, UI coverage, and the MCP integrations give you a lot of visibility and governance capabilities around agentic workflows. We focused mostly on UI coverage, but the accessibility gate works pretty much the same way. Your risk of LLMs doing surprising and novel things on the front end might be about the same as your best back-end developer deciding to do some quick front-end tasks and producing a bunch of weird markup out of divs and spans. LLMs might actually have a better success rate on some accessibility basics, but you want consistent, built-in feedback: a mix of the deterministic facts about what was detected during the run and the probabilistic process of generating tests and code, so part of the process stays rooted in what actually happened when your tests executed. It's worth pointing out that for UI coverage and accessibility, there's no AI piece determining which element did what or how it was used. The output is fully deterministic and data-driven, giving the reality back to you or to the LLM. The last takeaway: if you press the button to get a trial, we can set up a meeting with you.
As I mentioned earlier, customization is really key for UI coverage. There is no generic solution that applies to everybody's particular project and goals. So we run this as a process: a dedicated demo with you, understanding what you need, and then setting you up with a starter configuration, so you actually get to see what it would look like in a practical situation for you. Jenna, it's a good time to hand back to you to see where we are with questions, and you can drive us forward.

Awesome. Thank you, Mark and Emily. The situations you described are ones I think a lot of folks in this session are navigating right now. We're going to open this up to questions, so it's not too late to drop yours into the Q&A panel. We'll get through as many as possible, but remember to keep an eye out for the follow-up blog post if we don't get to your question in the session. While folks are typing, let's start with some of the presubmitted questions. I'll share this one on screen: How do you keep up reliability and test maintenance in the world of agentic development?

I'm not sure which of us should start with this one. It might be a good Emily question, and I think we covered a lot of it throughout. Emily, is there anything to add on the reliability and maintenance side of testing?

One thing I think is really important: if you're finding a lot of reliability and test maintenance issues, take a look at why that might be and what the root cause is. I'm always happy to jump on a call with whoever submitted this and do a deeper dive on your particular use case. But a lot of times, when you get into reliability and maintenance, it's one of two problems.
Either you're using flaky selectors, and there you can lean on your agents: if an agent writes a test using a selector that's not within your team's standards, ask it to add a stable selector and reference that in your test. That helps right out of the box. Or your tests are drifting away from your code as it changes, and you're trying to keep visibility and ensure those workflows continue to have correct and adequate coverage. That's a somewhat different problem. What I'd recommend there is building your workflows so that when things merge and deploy to your staging environment, or whatever you're hitting, you get notified immediately, and then building that into a hands-off agentic workflow that pulls the changes and does the comparison. Hook into the Cloud MCP, record to the cloud, see what's changing, and build out that iterative cycle. I could talk about this a lot, but hopefully this adds to what we've already discussed.

Excellent. Mark, is there anything you want to add on that?

No, I think we're good there.

Awesome. Let's go on to our next question. I'd like to switch to fueling your agents with the right content. We've talked a little about this throughout the session, but maybe you have some additional tips or tricks, Emily and Mark. I'll share this one: How do we prevent fueling the agents with the wrong content? Let me pass it to Emily first, and then Mark.

This is a fun one.
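A team's selector standard can be checked mechanically before a person reviews an agent-written test. The heuristic below flags selectors that rely on element position or styling classes and accepts those built on dedicated test attributes such as `data-cy`, the style the Cypress best-practices guide recommends (used as `cy.get('[data-cy="new-todo"]')`). The `reviewSelector` name and the exact rules are assumptions; a regex check like this is rough and will miss cases a real CSS parser would catch.

```typescript
// Dedicated test attributes in the style Cypress's best-practices guide recommends.
// Replace with whatever your team's standard is.
const STABLE_ATTRIBUTES: string[] = ["data-cy", "data-test", "data-testid"];

interface SelectorReview {
  selector: string;
  stable: boolean;
  reason: string;
}

// Rough string heuristics, not a full CSS parser.
function reviewSelector(selector: string, stableAttributes: string[] = STABLE_ATTRIBUTES): SelectorReview {
  const usesStableAttr = stableAttributes.some(
    (attr) => selector.indexOf(`[${attr}=`) !== -1 || selector.indexOf(`[${attr}]`) !== -1,
  );
  if (usesStableAttr) {
    return { selector, stable: true, reason: "uses a dedicated test attribute" };
  }
  if (/:nth-(child|of-type)\(/.test(selector)) {
    return { selector, stable: false, reason: "depends on element position" };
  }
  if (/\.[A-Za-z_-][\w-]*/.test(selector)) {
    return { selector, stable: false, reason: "depends on CSS classes that may change with styling" };
  }
  return { selector, stable: false, reason: "no dedicated test attribute" };
}

for (const s of ['[data-cy="new-todo"]', "ul li:nth-child(3) button", ".btn.btn-primary", "button"]) {
  console.log(reviewSelector(s));
}
```

A check like this can run in CI or as a pre-review step, so the agent gets a concrete "use a stable selector" request instead of a vague one.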
Fueling your agents with the wrong content. Context is so incredibly important. There are times I almost purposely give the agent very little context to see what it will produce, to push the balance and find that level. And there are times I try to be very prescriptive, and I find it just clings to something and goes completely off the rails. So it takes a bit of balance and exploration. When I'm doing development, I find that having the agent analyze the product behavior and understand the intention and expected outcomes before it writes tests really helps it hone in on what you're trying to capture and validate. That's probably my first piece of advice on where to start. We also released the Cypress skills a couple of weeks ago, which help with creating stable tests and analyzing patterns that already exist. You don't want generic content produced by an agent; you want it to follow your team's standards. And maybe your team's standards aren't really what you want and you want to change them; you can guide the agent to do that too. It's a bit open ended, so maybe this wasn't the most direct answer, but that's where I would start.

Thanks, Emily. Mark, do you have anything to add?

Yeah, just a couple of things. On the app quality side, especially the signal-to-noise ratio of what's in your reports, and specifically if a report is going to be processed through an MCP: how can we customize the surface area we even report on, to make it clear when something is owned by a third party or belongs to the design system?
Those little clues that might be in the DOM, or that might be configurable, can help in the narrow case of UI coverage or accessibility reporting flowing into an LLM for processing. So scoping what you include, rather than asking the LLM to filter for you, is a really important part of setting expectations. A lot of times when I've seen disappointing experiences, it comes down to slightly misunderstanding which tasks an LLM is going to be good at, and whether it can adequately take what you're describing, generate the right query for the data, pull the data, and then act correctly on it. There may be multiple steps involved in responding to your question. For example, with cy.prompt, we are very cautious about exactly what is handed over for an LLM to do, and those are tasks an LLM is well suited to. We didn't talk about cy.prompt much in this webinar, but the natural-language testing part of Cypress is very carefully managed: here's exactly what an LLM is strong at doing, we hand that over, and the before and after are deterministic. I think that applies to everything: be cognizant of where you think there's a strength and where there's a weakness.

Thanks, Mark. I don't think you could have set us up better for our next question. It's like I paid you to do it, even though I didn't. Let's share this next question. It's about cy.prompt and bringing that into the conversation. We had a slide earlier comparing traditional test writing, cy.prompt, and AI generation, so this is a good opportunity to dive deeper into those differences and those approaches to test authorship.
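The point about scoping the report, rather than asking the LLM to filter it, can be sketched as a deterministic pre-filter that runs before anything reaches the model. The `owner` tag on each element is a hypothetical marker (for example, derived from a data attribute or a selector prefix your team controls), not a documented UI coverage field, and `scopeForLlm` is an illustrative name.

```typescript
// Hypothetical report entry. The `owner` tag is an assumption about how a team
// might mark third-party or design-system markup; it is not a documented field.
interface UntestedElement {
  selector: string;
  view: string;
  owner?: "app" | "third-party" | "design-system";
}

interface ScopedReport {
  forLlm: UntestedElement[];
  excludedCount: Record<string, number>;
}

// Deterministic pre-filter: decide scope in code, not in the prompt.
function scopeForLlm(
  elements: UntestedElement[],
  excludedOwners: string[] = ["third-party", "design-system"],
): ScopedReport {
  const forLlm: UntestedElement[] = [];
  const excludedCount: Record<string, number> = {};
  for (const el of elements) {
    const owner = el.owner ?? "app"; // untagged elements are assumed to be yours
    if (excludedOwners.indexOf(owner) !== -1) {
      excludedCount[owner] = (excludedCount[owner] ?? 0) + 1;
    } else {
      forLlm.push(el);
    }
  }
  return { forLlm, excludedCount };
}

const scoped = scopeForLlm([
  { selector: '[data-cy="add"]', view: "/", owner: "app" },
  { selector: "#chat-widget button", view: "/", owner: "third-party" },
  { selector: ".ds-modal-close", view: "/settings", owner: "design-system" },
  { selector: '[data-cy="save"]', view: "/settings" },
]);
console.log(scoped.forLlm.length, scoped.excludedCount);
```

Keeping a count of what was excluded lets a person confirm the filter is not hiding something that should have been in scope.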
Mark, why don't you pick up the conversation and take it away?

Yeah, this is another one that could go into a lot of depth. I've seen a lot of people use cy.prompt in combination with UI coverage as they ramp up. There's an authoring piece where you write a test in natural language and then eject the code, if you're only interested in turning your requirements into Cypress code and having Cypress do the work. That leans into Cypress's strengths: it's very good at making a working Cypress test from natural-language steps, which might be the acceptance criteria in a ticket. Having Cypress do that, as opposed to having an LLM guess at what the test should be, is really good. Sometimes you'll get a test written by an agent that never ran it and doesn't know exactly how it works. So using cy.prompt just to create tests is a valid use case. The other distinct part is that cy.prompt can verify some aspects of the elements it locates, as a result of having written Cypress code for a specific prompt. So if an element no longer matches the prompt, it can self-heal: we can't find the element anymore, but there's a similar element with the same function, so, okay, that moved in the design. That self-healing aspect is good for maintenance, and you can do all kinds of fun patterns, like using cy.prompt only as a getter, or in various other ways I won't get into here; that could almost be a whole separate session.

Mark, you forgot to mention something. You talked about the difference between traditional testing, cy.prompt, and AI generation, but nothing limits you from using AI to generate tests that use cy.prompt commands. You can do both.
You can have the best of both worlds. It's really about what you guide your agent to produce and what you'd like to use in your tests.

I think we have time for one more question, and Emily, I'm going to pass this one to you to wrap us up. Thinking about the MCP, are there any plans to expand the Cypress Cloud MCP with more functionality? We mentioned the UI coverage tools in this webinar; is anything else top of mind or on your immediate radar?

The simple answer is yes. We definitely want to add more to the Cloud MCP. I didn't mention it earlier, but the Cloud MCP is going to be available to anyone on any plan who is recording to the cloud. You don't need to pay for UI coverage or accessibility to use it and get value from it; you only need UI coverage or accessibility to use the UI coverage or accessibility tools. Right now, Cloud MCP returns high-level information such as stack traces, errors, and flake rates. But this comes back to the confidence and trust point I mentioned earlier. You can go quite far by copying and pasting error traces and similar content, and the MCP reduces the need to copy, paste, find all your errors, and do the triage you might naturally do when inspecting Test Replay. But there are definitely cases where that isn't enough. Here's a really simple example: your test fails because an element did not render on the page. That's a very obvious error. It's not visible; it's not there. I understand that; I don't need an agent to tell me what it means. The real question is why, and that gets into root cause analysis.
If you go into the cloud and look at your Test Replay, you can potentially see very clearly that the element didn't render because the last request failed, or because the wrong data came back. So how can we expose that information so your LLM has more context when it needs it? Not always, because you don't want to bloat your context and waste a bunch of tokens when it doesn't need that information. But how can it reach for it? We are offering something. It's a bit experimental right now and feature-flagged for people who want to test it out, and we're still iterating on how to do it well. It's not that it isn't possible; the question is how you can interact with those Test Replays in an isolated and safe way, so you can pull some of that context out and reach for it when needed. Naturally, this leads into the whole idea of the agentic flow: how to be very actionable with a very hands-off approach. We've talked about how you can record to the cloud, have your data, bring it back, and act on the recorded data that's problematic. Or you can act on the UI coverage reports, the elements that were failing or the missing links, and add that coverage. But then what? You generated your tests, or maybe you wrote some tests yourself. You need to close the loop. What does that mean? I saw this question pop up in the chat as well: are we going to offer something like the Playwright MCP? We are looking at the best option to close that loop so you get that local interaction and feedback cycle as well.
That's really the piece we need to ensure you have a very successful agentic experience, and we're working on how to get there and how to support it. So yes, more things are coming. There's a feedback command, so feel free to dump your ideas in there. But I'll wrap that up.

Thank you, Emily, and thank you, Mark. We appreciate both of you taking the time today to walk us through how teams are approaching coverage visibility in the age of agentic development. You shared some really practical things for us to walk away with. Before we let everyone go, a few quick things. A survey is appearing on your screen right now. Please take it; it gives us a lot of great information to make future sessions even more valuable for you. We'll also send a follow-up email after the session with the recording. If you're one of the folks selected to receive the 10 free UI coverage reports, you'll hear from us via email tomorrow. Want to catch Cypress out in the wild next? If you're in the Atlanta area, you can catch us next week at the Agentic Quality Summit on May 1. It's a full-day conference for engineering leaders transforming their quality organizations with AI, and there's a link to register in the docs tab. Finally, if you're ready to see UI coverage in action on your own test suite, click the button at the top of the screen to book time directly with our team, and we'll get you started. Thanks again for spending time with us today. Until next time, folks, happy testing.