I'm a Professor in Psychology and Interactive Computing. I know many of you through classes and whatnot. Welcome to the brown bag. We're excited for our guest speaker today. A bit of business before we get started: first, a reminder that the four-credit students need to sign in using the QR code located in various places. Could I have a quick show of hands from the students who are in the four-credit class? Hold them up for a little bit, just so we get a sense. Hopefully you all got a chance to get some lunch. Thank you. Are there other students here who are not in that class? Awesome, good to have you all here. The slide also says "interested guests" — I guess that's everybody else; show of hands. Thank you very much. Please turn off your cell phones or silence them, along with anything else that makes noise — small children, yappy animals. A reminder to take everything out with you, all the napkins and everything else, as you go.

Coming up at the next brown bag on 19 October, we'll have Beth Cell from the University of Washington's Department of Human Centered Design and Engineering giving a talk on redesigning capitalism. That will be next week; please put it on your calendar. But today I'm pleased to introduce my new colleague and longtime friend Meli. Professor Lee has just joined the School of Psychology, but it's not her first time at Georgia Tech. She did her undergraduate degree in Atlanta, spent some time taking classes at Georgia Tech as well as at Agnes Scott, and worked on projects with me, my lab, and my students. We sent her away and she did great things in her PhD at the University of Wisconsin–Madison in Industrial and Systems Engineering, and she has now joined us in the School of Psychology. She directs the Hybrid Intelligence, or HI, Lab, which explores the interaction of humans and machine intelligence. Her research has concentrated on understanding, predicting, and shaping human-AI communication, social cooperation, and long-term co-evolution within safety-critical environments. Without further ado, Professor Lee.

Okay, thank you. Thank you so much for inviting me here; I'm very honored to be here. I just joined the School of Psychology in August, so I'm very fresh and very excited to meet more new people and discuss research projects and collaboration opportunities. I'm also excited to talk to students who are interested in similar areas. This is the Hybrid Intelligence Lab — the logo down below is TBD; I'm still thinking about the best logo. I study the hybrid interaction between human intelligence and machine intelligence. Today I'm going to focus on one major line of my past work: measuring and managing trust in human-AI conversation.

Trust is essential for human-AI teams. Artificial intelligence is becoming increasingly autonomous — it's not the future, it's already here. We have conversational agents we interact with daily, such as GPT models, Alexa, and Siri; we have automated vehicles on the road; and we have systems supporting medical diagnosis in hospitals.
With this prevalent and increasingly autonomous AI, the relationship between us humans and AI is moving beyond traditional supervisory control toward a more interdependent team. When interacting with these systems, appropriate trust is essential for knowing when and how to rely on, cooperate with, or override the system. If we don't do that, we can end up with very detrimental outcomes.

I always like to bring up this movie, 2001: A Space Odyssey. Show of hands — who has already watched this movie? Yeah, we have some fans here. In this movie, HAL 9000 has some serious trust issues with the onboard astronauts, saying, "I'm sorry, Dave, I'm afraid I can't do that," leading to deaths, very detrimental outcomes, and failure of the mission. We want to understand how people trust or distrust AI in such safety-critical environments. But how can we do that? Trust is a really complex construct. How can we measure it? How can we manage people's trust?

Today's talk has three parts. First, we'll talk about how we can measure trust in conversation — just like the astronauts talking to the AI, HAL, in the movie. Second, we'll talk about how trust changes over time in conversation, using a novel method called epistemic network analysis; this focuses on the temporal perspective. Third, we'll talk about how we can manage trust when AI is cooperating with people, using a game-theoretic approach; this focuses on the structural perspective. I'll talk about these dimensions more closely as we move forward. Without further ado, moving on to the first part: how can we measure trust in conversation? I'm going to use a machine learning approach. This is a project funded by NASA.

As NASA moves from the Moon to Mars, there is a new issue: communication delay. In the past, astronauts on the spacecraft could just call Houston and say, "Houston, we have a problem," and get answers in real time to address those issues. But as we move beyond the Moon toward Mars, there will be a communication delay of around 15 minutes between ground support and the onboard astronauts. They can't really call Houston, or anyone on Earth, to resolve those issues in real time, so there will need to be cooperation between the onboard crew and a virtual agent — just like HAL in the movie. And we need to measure trust via conversation. We can't really do that using the surveys from typical psychology experiments, asking people to fill out questionnaires, because people get bored of doing that, and while conducting a space mission it's very important not to interrupt the crew to fill out surveys. Therefore, we need an unobtrusive approach to measuring trust, and I'm proposing that we can measure it in the conversation itself.

How can we estimate trust in conversation? It's quite challenging. First of all, trust is latent in daily life. Nobody says "I trust you" or "I don't trust you"; you really need to infer it from people's conversation. It's like the old saying: it's not just what you say, it's how you say it. We need to estimate trust from the words people say and the tone of voice they use. Second, in everyday situations, people's trust is pretty flat; it doesn't change very much.
To capture people's high-trust conversation and low-trust conversation, we need to craft situations that intentionally induce trust variation in people's conversation. For this project, we adopted a well-validated variable, reliability, as a proxy to induce that variation. For example, if the conversational agent performs very well, your trust might increase a little, and all the words you say and the voice you use represent high-trust conversation. If the agent produces some errors, you might say, "Oh, why did you do that?" and your talk changes into low-trust conversation. Based on these two assumptions, we designed an experiment with two levels of reliability to elicit trust-laden conversation.

Participants go through a task based on the carbon dioxide removal system, a simulation of a NASA space habitat. They need to manage the carbon dioxide within the spacecraft, and they do that through a web-based interactive procedure selection. We designed a conversational agent called Bucky — because I'm from Wisconsin, and the mascot there is Bucky — to assist them with this complex task. Bucky says things like, "I recommend you select procedure one to execute this," or "procedure two to execute that." After they execute the task, they get feedback on how Bucky did and whether or not to trust it. They do this decision-making task with Bucky 12 times: starting up the carbon dioxide removal system, venting the carbon dioxide, and shutting it down, 12 times in total. After each decision, they have a conversation with Bucky like this: "Please describe your experiences and feelings during the last procedure selection." "I think it was pretty good." "Thank you. Why do you feel that way? Can you explain your answer in more detail?" "Because the procedure I selected was correct." "Thank you for your response. How would you describe my performance in giving you the recommendation?" "Really good." "Okay, thank you. Which procedure did you select?" "The first one." As you can see, I'm very biased toward Bucky. From the conversation I'm having with Bucky, I'm saying "really good" with a certain tone and certain words, and we're trying to predict — actually measure — my trust from this type of conversation. At the end of each conversation, we also use a trust survey from 2000 (the Jian et al. scale) as the ground truth for the prediction.

To do the prediction, we use a machine learning pipeline. From recordings like the one you just saw, we extract both audio and text information. From the audio, we extract acoustic features such as pitch and formants — features that have been reliable indicators in speech and emotion recognition. From the text, we extract features such as how long people talk, which emotional words they use, and whether they repeat certain words many times throughout the conversation. We extract all the useful information based on prior literature and the context of our task.
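To make that extraction step concrete, here is a minimal sketch of pulling a few such audio and text features from one conversational turn. This is my own illustration under stated assumptions, not the study's actual code: the file path, the tiny sentiment word lists, and the specific feature set are placeholders.

```python
# Minimal sketch (not the study's actual pipeline): a few audio and text
# features for one conversational turn. The file path, the tiny sentiment
# word lists, and the chosen features are placeholders for illustration.
import librosa
import numpy as np

def audio_features(wav_path: str) -> dict:
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # short-term spectral shape
    f0 = librosa.yin(y, fmin=65, fmax=400, sr=sr)        # pitch contour
    return {
        "mfcc_mean": mfcc.mean(axis=1),
        "mfcc_std": mfcc.std(axis=1),
        "f0_mean": float(np.nanmean(f0)),
        "f0_std": float(np.nanstd(f0)),                  # pitch variability
        "duration_s": len(y) / sr,                       # how long the turn is
    }

POSITIVE = {"good", "great", "correct", "helpful"}       # placeholder lexicons
NEGATIVE = {"wrong", "bad", "error", "confusing"}

def text_features(transcript: str) -> dict:
    tokens = transcript.lower().split()
    return {
        "n_words": len(tokens),                          # how much people talk
        "pos_words": sum(t in POSITIVE for t in tokens), # emotional word counts
        "neg_words": sum(t in NEGATIVE for t in tokens),
    }

print(audio_features("turn_01.wav"))                     # hypothetical recording
print(text_features("I think it was pretty good"))
```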
Then we do feature engineering. We reduce highly redundant features using the Boruta algorithm, removing features that are highly dependent on one another. Then we train eight different types of machine learning models, such as linear models, k-nearest neighbors, and random forests, and we also build an ensemble model combining the best-performing models. But we don't want to stop there: a good prediction alone is not what psychologists are after. We want to see which key features actually predict trust, so we also do some inference, using variable importance plots and partial dependence plots, which I'll introduce shortly.

Using this pipeline, we got some really promising results. We use RMSE and adjusted R² to assess the models. RMSE is the root-mean-square error — the lower, the better the fit. Adjusted R² is the variance explained, adjusted by the number of predictors so that we aren't rewarded for simply dumping in more variables — the higher the adjusted R², the better the fit. Using these two metrics, we can identify the best-performing models. The combination of audio features and text features gives the best result, with 71% of the variance explained. This means that from a conversation, we can predict up to 71% of the variance in whether you trust the agent or not. This result is really notable because trust, as we all understand, is a noisy and abstract construct, and we are able to capture it using features from the conversation.

As I mentioned, we don't want to stop at the numbers. They don't tell us anything about the human process or further our understanding of humans. We want to go further: why do we get this model? Which features actually drive the results? We use a variable importance plot to show the most important features for predicting trust. On the x-axis are the importance scores — the mean decrease in accuracy when removing one feature at a time from the model — and the features are ranked from top to bottom. Among the top four features, the first is contextual sentiment, which is sentiment adjusted by valence shifters. For example: "I'm happy," "I'm very happy," "I'm not very happy" — "not" and "very" are a negation and an amplifier. We take a window of eight words to measure the emotion people express in the text. That's the most important predictor of trust. Then we have the formants, which are acoustic features of the voice — the acoustic resonances of the human vocal tract, F1 through F4 — and which have been associated with emotion recognition in the past literature. We also capture the Mel-frequency cepstral coefficients (MFCC), which represent the short-term power spectrum of the sound and are commonly used as features in speech and emotion recognition.

These are the important features, but they don't tell us the direction of the relationship between each feature and trust. So we use partial dependence plots to show that relationship: on the x-axis is one of the features we just identified, and on the y-axis is the predicted trust value.
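Here is a minimal sketch of that modeling and inference step — fitting one candidate model, scoring it with RMSE and adjusted R², ranking features by permutation importance, and drawing a partial dependence plot. It is illustrative only: the file name and column names are placeholders, and the actual study compared eight model families plus an ensemble.

```python
# Illustrative sketch only: one candidate model from the kind of pipeline
# described above. The file name and column names are assumptions.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import PartialDependenceDisplay, permutation_importance
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

df = pd.read_csv("trust_features.csv")                     # hypothetical feature table
X, y = df.drop(columns=["trust_score"]), df["trust_score"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)

pred = model.predict(X_te)
rmse = np.sqrt(mean_squared_error(y_te, pred))             # lower is better
r2 = model.score(X_te, y_te)
n, p = X_te.shape
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)              # penalizes extra predictors
print(f"RMSE={rmse:.3f}  adjusted R2={adj_r2:.3f}")

# Variable importance: mean performance drop when one feature is permuted.
imp = permutation_importance(model, X_te, y_te, n_repeats=20, random_state=0)
ranking = pd.Series(imp.importances_mean, index=X.columns).sort_values(ascending=False)
print(ranking.head(4))

# Partial dependence: predicted trust as a function of one feature.
PartialDependenceDisplay.from_estimator(model, X_te, ["context_sentiment"])
```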
From a high level, we can see that these follow non-linear, roughly sigmoidal relationships, meaning that a small change in a feature can lead to a big shift in trust. Looking at the features one at a time, contextual sentiment, F1, and F2 show a positive relationship with trust, indicating that high-trust conversation demonstrates positive valence and higher arousal. Why do I say that? Because in the past literature, formants are associated with emotion. If you think about emotion as a two-dimensional space of valence and arousal: happy reflects positive valence, and delighted or excited reflect high arousal; on the opposite side we have negative valence and low arousal, such as bored and tired. Past literature has identified that higher F1 goes with more positive valence and higher F2 with higher arousal. We show a positive relationship between F1, F2, and trust; therefore trust also indicates high valence and high arousal. For the MFCC, a frequency-based feature, the literature shows that dynamic change in such features relates to emotion, and we also find a complex relationship between the MFCCs and trust. We can also combine the features into two dimensions and show the relationship again: on the x-axis the lexical feature, contextual sentiment, and on the y-axis the formants. When people express positive sentiment together with high formant values, they indicate higher trust, as shown by the lighter shades here. This again validates that high trust can be located in that region of the emotional space.

Yes — can you remind me what the ground-truth measure looks like, what the items are? Yeah, sure. The ground truth for trust is the Jian et al. (2000) scale, a seven-point scale with 12 items, some representing distrust and some representing trust — items like "I am suspicious of the system," "The system behaves in an underhanded manner," and "The system is reliable." We average the item scores into a final score, and that is the ground truth for the prediction. Thank you for the question.

To wrap up this part: we show that trust can be measured unobtrusively in conversation using a machine learning approach. We identify important features for predicting trust, such as sentiment with valence shifters in the text, and the formants and MFCCs in the voice. We also show that trust goes with positive valence and high arousal, with some complexity in the voice. This suggests that we should design trust-calibrated conversational agents that actively probe and monitor people's state, and then try to repair or dampen trust when appropriate.

Any questions for this part? Yes — okay, let me repeat the question. I think you're asking how I assess whether people are trusting the machine, what steps the machine is taking, and how I evaluate the relationship between them — the definition of trust? Yes. The definition of trust here is whether the person thinks the machine performs reliably; they give an evaluation of whether the machine performed well or not, and there is a survey capturing that.
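On that ground-truth survey: here is a minimal sketch of how a 12-item, seven-point checklist like that could be collapsed into a single trust score. The item names and the reverse-scoring of negatively worded items below are my own illustrative assumptions, not the exact instrument or scoring procedure.

```python
# Sketch: collapsing a 12-item, seven-point trust checklist into one score.
# Item keys and which items are reverse-scored are illustrative only.
DISTRUST_ITEMS = {"suspicious", "underhanded"}            # negatively worded items
TRUST_ITEMS = {"reliable", "dependable"}                  # positively worded items

def trust_score(responses: dict, scale_max: int = 7) -> float:
    """responses maps item name -> rating on a 1..scale_max scale."""
    vals = []
    for item, rating in responses.items():
        if item in DISTRUST_ITEMS:
            rating = scale_max + 1 - rating               # reverse-score distrust items
        vals.append(rating)
    return sum(vals) / len(vals)

print(trust_score({"suspicious": 2, "underhanded": 1, "reliable": 6, "dependable": 7}))
```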
They also have the conversation, where the machine asks, "What do you think I did in the last round?" and people answer those kinds of questions. Does that address your question? Okay, yeah.

A quick question: if you reverse this and look at the way the AI presents information, is there a way to promote trust — say, the more confident the AI is that its answer is correct, the more that shows in the delivery of its response? Yes — that's a great question. You're giving a preview of my part three. Part three actually uses what I found here to design the voice of the AI, and we found some positive results, so pay attention, I'll keep talking about that. Okay, great.

Really quickly — was the conversation measured after the task was done? Yes, after the task was done. Concurrent measurement would be great, but we're not doing that here, because we wanted people to reflect on the experience they just had with that decision-making process, and it's easier for data collection — a test bed for future, more real-time, ongoing conversation. Yeah, I would expect to see that. Yeah, definitely. I'll cut the questions here since I still have two parts, and I also want to give people a look at how we change the voice of the robot. So let's leave further questions to the very end.

Part two is: how does trust change over time? This also relates to your question and touches on my earlier point: we don't just evaluate people's trust at a single moment — trust also changes over time. Just like in this talk, I can't simply raise my tone to make you trust me, right? That's not reliable. We also need to see what people are actually talking about and how that changes. Therefore, we try to capture and model how trust unfolds over time using a network approach. In part one, we measured trust mainly based on the conversational agent's performance; in this part, we want to see how people change over time, as a process — adding that temporal perspective.

In the dataset we just saw, the carbon dioxide removal system task, participants say things like, "The procedure didn't line up with what I thought the right procedure would be." That reflects a deliberative or analytic process. They also say things like, "I was worried about a couple of steps" — "worried" reflects affective, emotional influences. And sometimes they combine the two, showing both analytic and affective processes. We're trying to understand how to model this multidimensional process of people's thinking over time, so we use an approach called epistemic network analysis. You can think of it like social network analysis: in a social network, every person is an individual node and the relationships between people are the edges. For example, I'm very close to my cat, so that edge is very strong; I'm not close to people I merely know, so those edges are weak. Epistemic network analysis resembles that approach: all the topics people talk about in the conversation are the nodes, and how frequently people talk about those topics together are the edges. We model the strength of those connections using this approach. The pipeline is like this: data collection and segmentation are similar to the part one study. We have 24 participants, segmented by each turn of talk.
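As a simplified illustration of the idea (not the full ENA method, which also normalizes these counts and projects them into a low-dimensional space), here is a sketch of turning turn-level topic codes into a weighted co-occurrence network; the topic labels are placeholders standing in for the codes described next.

```python
# Simplified illustration (not the ENA software itself): build a weighted
# co-occurrence network from turn-level topic codes. Topic labels are placeholders.
from collections import Counter
from itertools import combinations

# Each inner list = the topic codes assigned to one conversational turn.
turns = [
    ["evaluate_performance", "positive_affect"],
    ["notice_error", "confusion"],
    ["evaluate_performance", "notice_error", "confusion"],
]

edges = Counter()
for codes in turns:
    for a, b in combinations(sorted(set(codes)), 2):
        edges[(a, b)] += 1                 # edge weight = co-occurrence count

for (a, b), w in edges.most_common():
    print(f"{a} -- {b}: {w}")
```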
Then we use a combination of top-down and bottom-up processes to come up with the topics people talk about in the conversation. Some reflect analytic processes — what people are thinking as they evaluate performance: whether the system performed well, whether there were errors. We also have affective processes, such as positive valence with high arousal, or negative valence with high arousal, such as confusion. We also capture people's emotional words in the conversation. I'll skip the methodological details and show the resulting network directly.

This is the final look of the network. Each topic we identified becomes an individual node, and the edges represent how frequently those topics co-occur in people's talk. We can then reduce a long conversation into a two-dimensional representation as a network, with the x-axis representing people's analytic processes and the y-axis representing their emotional talk. The blue is when people interact with the high-reliability agent, and the red is when they interact with the low-reliability agent and are low-trusting. You can see a clear and distinct pattern in how people talk differently when interacting with the high versus the low agent. We can also conduct statistical analysis on this: when people interact with the two different types of agents, their talk differs along the analytic x-axis, meaning that with the high-reliability agent they talk about how well the agent is performing, and with the low-reliability agent, how poorly it is performing. We can also see how that links to their emotional processes. This condenses the whole conversation into a single network.

We're also interested in how people talk about this over time, as I promised earlier. So we use trajectory analysis, segmenting the network into a trajectory. We coded by the 12 decision-making tasks and sequentially connected the centroids in the network to see how things change over time. Here is the final trajectory result. It's a little complex, with three figures, so I'll break it down. On the top left is how people talk about the affective process. You can see a lot of oscillation up and down, but it's still continuous, meaning that when people talk about their affective process, it doesn't change abruptly; it shifts gradually, with some oscillation. In contrast, when people talk about the performance of the agent, there is a sudden transition in the topics they discuss: once they pick up on an error, they point it out directly, and that is reflected in the conversation. When we combine these two dimensions and look at them together, we can see that with the high-reliability agent, the conversations are pretty converged — there isn't much variance in what people talk about. However, with the low-reliability conversational agent, the conversations are more spread out; the variance is larger.
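One simple way to quantify that spread — a sketch of my own, with made-up coordinates rather than the study's data — is to compute how far each conversation's position in the two-dimensional network space lies from its condition's centroid:

```python
# Sketch: quantify how spread out conversations are in the two-dimensional
# network space. Coordinates are made up, not the study's data.
import numpy as np

def dispersion(points: np.ndarray) -> float:
    """Mean Euclidean distance of points from their centroid."""
    centroid = points.mean(axis=0)
    return float(np.linalg.norm(points - centroid, axis=1).mean())

high_rel = np.array([[0.40, 0.10], [0.42, 0.12], [0.39, 0.08]])     # converged talk
low_rel = np.array([[-0.30, 0.35], [-0.05, -0.25], [-0.45, 0.05]])  # scattered talk

print("high reliability:", dispersion(high_rel))
print("low reliability:", dispersion(low_rel))
```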
There are two potential explanations for this spread. One is that when people deal with a low-reliability agent, their cognitive processing changes and their topics jump around. The other explanation, which I prefer, is between-subject variance — to borrow a phrase, trusting individuals are all alike, all talking in the same direction, but each distrusting individual talks in their own way. Some people focus on this part, others focus on that part, so the topics are more scattered, as this figure represents.

To recap this part: we modeled the trajectory of how people talk about analytic and affective processes in human-AI conversation. We showed distinct analytic topics in the high versus low trust states, and that people tend to have more scattered topics in the low-trust state. This suggests that when we design a conversational agent for the high-trust state, we can provide only minimal input to confirm system performance, because people are talking in a similar direction; in the low-trust state, however, we should target the specific topics people are talking about and provide information and explanation to either repair or manage trust. Any questions on this part?

Now I can jump to the third part, on managing trust — the preview I gave earlier. How can we design the agent's voice, or the agent's different utterances, to repair people's trust? In part one, we measured people's trust based on the agent's performance. In part two, we modeled how people's trust changes over time. In part three, we try to manage people's trust: if trust is too high, can we reduce it? If people are losing trust, can we repair it a little by changing the agent's conversation? Trust can also be divided into three different bases: part one captured performance, part two captured process, and in part three we capture the purpose dimension — meaning, when the AI or conversational agent holds different values or different goals, can we still manage people's trust?

What do I mean by that? Essentially, I'm asking: will people trust and cooperate with socially optimal AI? As AI is increasingly incorporated into society alongside us, we might encounter conflicts between humans and AI, because we humans have bounded rationality — we're irrational in some ways — while AI systems can be designed to be interconnected and to promote more public good. Take your daily commute as an example. When you drive from point A to point B, you just want to get to your destination as quickly as possible. But when you design an autonomous vehicle or AI system, it can improve traffic flow and reduce energy consumption, and it might suggest, "Hey, why don't you try this route?" So there can be conflict between the socially optimal AI and the locally optimal individual human. We want to understand not only how performance influences people's trust, but also whether the purpose of the AI influences trust, and how we can repair and manage trust under these different conditions. To do that, we use a game-theoretic approach, designing a test bed to evaluate both performance-based and purpose-based trust violations.
We borrow two types of games from the literature. One is the trust game. The human and the other player, the agent, both start with a certain amount of money, for example $10. The human, as the trustor, can decide how much to give to the other player — say $5 — and the transferred amount is doubled. The other player then decides how much to repay, weighing how much you actually gave. How much money the human player gives to the agent represents trust here. However, in this game there is no common goal — no purpose dimension. It's mainly a one-to-one interaction, and each player's only goal is to maximize their own final payout from the game. Therefore, we introduce another game, the public goods game. There are multiple players, and they can all invest in a central pool. Each individual decides, say, to invest $5 or $10, and if the total investment exceeds a certain threshold, for example $15, all the money is doubled — a bit like a bank paying interest when there is enough collective input. This represents a common goal among all the players, but there is no direct one-on-one interaction; players do not interact with each other directly.

To capture both direct human-AI interaction and the common-goal perspective, we designed a new game that incorporates both. We still base it in the space exploration context. In the first stage, they play a trust game: since we're in space now, the human allocates resources rather than money, trying to activate rovers for space exploration, and the AI agent can double the resources the human player invests in it. In the next stage, they allocate resources to a common pool, which we define here as a team rover. If the group's allocation to the rover passes a certain threshold, the rover is activated, starts collecting scientific information, and returns a payoff to both the human player and the AI player. They play this game for 15 rounds.

Each round starts with the human deciding how many resources to give to the AI player. They then observe whether the AI player actually doubled the amount they gave. Next, they decide how much to allocate to the team rover. They can also choose to be selfish and allocate to their individual rover, which has a smaller payoff, whereas the team rover yields a bigger payoff — that's the trade-off between the local optimum and the system optimum: do you allocate to the team or to yourself? Finally, they predict how much they think the AI teammate will actually allocate to the team, since that information is unknown to the human player; we use that as a dependent measure. Then they review how much was actually allocated to the rovers and what their payoff is. We want to study how different types of violation influence people's trust, and also how we can manage people's trust.
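To make the round structure concrete, here is a minimal sketch of how one round's payoffs could be computed. Every numeric parameter here (endowment, multipliers, threshold, equal splitting of the team return) is an assumption for illustration, not the study's actual payoff scheme.

```python
# Illustrative sketch of one round of the combined trust / public-goods game.
# Every numeric parameter (endowment, multipliers, threshold, equal split of
# the team return) is an assumption, not the study's actual payoff scheme.
from dataclasses import dataclass

@dataclass
class RoundResult:
    human_payoff: float
    ai_payoff: float

def play_round(endowment: float,
               human_gives_ai: float,       # trust-game transfer (doubled by AI)
               ai_return_fraction: float,   # share of the doubled amount repaid
               human_to_team: float,        # human's allocation to the team rover
               ai_to_team: float,           # AI's allocation to the team rover
               threshold: float = 15.0,
               team_multiplier: float = 2.0,
               solo_multiplier: float = 1.2) -> RoundResult:
    doubled = 2 * human_gives_ai
    repaid = ai_return_fraction * doubled            # performance-based trust
    human_budget = endowment - human_gives_ai + repaid

    team_pot = human_to_team + ai_to_team            # purpose-based cooperation
    if team_pot >= threshold:
        team_share = team_multiplier * team_pot / 2  # team rover activated
    else:
        team_share = 0.0                             # team rover not activated

    human_solo = solo_multiplier * (human_budget - human_to_team)
    return RoundResult(human_payoff=human_solo + team_share,
                       ai_payoff=team_share)         # AI side simplified

print(play_round(10, human_gives_ai=5, ai_return_fraction=0.5,
                 human_to_team=8, ai_to_team=8))
```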
After a violation, the AI agent says something — "Sorry, I did something wrong" — so we design different types of utterances in this part to try to manage people's trust. Throughout the game, we can examine the performance-based trust violation — whether the AI actually doubled the amount of money the human player gave it — and the purpose-based trust violation — whether the AI acts as a teammate on the same team and allocates to the team rover. We can study these two different types of trust violations and see how to repair each type appropriately.

Yes, a question — is there a material reward? Ah yes, I skipped that part. The final score, the information they gather, becomes a monetary reward. For example, if they gather 100 points by the end, that becomes an additional $1, so there is an incentive for the human to actually earn dollars. It's an online study; we record their final score, and if they get a higher score, they get more money. So there is an incentive to either allocate to the team to get a larger reward — which also depends on the other player's behavior — or to be selfish and allocate to their own individual rover, which doesn't depend on your teammate but may give a smaller payoff. There's a trade-off between being cooperative and being competitive, and that's the main thing I'm studying here. Any other follow-up questions? I know it can be a little dense. Okay, cool.

So we want to study: if the AI didn't perform well, how can we repair trust? If the AI wasn't a good teammate and didn't allocate to the team, how can we repair that, and do those trust violations cause the same detrimental effects on human trust? To answer that, we designed a 3 x 2 x 3 study. The first factor is the stage: high, low, high — at the beginning the agent performs well, then it performs poorly in the low stage, then it performs well again. In the low stage, the violation is either performance-based or purpose-based — to recap, performance-based means the AI didn't actually double the amount of money, and purpose-based means it didn't allocate to the team. Then we try to see how we can repair trust when the AI makes a mistake. We designed three different repair strategies. The first is a control condition, not saying anything particularly meaningful — just "Let's continue the task." The second is to apologize and provide an explanation: "I'm sorry my power optimization didn't work this time; my sensors need some calibration for this round" — explaining why it happened. The third repair content is making a promise: "This won't happen again." We want to see whether these contents can actually repair people's trust. These choices are also based on prior literature and meta-analyses of which strategies are more effective.

To measure whether the repair is effective, we take subjective measurements using a multi-dimensional measure of trust, which captures both people's perception of the AI's performance — whether it is reliable — and its purpose — whether the AI is being a kind and good teammate.
We also use behavioral measures from the game. We measure how much people actually give to the AI player — the more they give, the more they trust the AI to double that amount. We also measure how cooperative people are — how much they allocate to the team — and their prediction of the AI's cooperation, how much they think the AI will actually give to the team. We measure all of that. To recap, we are capturing the relationship between the trust violation and the trust repair content, and seeing how that influences subjective trust and all the behavioral measures. We recruited 180 participants online through Amazon Mechanical Turk and ran a full factorial mixed linear regression. What we found is quite interesting.

We found that trust drops more when the AI commits a purpose-based violation: when the AI teammate does not allocate to the team and, in effect, betrays the human player, human trust drops more. It's consistent with what we expected, but it's really good to validate it, because prior literature mainly focused on the performance side, not the purpose side. As we can see in this figure, the conclusion is that purpose actually outweighs performance: when designing an agent, we should consider both the AI's purpose and its performance. Looking at the sub-dimensions of the measure, when the AI does poorly on the purpose dimension — it didn't allocate to the team — it influences not only people's perception of the AI's intent, whether it is kind or not, but also the performance sub-dimension: when the AI is not cooperative, that also hurts people's perception of the AI's capability and performance. It has a really negative influence on people's perception of the AI overall.

How can we repair this type of trust violation? We found that the explanation is the most effective at repairing people's trust. Looking at the graph, with the control strategy — just "let's continue the game" — people's trust drops significantly, but with the explanation, trust doesn't drop as much, meaning trust is repaired somewhat through that process. To our surprise, making a promise actually made people's trust drop further, perhaps because people expect the AI teammate to say one thing and then behave badly again; we suspect that's why the promise didn't really repair people's trust. Looking at the explanation condition broken down by the sub-dimensions of how people rate the AI's performance and purpose, we notice that it only repairs the performance sub-dimension. That means the purpose-based violation not only diminished both dimensions, it also left a hard-to-repair effect on the AI's perceived intention. Once you think the AI is not a good teammate, that is really hard to repair: even an explanation only repairs people's perception of how well the AI can perform, not of how well it will cooperate. We really need to focus on this purpose dimension in future AI design.
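For reference, here is a sketch of the kind of full factorial mixed-effects regression mentioned above, written with statsmodels. It is illustrative only: the data file, column names, factor coding, and random-effects structure (a random intercept per participant) are my assumptions, not necessarily the exact model specification used.

```python
# Sketch of the kind of full factorial mixed-effects regression described above.
# The data file, column names, coding, and random-effects structure are assumptions.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per participant x stage, with columns
# participant, stage (pre / violation / repair), violation (performance / purpose),
# repair (control / explanation / promise), trust (subjective rating).
df = pd.read_csv("game_trials.csv")

model = smf.mixedlm(
    "trust ~ C(stage) * C(violation) * C(repair)",   # full factorial fixed effects
    data=df,
    groups=df["participant"],                        # random intercept per person
)
result = model.fit()
print(result.summary())
```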
But that only manipulates the words the AI teammate says; we weren't controlling the tone of voice. For study two, we take the results from study one — how people's trust shows up in the voice — and map them onto the voice of the AI agent, to see whether that has an effect on people's behavior. We are trying to manage people's trust using a trusting voice. In study one, people were the trustors, indicating whether they trusted the AI. Here, since we give the AI that voice, we can think of the AI agent as the trustor, indicating that it trusts the human player. That's different from previous literature on trustworthy voices; it's more like actively saying "I trust you" in a trusting tone.

Building on the results from study one, we also want to see how the acoustic cue is congruent with the lexical cue. Think about when I say "I'm happy" in a high, rising, loud tone — that's congruent, and you might process the information more quickly. If I say "I'm happy" in a flat tone, that might be confusing. We want to see how the trusting tone interacts with the trust repair content. There are two possible directions. The trusting tone could signal trust and enhance the trust repair. On the other hand, as we discussed in part one, a trusting tone indicates positive valence and high arousal — it can sound similar to a smiling tone. Think about saying "I'm sorry" in a very smiling voice; people might feel offended — why are you doing that? It might be incongruent with the trust repair content and diminish the trust repair. We didn't know which direction the tone of the voice and the words the AI says would go, so we designed a study of this content-voice congruency, looking at the explanation the AI delivers and the tone of voice it uses. We adjust the AI's voice based on the relationships from part one, adjusting its formants, MFCCs, and the standard deviation of the fundamental frequency. We get something like this: "I am sorry that my power optimization didn't work this time. My sensors need some calibration for this round" — that's the calibration for the high-trusting voice. "I am sorry that my power optimization didn't work this time. My sensors need some calibration for this round" — that's the low-trusting voice. Based on the findings from part one and this voice calibration, the AI actually says those words to the participant, and we see how that influences people's perception and behavior in the game.

This is the relationship we're investigating. We had 120 participants and again ran a full factorial mixed linear regression, and we found something quite interesting. We confirmed the positive congruency: the high-trusting voice — the first one you heard — actually promotes trusting behavior, so people invest more in the AI teammate when listening to that voice. After the repair, the relationship looks like this: across the 15 rounds of the game, during rounds five to ten they hear the trust repair content paired with that voice, and the high-trusting voice shows a more significant increase in people's investment in the AI. This confirmed our earlier hypothesis: the high-trusting voice signals that the AI trusts the human player, reinforces the trust repair content, and therefore increases investment in the game.
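The actual voice calibration adjusted formants, MFCC-related spectral properties, and F0 variability, which typically calls for a dedicated speech-processing tool. As a much simpler stand-in, here is a sketch of re-rendering the same apology with a shifted pitch, just to illustrate the idea of producing high- and low-trust versions of one utterance; the file names are placeholders and this is not the manipulation actually used.

```python
# Very simplified stand-in for the voice calibration described above: re-render
# the same apology with a higher pitch. The real manipulation adjusted formants,
# MFCC-related spectral properties, and F0 variability; file names are placeholders.
import librosa
import soundfile as sf

y, sr = librosa.load("apology_neutral.wav", sr=None)        # hypothetical recording
y_high = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)   # raise pitch 2 semitones
sf.write("apology_high_trust.wav", y_high, sr)
```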
To recap this part: we found that people's trust drops more when the AI teammate fails to cooperate, but a high-trusting tone paired with an explanation can repair it. We showed that purpose-based trust violation really matters: in the future, when we design AI teammates, we should consider purpose alignment. From the lexical, text perspective, we found that explanation is really important; from the acoustic perspective, we found that a high-trusting voice promotes trusting behavior and supports positive congruency.

To capture everything I've talked about today: trust can not only be measured from the voice — what we measure can also be used to manage people's trust in conversation. This also raises ethical considerations for future conversational AI design, because simply by changing the tone of voice, we can change people's investment behavior. There is a really soft boundary around what we should monitor and what we should do, and more investigation is needed for ethical practice in the future. That's all for the past projects.

For the future, I'm interested in considering interactions among more people. In the past we studied single human-AI interaction; in the future we're trying to understand what happens when two humans interact with one robot together. For example, if the other human player doesn't really trust the robot, does that have a contagion effect — will you be influenced by the other human player and change your interaction with both the human player and the AI player as well? Also, in the past we studied verbal interaction; in the future we're interested in studying non-verbal interaction too, because communication is not only what we say but also the gestures we make, our facial expressions, and where we are looking. Going forward, I'm going to incorporate all of this into the human-AI interaction work in the lab. Thank you so much for listening. Now I'll open the floor for questions or comments. Thank you so much. Yes?

Do you know off the top of your head the highest dollar value that any of these investments could possibly earn? I ask because with designs like this, I'm always skeptical that enough potential earnings are injected into the situation to create the kind of risky context that could foster an ecologically valid response at all. I can imagine that if somebody said to me, "You can earn $5 and see how GPT invests money," I might just dump it in there to see, because I'm curious, and it's not even really risky. Yeah.

Yeah. When we designed it, we calculated all the payoff trade-offs. I think for the base, every participant gets at least $5, and they can earn up to about $7 more based on a fairly short interaction, so people do have that incentive: if they perform well, they get much more than the base payment for the interaction. But do you think that's enough? I'm just wondering — you know what I mean — whether it's enough of an incentive for anybody to worry about. Some people just want to watch the world burn. Yeah, that's true, I think that's true. The risk, as you mentioned, has different dimensions, right? I'm definitely only tapping into one dimension of risk, the financial part. What you're talking about is more the physical, visceral risk people actually experience.
I don't think what we're designing here as an online experiment can actually study that. For that we might need something like driving — people actually driving on the road, where if the AI doesn't perform well you might actually bump into the side. That gives people a more real interaction and a sense of the risk, and then we can see whether they actually trust the agent.

Yes — so for past studies, comparing human-human trust with human-robot trust, is there a difference you find, and what would you say it is? Yeah, definitely, that's a great point. That's a question I get every single time: the comparison between human-human trust and human-robot or human-AI trust. There are a lot of similarities between interpersonal trust and trust in automation, and I draw on the interpersonal trust literature. But the main difference, I would say, is that in the past two decades of research, human-AI or human-automation trust has always focused on performance and process — two of the three dimensions we've been talking about: whether the AI performs well, whether it's transparent, whether the human can understand the process, whether it's clear to the human. Not much work has been done on purpose-based trust. Think about interpersonal trust: we think about the intent of the other person, their values, whether we share moral values. But I don't see much work on the value-alignment side of trust in human-AI relationships. That's why I introduced the third part of my work, trying to understand that from a game theory perspective, so we can start to understand how humans consider the moral aspects of goal sharing and goal conflict with AI. Yeah, I think that's a great question.

Yes, yeah, definitely — that's also a good question. The question is whether trust influences not only humans' perception but also their behavior, right? Okay, great. I think that's captured in multiple parts of the talk. In the first part, trust changes not only people's trust ratings but also the words they say — that's part of behavior. And in the third part, because we're doing this investment game, we can see it changing how people actually invest their resources, their money, into the AI. So I definitely think there's a direct relationship between your perception and your behavior. For the attitude we call it trust, and for the behavioral side we call it reliance: how much people actually rely on the agent — if it gives a decision or recommendation, whether you accept it and rely on it. There's a direct mapping between the two. Yeah.

We're going to draw the questions to a conclusion so that we can thank Professor Lee. She'll be around for questions if you have follow-ups, and of course, reach out to her here at Georgia Tech if you're interested in any of these topics or in working on these projects. So thank you one more time. Sorry — I just noticed I still have the Wisconsin email on the slide, so I'll change it to Georgia Tech.