Yesterday, I was asked to give a short talk on an insight gleaned while building a consumer AI product. I immediately knew I wanted to share how we've had to learn and experiment with building a trusted product, given the contradictory goals of an increasingly invisible UI (e.g. SMS or voice) and ever more complex, nuanced workflows with higher stakes (pickup/dropoff schedules, figuring out school lunches or conflicts for the week).
A couple of people asked me to share it, so while it wasn't recorded, I've written up my notes and quickly re-recorded a version. My thinking is still evolving, but I believe one of the key hurdles before we see widespread consumer AI products is being able to successfully navigate this trust question.
(Last note 😅: while the title of the talk was the "UI of trust" - because I think about where AI products interface with our lives, especially with increasingly agentic services - it really covers the UX, the full user experience and promise of the product.)
I'm going to be talking about building for trust. It's a little more salient for those who are building agentic experiences but I think it applies to all products because trust is at the heart of any relationship between a user and a product.
So take a look at this set of considerations and think about where your product falls on the continuum of each. Do you work with your user in a browser interface that they're actively engaged with the whole time? Or do you have something that runs in the background and collaboratively volleys interactions back and forth? We'll come back to this at the end.
I build for parents - for the kind of help I need in my own life: things like childcare when my nanny calls in sick, or help dealing with the firehose of information and logistics it takes to keep a modern family running. That just means I'm building for the things I need.
And ultimately, that means I'm building for trust. Things like privacy and security are definitely table stakes, but for the purposes of this conversation, I mean: trust that you can count on it to be there when you need it, and trust that it will do the thing you need it to do.
Specifically, I've been building for this.
This is just a sample of the kind of information I get hit with on a daily basis. My girls are 10 and 13, so for the better part of a decade, I - along with millions of parents - have had to handle this. Read it, process it and action it. Oh, and then remember it and tell the other people involved, when it's relevant.
Let's take the birthday party invite.
For the majority of families, what happens is they get an invite by email or text message. Then they need to get it into the family calendar, manually checking for conflicts, remembering to buy a birthday present and mentally rearranging the shape of that whole weekend with this new event.
With software, we can get a bit better. We can use OCR and write a program to extract the relevant info and create the calendar event, within some narrow parameters. But it doesn’t understand that this is a "birthday party" and can’t do downstream actions without a lot of brittle if/thens.
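To make that concrete, here's a caricature of the pre-LLM approach - a sketch with made-up patterns, not production code - showing how narrow those parameters really are:

```python
import re
from datetime import datetime

def extract_event(ocr_text: str) -> dict | None:
    """Rule-based extraction: only works for text matching narrow patterns."""
    # One brittle pattern per format we've seen; anything new falls through.
    date_match = re.search(r"\d{1,2}/\d{1,2}/\d{4}", ocr_text)
    time_match = re.search(r"\d{1,2}:\d{2}\s*(?:AM|PM)", ocr_text, re.IGNORECASE)
    if not (date_match and time_match):
        return None  # ...and the invite is silently lost
    return {
        "date": datetime.strptime(date_match.group(), "%m/%d/%Y").date().isoformat(),
        "time": time_match.group(),
        # No notion that this is a *birthday party*, so no gift reminder,
        # no conflict check, no help rearranging the weekend.
        "title": ocr_text.strip().splitlines()[0][:80],
    }
```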
But with LLMs, this changes dramatically. Not only can we build a system that understands the semantic meaning of an American child's birthday party, it can do the reasoning we need - check for conflicts with the soccer game (not the gardeners coming by), remind me to buy a bday present (and oh btw, we like to give books and here are a couple of great books for a 7yo). It can also help me think through rearranging the weekend ("Let's do the farmers market on Sunday instead - there's one in Menlo Park instead of the one in Palo Alto").
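A minimal sketch of what that shift looks like in code. Everything here is illustrative - the prompt, the schema and `call_llm` (a stand-in for whichever model API you use) are assumptions, not Milo's actual pipeline:

```python
import json

EXTRACTION_PROMPT = """From the message below, extract a JSON object:
  {"event_type": ..., "title": ..., "date": "YYYY-MM-DD",
   "child": ..., "followups": [...]}
"event_type" should be a semantic label like "birthday_party", not raw text.

Message:
"""

def extract_with_llm(message: str, call_llm) -> dict:
    """call_llm(prompt) -> str is a stand-in for whichever model API you use."""
    event = json.loads(call_llm(EXTRACTION_PROMPT + message))
    # Because the model labels *meaning*, downstream logic stays simple:
    if event.get("event_type") == "birthday_party":
        event.setdefault("followups", []).append("buy a gift (we like giving books)")
    return event
```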
AI has been the missing piece of a puzzle I've spent years trying to solve. Finally, we can build not just a tool, but the collaborator I actually need - one that can truly be helpful.
More than an assistant that books dentist appointments or researches vacations, I need a forcefield I can trust to handle the logistics firehose and make space for me to show up and do the parenting things. Something I can just throw things to and know it’ll do what I need it to. On the go, at bedtime, heading into a meeting.
It's easy enough to say it and even easier to imagine it, but what does it mean to build it?
Before we can start building to take the right actions on those dentist appointments or vacations, we need complete, accurate data and nuanced context first.
There's just one massive wrinkle: before we even get to the challenges of putting good context to work, we have our work cut out for us with the data. We have to start with a really good, fairly complete digital representation of the family.
That doesn't really exist. Most of it lives IRL or as unstructured information - in our minds, in PDFs or in emails.
Let's take some examples. This first event seems pretty straightforward - it has all the info, clearly formatted. This next one is a bit more complex - we need to make sure we have 5 events with the right info to remember, so my kid doesn't show up wearing pajamas on Retro Day. And with the last one, we're starting to strain OCR and even current vision models, but it's still doable.
What about this baseball schedule? Or, now we're getting to the heart of the matter - this school calendar, or this kindergarten staggered-start-date doc, where families with odd-numbered home addresses have one schedule and those with even-numbered addresses have another.
This is what parents across the country are dealing with. And if we're going to build them a forcefield to help, it's kind of all or nothing.
I can't hold some of the info in my head and not the rest. More importantly, my AI can't actually be helpful if it only ever has 50% of the info - and the easy half at that. But it's also the reality that our existing AI models and capabilities just can't get all of that right (yet).
So how do we build trust with a product that does what it needs to, has a reasonable failure path, and communicates the difference in a way that feels actionable, not frustrating?
Oh, and wait - we're running into a growing tension: for Milo to get as much of the info as possible, we need to lower the friction of reflexively sending it over. That means SMS, voice, email - increasingly invisible, mindless, habitual paths (pay less attention).
But the work Milo takes on gets more complex and personal, with stakes getting higher (pay more attention to make sure it’s being done right).
So the question is: how do we make it ever easier for users to share the things they're being hit with, in a way that still makes it really clear what's been done and what they can do about it?
Well, I think about 4 things, from the POV of a user:
Transparency - what are you going to do, or what happened?
Control - what can I do? How can I intervene? What are the points of check-in?
Consistency - can I expect the same thing each time, so that it's easy for me to see if it's right and so that I can relax my checking over time?
Bounds - what can and can't you do, and how do you teach me that over time, not just at "onboarding"?
Why does this matter? Well, as one of my users said early on: "You need to understand: I'm the only thing standing between a week that mostly works and total chaos."
If I'm handing that off to Milo, it had better do it right. I take that responsibility seriously, not least because that's how I think about my role with my family.
So here are some ways we've thought about it with Milo.
First, for any given input, we have a really standard, transparent output over SMS - because that's where 90% of our users interact with Milo.
For this reminder, there's a clear output of what kind of item it is, how many there are and the details added to it. If for some reason the user wants to dig further or edit something, we've given them a link. They can go in and see exactly what's been sent in and what's been done with it. If they want to edit it, they can.
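Here's one way to think about that in code - a hedged sketch, not Milo's actual template; the copy and `edit_url` are illustrative. The key idea is that the model fills in fields, but a deterministic formatter decides how the confirmation reads, every single time:

```python
def confirmation_sms(item_type: str, details: list[str], edit_url: str) -> str:
    """Render every confirmation the same way: what kind of item, how many,
    the details, and an escape hatch to dig in or edit."""
    lines = [f"Added {len(details)} {item_type}{'s' if len(details) != 1 else ''}:"]
    lines += [f"- {d}" for d in details]
    lines.append(f"See or edit: {edit_url}")
    return "\n".join(lines)

print(confirmation_sms("reminder",
                       ["Buy a birthday gift for Saturday's party"],
                       "https://example.com/items/abc123"))
```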
Next, consistency is under-appreciated, but it's critical if we're going to build trust that doesn't get chipped away over time by minute frictions.
Take this rundown. It's pretty good at telling me what's happening and for whom. But there are inconsistencies in how the kid in question is communicated - sometimes in parentheses and sometimes in the title. This is a shockingly hard problem to get right 100% of the time with current models and software rules, even though it seems like it should be trivial. And it sounds like a silly, small thing to get hung up on, but these things add up (and with AI, they actually multiply up).
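One pattern that has helped: don't ask the model to format at all. A simplified sketch (the field names and cleanup rules here are assumptions) where code, not the model, owns the presentation:

```python
def rundown_line(time: str, title: str, child: str) -> str:
    """One deterministic place where the child's name is attached, so it
    never floats between the title and parentheses from day to day."""
    # Strip any name the model already worked into the title so it can't
    # appear twice or in a different spot each time.
    clean_title = title.replace(f"({child})", "").replace(child, "").strip(" -:")
    return f"{time} {clean_title} ({child})"

print(rundown_line("3:30pm", "Maya - soccer practice", "Maya"))
# -> "3:30pm soccer practice (Maya)"
```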
And lastly, we’ve built in deterministic ways for users to interact with Milo using “@ commands” - little, knowable ways for them to understand what they can and can’t do, starting with “@milo”.
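Something like this dispatcher is what makes those bounds knowable: the same input always does the same thing. (A sketch - only "@milo" comes from the talk; the other command names are invented for illustration.)

```python
# The command names beyond "@milo" are made up for illustration.
COMMANDS = {
    "@milo": "show what Milo can and can't do",
    "@today": "send today's rundown",
    "@undo": "reverse the last change",
}

def handle_message(text: str):
    """Route known @ commands deterministically; everything else
    falls through to the model pipeline."""
    first = text.strip().split()[0].lower() if text.strip() else ""
    if first in COMMANDS:
        return f"Running: {COMMANDS[first]}"
    return None  # not a command -> hand off to the LLM
```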
To close, I come back to: What kind of product are you building and therefore what kind of experience are you going to need to design to give users transparency, control, consistency and clear bounds?
Today, most products tend to be on the left - chatbots or code editors, creative writing or image generation - things with lots of UI and a wide margin for error, built around single-turn requests and responses.
What we're trying to do lies on the other end - a forcefield that runs in the background, with high-stakes information and minimal UI, meant to take actions on the user's behalf in a collaborative, back-and-forth manner.
So our focus has been:
How do you take 100% unstructured data and turn it into consistent, accurate, predictable structured output? (See the sketch after this list.)
How do we communicate clearly and simply what happened, and make it easy for the user to intervene and make changes?
How do we fix issues quickly with reasonable expectations of what Milo can and can’t do?
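On that first question, the shape of our answer looks something like this - a simplified sketch with an assumed schema and a `call_llm` stand-in, not our production pipeline: validate every model output against a fixed schema, feed errors back for a retry, and fail loudly rather than guess.

```python
import json

REQUIRED_FIELDS = {"event_type", "title", "date"}

def structured_extract(message: str, call_llm, max_attempts: int = 3) -> dict:
    """Extract, validate against a fixed schema, retry with the error fed
    back to the model, and fail loudly instead of guessing."""
    error = "none"
    for _ in range(max_attempts):
        raw = call_llm(f"Extract event JSON with fields {sorted(REQUIRED_FIELDS)}. "
                       f"Previous error: {error}\nMessage: {message}")
        try:
            event = json.loads(raw)
        except json.JSONDecodeError as e:
            error = f"invalid JSON: {e}"
            continue
        missing = REQUIRED_FIELDS - event.keys()
        if not missing:
            return event
        error = f"missing fields: {sorted(missing)}"
    raise ValueError(f"extraction failed after {max_attempts} attempts: {error}")
```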
A last note, then: what does graceful failure look like? I think true trust is built when things don't go the way they need to. So how can we own it, fix it and communicate it? Here's just one example where even ChatGPT hasn't really figured it out - promising things it can't do and doubling down when called out. But then again, maybe with a chatbot, "use at your own risk" is acceptable.
But it's also undeniable that this is burning through trust in very real ways with regular consumers. So each of us has a choice about where we want the burden of failure to lie.
For Milo, we choose to be conservative - if something is off (at either the model or the software level), we let the user know, and we've built in a human review system to fix it and communicate it. Because that's the only way I'm okay with trusting Milo with my family's details.
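In spirit, that routing looks like the sketch below - the threshold, names and message are all invented for illustration, and the real system is more involved:

```python
def route_event(event: dict, confidence: float, review_queue: list) -> str:
    """Conservative routing: anything the pipeline isn't sure about goes to
    human review, and the user is told so. The 0.9 threshold is illustrative."""
    if confidence >= 0.9:
        return "auto_commit"  # write it through and confirm over SMS
    review_queue.append(event)
    # Be explicit with the user instead of silently guessing:
    return "human_review: 'Heads up - a person is double-checking this one.'"
```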
As the technology gets more sophisticated, I'm certain that we'll get even more nuanced ways to catch and communicate misses.
But the point is: how we design for trust needs to be thought through and built in from the very beginning. Users feel the impact of these choices subconsciously at the tiniest level - at the formatting of a single line of an SMS.
AI is opening up incredible ways to solve problems and I’m so excited to see what everyone here will have a hand in bringing to life.
But what doesn't change is that, at the end of the day, each of us is ultimately building and selling trust. And by being intentional and thoughtful with the trade-offs, we can create delight through both incredible features and dependable reliability.