Can We Trust Generative AI?

Apr 20, 2023

I’ve always heard “trust but verify.” You can thank the KGB for this little gem, and I doubt those guys trusted anyone. I’ve always thought that the more accurate version was “trust BY verifying,” but I’ve never used sodium thiopental on anyone, so what do I know?

It seems as if the world is on fire about generative AI and large language models (LLMs). And there are some interesting issues bubbling up to the surface. We all know the best lies are the ones mixed into a lot of truth. And these AIs do that really well, telling lies (or at least inaccuracies) with great panache and authority, all nestled among a barrage of competent prose — or software. It’s easy to dismiss the computer when it insists it’s 2022 and you KNOW it’s 2023. Silly silicon — until it starts swatting you. But what if you want to rely on these LLMs to build software?

We know a little bit about trusting software. At its core, Skyramp is about giving developers and other CI participants increasing confidence that the code flowing into Kubernetes clusters will work once it reaches its ultimate destination. We do this by making it easier than ever to introduce, share, and automate mocking and testing earlier and earlier in the software development process. Confidence comes from trust. And trust comes from both the sensitivity and specificity of testing cloud native applications. Sensitivity is about catching as much of the stuff that breaks things as possible. Specificity is about providing meaningful results once you catch it, rather than a pile of false alarms. You don’t have to be perfect to create trust and then confidence, but you have to do a lot better than doing nothing at all.
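To make those two terms concrete, here’s a minimal sketch (plain Go with made-up numbers, not Skyramp’s actual API or data) of how sensitivity and specificity score a test suite, where “positive” means the suite flagged a change as broken:

```go
package main

import "fmt"

// Outcomes tallies a test suite's verdicts against ground truth.
type Outcomes struct {
	TruePositives  int // broken changes the suite caught
	FalseNegatives int // broken changes the suite missed
	TrueNegatives  int // healthy changes the suite passed
	FalsePositives int // healthy changes the suite flagged anyway
}

// Sensitivity: of the changes that were actually broken,
// what fraction did the suite catch?
func (o Outcomes) Sensitivity() float64 {
	return float64(o.TruePositives) / float64(o.TruePositives+o.FalseNegatives)
}

// Specificity: of the changes that were actually healthy,
// what fraction did the suite correctly pass?
func (o Outcomes) Specificity() float64 {
	return float64(o.TrueNegatives) / float64(o.TrueNegatives+o.FalsePositives)
}

func main() {
	// Hypothetical results from 250 changes pushed through CI.
	o := Outcomes{TruePositives: 45, FalseNegatives: 5, TrueNegatives: 190, FalsePositives: 10}
	fmt.Printf("sensitivity: %.2f\n", o.Sensitivity()) // 0.90: catches most real breakage
	fmt.Printf("specificity: %.2f\n", o.Specificity()) // 0.95: few noisy false alarms
}
```

The two pull against each other: a suite that flags every change has perfect sensitivity and worthless specificity, while running no tests at all gives you perfect specificity and zero sensitivity. That’s why “doing nothing” builds no trust, and why you need both numbers to be good, not just one.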

Which brings us to the original topic. More and more devs are using LLMs to generate code. Skyramp is in the business of validating code and building developer trust and confidence. Is there an angle for Skyramp to do this for LLM output? And in a hat tip to Descartes, is there a world where an LLM helps build Skyramp, which helps validate LLM output?

Wouldn’t that be cool?

© 2024 Skyramp, Inc. All rights reserved.