
How do you verify that your software does what it's supposed to do?

Do you test all your app's possible cases, branches, and states? I don't, at least not manually. Nobody's got time to click through all the edge cases by hand. QA'ing a simple login form takes time, let alone testing a complex application.

Having robots do that helps a ton, and I recommend writing automated tests to help you sleep well at night (and release fewer bugs)!

Ignoring the burden of writing and maintaining tests, testing a "normal" web application is straightforward because it's predictable. Throw something at your app and expect a result. Given the same input, it should always produce the same output. Most apps are CRUD apps anyway — easy peasy.
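That predictability is the whole trick: a test is just input in, expected output out. A minimal sketch (the `slugify` function is an illustrative example, not from this article):

```typescript
// A deterministic function is trivial to test: same input, same output,
// every single time.
function slugify(title: string): string {
  return title.toLowerCase().trim().replace(/\s+/g, "-");
}

// Run this a million times and it will never surprise you.
console.assert(slugify("Hello World") === "hello-world");
console.assert(slugify("  Testing LLM Apps  ") === "testing-llm-apps");
```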

But what if there are unpredictable parts in your app's core?

If you're riding the AI buzzword wave, you probably implemented an "I know everything" smart-ass right in your app's core that's known for lying and spreading fake news. (Yes, I mean some sort of LLM.)

How would you test your app's quality if you're building software on top of software you probably don't understand?

Here's Hamel Husain's recommendation:

There are three levels of evaluation to consider:

  • Level 1: Unit Tests
  • Level 2: Model & Human Eval (this includes debugging)
  • Level 3: A/B testing
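Level 1 is the interesting one here: you don't grade the model's "intelligence", you assert cheap, deterministic properties of its output — does it parse, does it contain the required fields, does it avoid known failure modes? A minimal sketch of what such assertions could look like (check names, the JSON shape, and the refusal pattern are my assumptions, not Hamel's):

```typescript
// Level 1 unit tests: deterministic checks on an LLM's raw text output.
// No model call needed — you run these against captured or stubbed responses.

type Check = { name: string; pass: (output: string) => boolean };

const checks: Check[] = [
  {
    // We asked the model to answer in JSON, so the output must parse.
    name: "is valid JSON",
    pass: (o) => {
      try { JSON.parse(o); return true; } catch { return false; }
    },
  },
  {
    // A required field must exist and be a string.
    name: "contains a 'summary' field",
    pass: (o) => {
      try { return typeof JSON.parse(o).summary === "string"; }
      catch { return false; }
    },
  },
  {
    // Guard against a common failure mode: refusing instead of answering.
    name: "does not refuse to answer",
    pass: (o) => !/as an ai language model/i.test(o),
  },
];

// Returns the names of all failed checks; an empty array means pass.
export function runLevel1Checks(output: string): string[] {
  return checks.filter((c) => !c.pass(output)).map((c) => c.name);
}

// A well-formed response passes every check.
const good = JSON.stringify({ summary: "Login form works." });
console.assert(runLevel1Checks(good).length === 0);

// A refusal fails all three.
const bad = "As an AI language model, I cannot do that.";
console.assert(runLevel1Checks(bad).length === 3);
```

The point of Level 1 is that these checks are fast and boring, so you can run them on every commit before you spend money on model-graded or human evaluation.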

I'm not planning to get into serious AI work or LLM programming anytime soon, but unit testing software sitting on top of LLMs is fascinating and worth more than a bookmark!


About Stefan Judis

Frontend nerd with over ten years of experience, freelance dev, "Today I Learned" blogger, conference speaker, and Open Source maintainer.
