Mukunda Johnson – Software Engineer

Large Language Models and Humans

November 3, 2024

What kind of executive doesn't want to leverage the latest technology to cut operational costs and drive their company into the future lane?

I've spent a bit of time over the last year considering what is and what is not great about LLMs. What is great is assistance - letting the model help you find something, where the input is arbitrary. That's a no-brainer, given the massive popularity of ChatGPT for solving problems or getting details about something.

What isn't great is automation, where the model works by itself. It's a tricky domain, and I'm sure there are decent use cases, but such use cases are sparse. Often people aren't upfront about the risks (especially vendors).

Here's an example, security questions. Users can't always remember what exact format they've set their security question answer to. For example, "What was your first job?" and they entered the full job title and company. They can't remember if they just put the job title, the company, or both.

So, your junior engineer gets an idea to use an LLM to fuzzy-match the input to see if it's correct. Example prompt:

You are a security question checker. Compare the
RealAnswer and Input to see if it is correct for the
given Question. The Input doesn't need to match the
RealAnswer exactly, but the concept/answer should
essentially be the same (fuzzy-match). Respond with JSON
output with a single boolean field "matches".

<Question>What was your first job?</Question> 
<RealAnswer>Accountant at FirstFinancial</RealAnswer> 
<Input>Accountant</Input>

So in this case, if they enter "accountant" or "firstfinancial", it will return true. Great, the engineer has made the process more intelligent and intuitive!

What non-engineers might not consider here is all of the caveats. Firstly, you might encounter input that breaks the model, where the LLM doesn't even return proper JSON. In that case, you want to fall back to traditional fuzzy-matching tactics, e.g., how many words match, ignore letter case, etc. (I prefer JSON output, because it's easy to see if a valid JSON object is present, and then to parse the expected field.)

Worse, you've added a security hole. An important principle with LLMs is that you have complete access to any possible answer or system connected to the model. Prompt injection. If the input is unrestricted, you could extend the prompt. Even if the input is limited to say, 20 characters, you can probably still hack it by entering a special string of characters to break the model. In this case, you'd have a special string that always makes it return "true", no matter the question or answer.

Essentially, it's a bad idea all around. This is just a simple case, but it has many parallels in real software design. The way I sum it up is simple:

Functions return expected output for expected input. return expected output for expected input. return expected output for expected input. return expected output for expected input.
LLMs return unpredictable output.

That isn't to say LLMs are useless. They have their uses, but the fundamental principles behind them must be respected.

When you add a human behind LLMs, the value becomes much more apparent. Treat it like a horse; without a rider, it could go anywhere. With a rider, you can go fast towards a point. With an expert rider, you can go very fast.

So here's a strong use case: Support. It's easy to explain why this use case is so strong.

You have a human behind the LLM. It's the customer.
They are using the model the same way the average user is getting help from ChatGPT for arbitrary information.

It's no question that some people just want to "talk to a human" just because they don't want to dig through your knowledgebase.

Example, when you're at the store, do you want to look up the store website to search for an item to find the right aisle? Or do you want to ask the nearest employee? Most people are conditioned towards the latter. It's just intuitive.

That's a support case. Now imagine if there were devices around the store where you could push a button and ask where something is--this is very possible with LLMs. Customers should appreciate the fast help. If the AI fails or the customer otherwise gets an incorrect answer (which shouldn't be too common) they can fall back to an employee. It's not the end of the world. It's not something they can break or take advantage of. They should already have the expectation that the "computer" might be wrong.

We see a lot of use cases already with support sites, even before LLMs were popular, to try and guide the customer towards self-help. LLMs make self-help all the more accessible. ChatGPT is basically a support site. The front page has the header, "What can I help with?" It's fine-tuned to act like a support agent, just without a specific domain.

The key here is that you're mixing LLMs with a human, and the human doesn't need to be your employee. So long as you keep the principles behind LLMs and prompts in mind, you can make some neat systems that your customers will really like.

<< Blog Index