How can we use LLMs for complex tasks—reliably?
Using LLMs for complex, multi-step problems is challenging. Errors compound at each step, and the models can "reason" differently each time, which makes them unpredictable for advanced tasks.
The fix? Combine rule-based systems with smaller, more accurate models that work alongside LLMs and enforce guardrails. The result: better accuracy, control, and transparency, which build trust in your systems and your product.
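As a rough illustration of that pattern (a sketch, not That Works' actual code): deterministic rules assemble the facts, the LLM only summarizes them, and a rule-based check rejects any summary that cites something we didn't supply. Here `call_llm`, the fact shape, and the ID format are hypothetical placeholders.

```python
import re

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for whatever LLM client you actually use."""
    raise NotImplementedError

def summarize_with_guardrails(facts: list[dict], max_retries: int = 2) -> str:
    """Ask the LLM to summarize only the structured facts we hand it,
    then verify the output with deterministic rules before trusting it."""
    allowed_ids = {f["id"] for f in facts}
    prompt = (
        "Write a short meeting pre-read from these engineering facts. "
        "Cite each fact you use by its id in square brackets, e.g. [ENG-42]:\n"
        + "\n".join(f"- [{f['id']}] {f['text']}" for f in facts)
    )
    for _ in range(max_retries + 1):
        summary = call_llm(prompt)
        cited_ids = set(re.findall(r"\[([A-Z]+-\d+)\]", summary))
        # Guardrail: the summary must cite at least one fact, and every id
        # it cites must exist in the facts we provided.
        if cited_ids and cited_ids <= allowed_ids:
            return summary
    # Fall back to a plain rule-based rendering instead of an unverified summary.
    return "\n".join(f"- {f['text']}" for f in facts)
```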
Here's a real-world example: That Works supports a variety of use cases, one of which lets engineering leads automatically generate pre-reads for their meetings. Blindly asking LLMs to summarize information from various APIs yields noisy, insufficient results. People care about specific things: What work is at risk and why? What code have we shipped this week, and how long did it take? What tasks are blocked, and what's their impact?
By using specific models and algorithms to define concepts that map to people's expectations, we make our LLMs significantly more effective. They deliver consistent, reliable, and meaningful information every time. This approach also makes issues far easier to detect and debug, instead of leaving us to fight the black box of prompt engineering.
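To make that concrete, here's a minimal sketch of what "defining a concept" could look like in code. The field names (status, due_date, depends_on) and the thresholds are assumptions for illustration, not how That Works actually models its data.

```python
from datetime import date, timedelta

def is_blocked(task: dict) -> bool:
    """A task counts as blocked if any dependency is still open."""
    return any(dep["status"] != "done" for dep in task.get("depends_on", []))

def is_at_risk(task: dict, today: date) -> bool:
    """A task counts as at risk if it's due within three days and not yet in review or done."""
    due = task.get("due_date")
    if due is None or task["status"] in ("in_review", "done"):
        return False
    return due - today <= timedelta(days=3)

def build_facts(tasks: list[dict], today: date) -> list[dict]:
    """Turn raw tracker data into the named concepts readers actually ask about."""
    facts = []
    for t in tasks:
        if is_blocked(t):
            facts.append({"id": t["id"], "text": f"{t['title']} is blocked by open dependencies."})
        if is_at_risk(t, today):
            facts.append({"id": t["id"], "text": f"{t['title']} is at risk: due {t['due_date']}."})
    return facts
```

The LLM only ever sees these named facts, which is exactly what makes its output easy to check and debug.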
We're tackling some pretty novel problems as we build out That Works, and I'll continue to share more every week!