Nov 11, 2024

Service-as-Software? Hybrid billing for Agentic AI

Pricing emerging agentic applications isn't straightforward. The typical formula of counting API calls and multiplying by a unit cost fails to put a price on the outcomes a user actually receives.

At Squad AI, we hit this problem early. Our network of LLM agents processes your business data (context, OKRs, and customer feedback) to generate product insights, but the relationship between input and output is highly variable. The same call to a foundation model might generate a single critical insight for one input, yet spawn a dozen interconnected nodes on our strategy canvas for another. Traditional pay-for-access SaaS billing models don’t account for this.

We started with seat-based pricing and fair-usage limits for pro licenses, then introduced a node cap for the free tier. It's not perfect, and we're iterating towards a pay-per-outcome model; as with all software, it's a work in progress. Here’s what broke along the way, and what we learned about billing for AI-generated outcomes.

Our technical implementation is built around a graph-based persistence layer, which lets us model the opportunity-solution tree that our strategy canvas visualises. Each node represents a different strategy element: goals, opportunities, solutions, requirements, and the connections our agents discover. This graph structure lets us track not only the data points themselves but also how they relate to and influence each other.
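To make the node-and-edge idea concrete, here is a minimal in-memory sketch of an opportunity-solution tree. It is not our production persistence layer: the `StrategyGraph` class, the `NodeKind` enum, and the choice to model discovered connections as parent-to-child edges are illustrative assumptions. The `descendants` walk shows why a graph helps here: starting from a goal, you can collect every opportunity, solution, and requirement that hangs off it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple


class NodeKind(Enum):
    GOAL = "goal"
    OPPORTUNITY = "opportunity"
    SOLUTION = "solution"
    REQUIREMENT = "requirement"


@dataclass
class StrategyGraph:
    """Hypothetical in-memory sketch of an opportunity-solution tree stored as a graph."""

    nodes: Dict[str, NodeKind] = field(default_factory=dict)
    # Each edge is (parent_id, child_id), e.g. an opportunity linked to a solution.
    edges: List[Tuple[str, str]] = field(default_factory=list)

    def add_node(self, node_id: str, kind: NodeKind) -> None:
        self.nodes[node_id] = kind

    def connect(self, parent_id: str, child_id: str) -> None:
        # A connection discovered by an agent; both endpoints must already exist.
        if parent_id not in self.nodes or child_id not in self.nodes:
            raise KeyError(f"unknown node in edge {parent_id!r} -> {child_id!r}")
        self.edges.append((parent_id, child_id))

    def children(self, node_id: str) -> List[str]:
        return [child for parent, child in self.edges if parent == node_id]

    def descendants(self, node_id: str) -> List[str]:
        # Depth-first walk collecting everything that transitively hangs off node_id.
        seen, stack, found = {node_id}, [node_id], []
        while stack:
            for child in self.children(stack.pop()):
                if child not in seen:
                    seen.add(child)
                    found.append(child)
                    stack.append(child)
        return found
```

A real graph store would index edges rather than scanning a list, but the shape of the query (walk outward from a goal) stays the same.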

Our data ingestion pipeline processes incoming feedback asynchronously, which lets us scale with customer data volume. However, this created an interesting challenge for usage tracking. Our initial implementation naively called Stripe's usage API once for every node created in the graph. That worked fine in testing, but we hit Stripe's rate limits as soon as we deployed to production, where many users were active in their workspaces at once.

We solved this with an event sink for node-creation events and a separate background process that aggregates those events and reports usage to Stripe in batches. This decoupled usage reporting from the main insight-generation pipeline, so reporting no longer affects user experience, and it keeps us comfortably within Stripe's rate limits. A nice side effect is resilience against Stripe downtime and other infrastructure wobbles: failed reports can simply be retried later.
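The batching pattern can be sketched as follows, assuming usage is aggregated per customer between flushes. `UsageEventSink`, `record_node_created`, and the injected `report_fn` are hypothetical names. In production `report_fn` would wrap the call that sends usage to Stripe (for example, one meter event carrying the batched quantity), but it is injected here so the sketch runs without network access. Failed reports are put back into the pending totals, so they are retried on the next flush.

```python
import threading
from collections import defaultdict
from typing import Callable, Dict


class UsageEventSink:
    """Buffers node-creation events and reports them to billing in batches.

    Hypothetical sketch: `report_fn(customer_id, quantity)` stands in for the
    call that sends usage to Stripe; it is injected so this runs offline.
    """

    def __init__(self, report_fn: Callable[[str, int], None]) -> None:
        self._report_fn = report_fn
        self._pending: Dict[str, int] = defaultdict(int)
        self._lock = threading.Lock()

    def record_node_created(self, customer_id: str, count: int = 1) -> None:
        # Called from the insight pipeline: an in-memory increment, no network I/O.
        with self._lock:
            self._pending[customer_id] += count

    def flush(self) -> Dict[str, int]:
        """Sends one report per customer; returns (and re-queues) any that failed."""
        with self._lock:
            batch = dict(self._pending)
            self._pending.clear()
        failed: Dict[str, int] = {}
        for customer_id, quantity in batch.items():
            try:
                self._report_fn(customer_id, quantity)
            except Exception:
                # e.g. rate limit or outage: keep the usage for the next flush.
                failed[customer_id] = quantity
        if failed:
            with self._lock:
                for customer_id, quantity in failed.items():
                    self._pending[customer_id] += quantity
        return failed

    def run(self, interval_s: float, stop: threading.Event) -> None:
        # Background loop: flush every `interval_s` seconds until `stop` is set,
        # then flush once more so nothing buffered is dropped on shutdown.
        while not stop.wait(interval_s):
            self.flush()
        self.flush()
```

The background process can be as simple as `threading.Thread(target=sink.run, args=(30.0, stop), daemon=True).start()`. Two caveats apply to this sketch: pending totals live in memory, so a crash loses unreported usage (a durable queue or table is the natural home for the sink), and retrying a report that reached Stripe but timed out on the way back can double-count usage unless each batch carries an idempotency key.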

Squad AI’s journey with usage-based billing reflects a broader shift in AI-native applications. As Sequoia's recent analysis points out, we're moving from "Software-as-a-Service" to "Service-as-Software," where companies sell work outcomes rather than seats. However, the technical reality is messy. While measuring API calls is too simplistic, purely outcome-based billing for AI workloads brings its own engineering challenges.

Looking ahead, we're exploring ways to price based on the actual value our agents generate. This could mean weighting nodes differently based on their impact, or finding new ways to measure how insights lead to successful product decisions. By integrating analytics data (which insights get implemented, which requirements make it to production, and their impact on key metrics), we could build a more sophisticated value-based billing model. The infrastructure we've built, namely our graph database, event pipeline, and usage tracking system, gives us the foundation to experiment with these more nuanced approaches. Like many in the AI space, we're still figuring it out. But one thing's clear: the traditional SaaS playbook won't cut it.
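As a purely hypothetical illustration of weighting nodes by impact, the sketch below turns a set of nodes into billable units. The per-kind weights, the `implemented` flag (imagined as fed from analytics once a requirement ships), and `IMPLEMENTED_MULTIPLIER` are invented for illustration and are not our pricing.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

# Illustrative weights only (hypothetical, not real pricing).
KIND_WEIGHTS: Dict[str, float] = {
    "goal": 1.0,
    "opportunity": 1.0,
    "solution": 2.0,
    "requirement": 0.5,
}
DEFAULT_WEIGHT = 1.0
IMPLEMENTED_MULTIPLIER = 3.0


@dataclass(frozen=True)
class BilledNode:
    kind: str
    # Hypothetical analytics signal, e.g. the requirement shipped to production.
    implemented: bool = False


def billable_units(nodes: Iterable[BilledNode]) -> float:
    """Sums per-node weights, boosting nodes that analytics marks as implemented."""
    total = 0.0
    for node in nodes:
        weight = KIND_WEIGHTS.get(node.kind, DEFAULT_WEIGHT)
        if node.implemented:
            weight *= IMPLEMENTED_MULTIPLIER
        total += weight
    return total
```

Under these made-up weights, an unshipped solution counts as 2 units and a shipped requirement as 1.5; the billing pipeline would then report units instead of raw node counts.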