Customer Service Agents

Background

AI-powered customer service requires contextual intelligence at scale. This guide explains how to use Arfniia Router to dynamically select the most appropriate LLM for each customer interaction, using reinforcement learning to continuously improve the customer experience.

Why Dynamic LLM Routing Matters

Dynamic LLM routing can help AI-powered customer service agents achieve the following:

Efficiency

FAQs and simple clarification questions can be handled by cost-effective models, delivering quick responses while minimizing operational costs without compromising service quality.

Accuracy

Complex troubleshooting or technical issues can be routed to models with advanced reasoning capabilities or models fine-tuned on specific domains, ensuring accurate, relevant responses tailored to each query’s specific needs.

Personalization

Customer preferences for AI support vary significantly with demographics, support tier, and urgency level. Routing each request to an appropriate language model helps deliver personalized service that matches each customer’s expectations.

Learning-Enhanced Customer Service

LLM routing for customer service faces a delayed-feedback challenge: success metrics often arrive only after a session is complete, not after each interaction. We address this with reward shaping and credit assignment, which optimize routing decisions against eventual customer satisfaction signals.

Reward Shaping

Reward shaping improves learning efficiency by providing intermediate feedback signals throughout a customer interaction, instead of providing feedback only at the end of the task.

In the context of customer service, this can mean assigning small rewards for each message exchanged, reflecting the idea that as long as the customer is engaged and continuing the conversation, there’s some positive value in maintaining the interaction.

As an example, each message could receive a small positive reward (e.g., +0.1), while a larger reward (e.g., +1 or -1) is given based on the final session outcome (resolved or unresolved).
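
The scheme in this example can be sketched as a small function. This is a minimal illustration, not Arfniia Router's implementation: the name `shaped_rewards` and its parameters are hypothetical, and the values +0.1 and ±1 are simply the example figures from the text.

```python
def shaped_rewards(num_messages, resolved, step_reward=0.1, final_reward=1.0):
    """Return the feedback values for one session, in order.

    Each message earns `step_reward`; one extra value at the end
    reflects the session outcome (+final_reward or -final_reward).
    """
    rewards = [step_reward] * num_messages
    rewards.append(final_reward if resolved else -final_reward)
    return rewards


print(shaped_rewards(3, resolved=True))   # [0.1, 0.1, 0.1, 1.0]
print(shaped_rewards(2, resolved=False))  # [0.1, 0.1, -1.0]
```

Keeping the per-message reward small relative to the final reward means a long but unresolved conversation is less likely to look better than a short resolved one.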

Credit Assignment

Credit Assignment distributes end-of-session feedback across the interaction chain to identify which decisions contributed to the outcome.

As an example, suppose we receive binary feedback at the end of a session, where the user indicates whether their issue was resolved. To refine the router’s behavior, this final feedback can be distributed across the entire session by applying it to each message in the conversation. This allows the system to learn which actions contributed positively or negatively to the final outcome.
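
An equal split of the final feedback can be sketched as follows. The function name `assign_credit` and the response IDs are hypothetical, and equal division is only the simplest possible rule, chosen here for illustration.

```python
def assign_credit(response_ids, resolved):
    """Split a binary end-of-session outcome equally across responses."""
    if not response_ids:
        return {}  # nothing to credit in an empty session
    final_reward = 1.0 if resolved else -1.0
    share = final_reward / len(response_ids)
    return {resp_id: share for resp_id in response_ids}


credits = assign_credit(["r1", "r2", "r3", "r4"], resolved=False)
print(credits)  # {'r1': -0.25, 'r2': -0.25, 'r3': -0.25, 'r4': -0.25}
```

With an equal split, every routing decision in the session is treated as equally responsible for the outcome. Other weightings are possible, but they would be a different design choice from the one described here.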

Implementation Guide

The demo of a customer service agent powered by LLM routing has two parts, listed one after the other below:

Event Loop: handles user events
CustomerServiceAgent: handles messages and feedback

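# NOTE: CustomerServiceAgent is defined in the class listing further down.
# get_user_message, check_if_session_active and get_user_feedback are
# application-specific placeholders.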
agent = CustomerServiceAgent("cs-agent")

session_active = True
while session_active:
    user_message = get_user_message()
    resp = agent.reply(user_message)

    # check for session end
    session_active = check_if_session_active()

# get user feedback, True if resolved, False if not
resolved = get_user_feedback()
agent.ack(resolved)

from openai import OpenAI
import requests

base_url = "http://ec2-ip-address:5525/v1"

class CustomerServiceAgent:
    def __init__(self, router_name):
        self.router_name = router_name
        self.client = OpenAI(api_key="anything", base_url=base_url)
        self.responses = []
        self.accumulated_feedback = 0
        self.feedbacks_api = f"{base_url}/feedbacks/{self.router_name}"

    def reply(self, msg):
        resp = self.client.chat.completions.create(
            messages=[
                {
                    "role": "user",
                    "content": msg,
                }
            ],
            model=self.router_name,
        )
        # remember the response id so the final reward can be assigned later
        self.responses.append(resp.id)
        self.accumulated_feedback += 0.1
        requests.put(f"{self.feedbacks_api}/sparse/{self.accumulated_feedback}")
        return resp

    def ack(self, resolved):
        final_reward = 1 if resolved else -1
        # nothing to credit if no response was recorded
        if not self.responses:
            return
        # distribute final reward equally across all responses
        reward_per_message = final_reward / len(self.responses)

        # assign an equal share of the final reward to each response
        for resp_id in self.responses:
            requests.put(f"{self.feedbacks_api}/{resp_id}/{reward_per_message}")
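
To try the event loop pattern without a running router, the loop can be replayed against a stand-in agent. This is a sketch under stated assumptions: `FakeAgent` and `run_session` are hypothetical names, the conversation is scripted, and no network calls are made.

```python
from collections import deque


class FakeAgent:
    """Hypothetical stand-in for CustomerServiceAgent; no network calls."""

    def __init__(self):
        self.replies = 0
        self.acked = None

    def reply(self, msg):
        self.replies += 1
        return f"echo: {msg}"

    def ack(self, resolved):
        self.acked = resolved


def run_session(agent, messages, resolved):
    """Replay a scripted conversation through the event loop pattern."""
    pending = deque(messages)
    session_active = bool(pending)
    while session_active:
        user_message = pending.popleft()
        agent.reply(user_message)
        # the session ends when the script has no more messages
        session_active = bool(pending)
    # end-of-session feedback: True if resolved, False if not
    agent.ack(resolved)
    return agent


agent = run_session(FakeAgent(), ["Where is my order?", "Thanks, found it."], resolved=True)
print(agent.replies, agent.acked)  # 2 True
```

Swapping `FakeAgent` for the real class keeps the loop unchanged, which makes it easy to check the session handling separately from the routing calls.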

Key Takeaways

Arfniia Router uses reinforcement learning to dynamically match each interaction with a suitable LLM, with benefits along three dimensions:

Efficiency: Optimized LLM selection lowers operational costs.
Accuracy: Context-aware routing reduces average ticket resolution time.
Personalization: Tailored responses match customer preferences.

The system continuously improves through reward shaping and credit assignment, creating a feedback loop that refines routing decisions and enhances overall service quality.