More on Feedbacks

Introduction

Feedback is the mechanism Arfniia Router uses to guide its learning and optimization process. After an interaction, the user submits feedback, and the router uses it to refine its routing decisions.

Learning Architecture

The Arfniia Router’s learning architecture incorporates feedback into its core decision-making process, which is powered by reinforcement learning. A simplified overview of the architecture:

+----------------+       +----------------------+
|   User Prompt  |       |      Feedback        |
|     (Input)    |       |  (Input for Reward)  |
+----------------+       +----------------------+
          |                          |
          v                          v
  +--------------------------------------------+
  |           Neural Network (Policy)          |
  |     (Learns from Prompt and Feedback)      |
  +--------------------------------------------+
                       |
                       v
           +------------------------+
           |      Exploitation      |
           +------------------------+
                       |
                       v
           +------------------------+
           |       Exploration      |
           +------------------------+
                       |
                       v
           +------------------------+
           |      Selected LLM      |
           +------------------------+

In this architecture, feedback serves as the reward input that shapes the neural network’s policy and refines future decisions. The system balances exploitation (choosing the best-known LLM) against exploration (trying new or alternative LLMs) to optimize performance.
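The trade-off between exploitation and exploration can be illustrated with a small epsilon-greedy sketch. This is not the router's implementation: the router uses a neural network policy, while the sketch below substitutes a simple running-average table. The class name `EpsilonGreedyRouter`, the `epsilon` parameter, and the reward scale are all assumptions made for illustration.

```python
import random


class EpsilonGreedyRouter:
    """Toy stand-in for a learned routing policy (illustration only)."""

    def __init__(self, llms, epsilon=0.1, seed=None):
        self.epsilon = epsilon                 # probability of exploring
        self.rng = random.Random(seed)
        self.value = {name: 0.0 for name in llms}   # running mean reward per LLM
        self.count = {name: 0 for name in llms}     # feedback samples per LLM

    def select(self):
        # Exploration: occasionally try an LLM at random.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.value))
        # Exploitation: otherwise pick the LLM with the best estimate so far.
        return max(self.value, key=self.value.get)

    def update(self, llm, reward):
        # Feedback acts as the reward: fold it into the running mean.
        self.count[llm] += 1
        self.value[llm] += (reward - self.value[llm]) / self.count[llm]
```

With `epsilon=0.0` the sketch always exploits its current best estimate. Larger values spend more requests trying alternatives, which is how new feedback about other LLMs gets collected.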

Types of Feedback

To accelerate learning and optimize performance, feedback can be provided in two key forms:

Sparse / Delayed Feedback

This type of feedback is generally provided at intervals after multiple interactions, representing a more aggregated evaluation of performance over time.

API Endpoint

/v1/feedbacks/{router_name}/sparse/{feedback_value}
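As a minimal sketch, the sparse endpoint path can be assembled as follows. Only the path template comes from this page. The base URL, the `PUT` method, the numeric feedback value, and the helper names are assumptions and should be checked against your deployment.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:8080"  # hypothetical host and port


def sparse_feedback_url(router_name, feedback_value, base_url=BASE_URL):
    """Fill in /v1/feedbacks/{router_name}/sparse/{feedback_value}."""
    router = quote(router_name, safe="")  # escape characters unsafe in a path
    return f"{base_url.rstrip('/')}/v1/feedbacks/{router}/sparse/{feedback_value}"


def build_sparse_feedback_request(router_name, feedback_value, method="PUT"):
    """Prepare (but do not send) the request; the HTTP method is an assumption."""
    return Request(sparse_feedback_url(router_name, feedback_value), method=method)
```

The prepared request can be sent with `urllib.request.urlopen(request)`, for example once at the end of a session.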

Use Case

Suitable for applications where immediate feedback is not feasible but periodic evaluations are possible. For example, a customer service agent may receive a final resolved or unresolved signal only at the end of a session, and an advertising campaign may receive click-through-rate updates after a delay.

Immediate Feedback

This type of feedback is provided right after a response is generated, allowing the router to update its learning more rapidly. Immediate feedback enables the system to adjust its neural network policy in real time, providing the fastest possible learning experience.

API Endpoint

/v1/feedbacks/{router_name}/{response_id}/{feedback_value}
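The immediate endpoint can be sketched the same way. Here `response_id` is assumed to identify the response being rated. The base URL, the `PUT` method, the numeric feedback value, and the helper names are likewise assumptions, not documented behavior.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:8080"  # hypothetical host and port


def immediate_feedback_url(router_name, response_id, feedback_value, base_url=BASE_URL):
    """Fill in /v1/feedbacks/{router_name}/{response_id}/{feedback_value}."""
    router = quote(router_name, safe="")
    response = quote(response_id, safe="")
    return f"{base_url.rstrip('/')}/v1/feedbacks/{router}/{response}/{feedback_value}"


def build_immediate_feedback_request(router_name, response_id, feedback_value, method="PUT"):
    """Prepare (but do not send) the request; the HTTP method is an assumption."""
    url = immediate_feedback_url(router_name, response_id, feedback_value)
    return Request(url, method=method)
```

Sending the prepared request with `urllib.request.urlopen(request)` right after each response keeps the delay between a routing decision and its reward as short as possible.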

Use Case

Suitable for scenarios where feedback is available right after each response. It can be user-supplied, as in a coding assistant (the user accepts, follows up on, or rejects the generated code), or system-generated, as with an evaluation dataset (e.g. a golden answer for each question).
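Both kinds of signal have to be turned into a feedback value before they are submitted. The sketch below shows one possible mapping. The numeric scale (0.0 to 1.0), the outcome labels, and the exact-match scoring rule are assumptions chosen for illustration, not values defined by the router.

```python
# Hypothetical mapping from coding-assistant outcomes to feedback values.
OUTCOME_FEEDBACK = {
    "accept": 1.0,     # user kept the generated code
    "follow-up": 0.5,  # user asked for a revision
    "reject": 0.0,     # user discarded the generated code
}


def feedback_from_outcome(outcome):
    """User-supplied feedback: look up the value for an outcome label."""
    return OUTCOME_FEEDBACK[outcome]


def _normalize(text):
    # Lowercase and collapse whitespace so trivial differences do not count.
    return " ".join(text.lower().split())


def feedback_from_golden_answer(answer, golden):
    """System-generated feedback: 1.0 on a normalized exact match, else 0.0."""
    return 1.0 if _normalize(answer) == _normalize(golden) else 0.0
```

Exact matching is the simplest possible scorer. An evaluation dataset with free-form answers would likely need a more tolerant comparison.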

Best Practices for Using Feedback

Always provide sparse feedback; it can lead to better learning outcomes in the long term.
Provide immediate feedback wherever possible so that the router can fine-tune its model.

By using the feedbacks API effectively, users can maximize the performance of Arfniia Router while staying compliant with the requirements that govern their use of LLMs.