Coding and Writing Assistants
Background
Creating effective AI coding and writing assistants goes beyond just using a powerful model; it demands contextual intelligence that adapts to diverse tasks and user preferences. Arfniia Router enables dynamic LLM selection and continuous learning from user feedback, empowering AI assistants to genuinely understand and evolve with users’ needs.
The Case for Multiple LLMs
Prompt Compatibility
More and more organizations are successfully migrating across state-of-the-art LLMs without needing prompt rewrites. This ability to switch between models provides flexibility and future-proofs applications, ensuring they seamlessly adapt to model updates and improvements.
Training Data Differentiation
Leading LLM providers have established unique, differentiated access to data across pre-training, alignment, and reasoning stages. These distinct data advantages mean that models now excel in different domains, signaling the end of the single-model era.
Inference-Time Intelligence
Inference-time intelligence, as seen in models like OpenAI’s o1, points to a future where reasoning effort adapts dynamically to varying time budgets, amplifying the variability of LLM outputs. Leveraging these capabilities across multiple providers opens new frontiers for dynamic, context-aware AI applications as the LLM performance landscape continues to evolve.
Context-Aware AI Assistant
Instead of manually switching between LLMs through trial and error, Arfniia Router creates an adaptive system that learns instantly from user feedback. Coding and writing assistants often run as multi‑turn interactive sessions, so we use episodic RL to stitch learning across steps while still reacting to immediate signals such as:
- Accepting a suggestion
- Rejecting a suggestion or requesting changes
- Providing follow-up prompts or clarifications
This immediate feedback loop, combined with episodic credit assignment, allows the router to swiftly learn user preferences and fine‑tune model selection for specific tasks, whether it’s generating unit tests, developing new features, or crafting marketing content.
Implementation Guide
Below is a demo coding assistant that learns from the following user feedback signals within an interactive session (episode):
- Individual feedback for each response:
  - `accept`: reward of +1.0
  - `reject`: reward of -0.5
  - `refine`: reward of +0.2
- A moving average of the above rewards for accumulated feedback
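As a minimal sketch of this reward scheme (the `REWARDS` dict and `moving_average` helper are illustrative names, not part of the Arfniia API; the full assistant below wires these same values into the feedback endpoint):

```python
# Per-response rewards used by the demo assistant
REWARDS = {"accept": 1.0, "reject": -0.5, "refine": 0.2}

def moving_average(rewards: list[float]) -> float:
    # Accumulated feedback: mean of all per-response rewards so far
    return sum(rewards) / len(rewards) if rewards else 0.0
```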
The assistant marks session boundaries using HTTP headers so the router can learn across steps within the same session:
- `X-Arfniia-Episode-Id`: unique id per session
- `X-Arfniia-Episode-Start`: present on the first step
- `X-Arfniia-Episode-End`: present on the last step
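For illustration, here is how these headers might accompany a raw chat-completions request against the router; this sketch assumes the standard OpenAI-compatible `/v1/chat/completions` path behind the base_url used below, and the prompt text is made up:

```python
import uuid
import requests

episode_id = str(uuid.uuid4())  # one id reused for every step in the session

# Hypothetical first step of a session; later steps reuse X-Arfniia-Episode-Id,
# and the final step sends X-Arfniia-Episode-End instead of -Start.
requests.post(
    "http://ec2-ip-address:5525/v1/chat/completions",
    json={
        "model": "coding-router",
        "messages": [{"role": "user", "content": "Write a unit test for parse_config()"}],
    },
    headers={
        "X-Arfniia-Episode-Id": episode_id,
        "X-Arfniia-Episode-Start": "1",
    },
)
```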
```python
# NOTE: CodingAssistant is defined in another tab (see below)
assistant = CodingAssistant("coding-router")
session = assistant.start_session()

while True:
    user_input = get_user_input()
    if not user_input:
        break  # end session

    suggestion = assistant.generate(user_input, session=session)
    feedback = get_user_feedback()  # 'accept', 'reject', 'refine'

    if feedback == 'accept':
        assistant.process_feedback(suggestion.id, 1.0)
    elif feedback == 'reject':
        assistant.process_feedback(suggestion.id, -0.5)
    elif feedback == 'refine':
        assistant.process_feedback(suggestion.id, 0.2)

assistant.end_session(session=session)
```
```python
from openai import OpenAI
import requests
import uuid

base_url = "http://ec2-ip-address:5525/v1"

class CodingAssistant:
    def __init__(self, router_name):
        self.router_name = router_name
        self.client = OpenAI(api_key="anything", base_url=base_url)
        self.feedbacks_api = f"{base_url}/feedbacks/{self.router_name}"
        self.total_reward = 0        # accumulated reward
        self.num_interactions = 0    # count of feedbacks

    def start_session(self):
        return {"id": str(uuid.uuid4()), "started": False, "ended": False}

    def end_session(self, session):
        session["ended"] = True
        # Optionally send a no-op final call to flush terminal learning,
        # but usually the last generate() marks the end via header.
        return session

    def _episode_headers(self, session, is_last=False):
        headers = {"X-Arfniia-Episode-Id": session["id"]}
        if not session["started"]:
            headers["X-Arfniia-Episode-Start"] = "1"
            session["started"] = True
        if is_last or session.get("ended"):
            headers["X-Arfniia-Episode-End"] = "1"
        return headers

    def generate(self, prompt, session=None, is_last=False):
        # generate response using router
        headers = self._episode_headers(session, is_last) if session else {}
        resp = self.client.chat.completions.create(
            messages=[{"role": "user", "content": prompt}],
            model=self.router_name,
            extra_headers=headers,
        )
        return resp

    def process_feedback(self, suggestion_id, reward):
        self.total_reward += reward
        self.num_interactions += 1

        # calculate moving average reward
        moving_average_reward = self.total_reward / self.num_interactions

        requests.put(f"{self.feedbacks_api}/{suggestion_id}/{reward}")
        requests.put(f"{self.feedbacks_api}/sparse/{moving_average_reward}")
```
Tuning Exploration
Arfniia supports configurable exploration levels per router (set via `training.exploration_level`):
- `low` (default): favors the best‑known model for stability
- `medium`: balances trying alternatives with sticking to the current best
- `high`: explores aggressively (useful for demos, smoke tests, and rapid adaptation)

Use `low` in production by default, and temporarily increase it during trials or when onboarding new tasks to speed up adaptation.
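For reference, a minimal sketch of what this setting might look like in a router configuration; the surrounding structure is an illustrative assumption, and only the `training.exploration_level` key and its values come from the documentation above, so consult your deployment's configuration reference for the exact mechanism:

```python
# Hypothetical router configuration: everything except training.exploration_level
# and its allowed values ('low' | 'medium' | 'high') is an illustrative assumption.
router_config = {
    "name": "coding-router",
    "training": {
        "exploration_level": "high",  # raise temporarily while onboarding a new task
    },
}
```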
Key Takeaways
Combining multiple LLMs with Arfniia Router’s episodic RL lets coding and writing assistants outperform any single fixed model.
- Contextual Intelligence: Dynamically picks the optimal LLM per turn and across a session
- Cost Efficiency: Leverages smaller LLMs when advanced reasoning isn’t needed
- Seamless Experience: Delivers a cohesive multi‑turn experience with consistent improvement