Getting Started

Launch the AMI on AWS

To request free trial access and get started with launching the AMI, please drop us a message or email connect@arfniia.com.

We recommend 4th-generation Intel Xeon instances, such as m7i.2xlarge or larger, for optimal efficiency. Arfniia Router is specifically optimized for these architectures and delivers up to 2x better performance per routing step with default configurations, compared to 3rd-generation instances.

Configure IAM

Arfniia Router uses IAM roles exclusively, rather than access keys, for enhanced security.
After launching the Arfniia Router EC2 instance, make sure an IAM role is attached to it.

Steps to configure IAM:

  1. Grant the EC2 role self-assuming permission.

# ACCOUNT_ID is your account ID running Arfniia Router instances
# EC2_ROLE_NAME is the EC2 role attached to Arfniia Router instances
# NOTE: the shell does not expand variables inside single quotes, so the
# policy document is built with a double-quoted heredoc instead.
aws iam update-assume-role-policy \
  --role-name "${EC2_ROLE_NAME}" \
  --policy-document "$(cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::${ACCOUNT_ID}:role/${EC2_ROLE_NAME}",
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
)"
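When scripting this step, the trust-policy document can also be built in code before handing it to the CLI or an SDK, which sidesteps shell-quoting pitfalls entirely. A minimal sketch, assuming placeholder account and role values (substitute your own):

```python
import json

# NOTE: placeholder values for illustration; use your real account ID and role name
account_id = "123456789012"
role_name = "ArfniiaRouterRole"

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                # the role itself (self-assuming) plus the EC2 service
                "AWS": f"arn:aws:iam::{account_id}:role/{role_name}",
                "Service": "ec2.amazonaws.com",
            },
            "Action": "sts:AssumeRole",
        }
    ],
}

# serialized form, ready for `aws iam update-assume-role-policy --policy-document`
policy_json = json.dumps(trust_policy, indent=2)
print(policy_json)
```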
  2. Grant the EC2 role permission to invoke Amazon Bedrock models.

# EC2_ROLE_NAME is the EC2 role attached to Arfniia Router instances
aws iam put-role-policy \
  --role-name "${EC2_ROLE_NAME}" \
  --policy-name AllowModelInference \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "AllowModelInference",
        "Effect": "Allow",
        "Action": [
          "bedrock:InvokeModel"
        ],
        "Resource": "*"
      }
    ]
  }'
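"Resource": "*" is fine for a quick start; if you prefer least privilege, the policy can be scoped to the specific foundation models the router will use. A sketch that generates such a policy document, assuming us-east-1 and two example model IDs (adjust both to your deployment):

```python
import json

# NOTE: region and model IDs are assumptions for illustration
region = "us-east-1"
model_ids = [
    "anthropic.claude-3-5-sonnet-20240620-v1:0",
    "meta.llama3-1-405b-instruct-v1:0",
]

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowModelInference",
            "Effect": "Allow",
            "Action": ["bedrock:InvokeModel"],
            # Bedrock foundation-model ARNs have an empty account ID component
            "Resource": [
                f"arn:aws:bedrock:{region}::foundation-model/{m}" for m in model_ids
            ],
        }
    ],
}
print(json.dumps(policy, indent=2))
```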

Create a Router

Send a POST request to the /v1/routers endpooint, e.g. creating a router with the following configuration:

  • Choose between two base models: Llama 3.1 405B or Claude 3.5 Sonnet
  • Use amazon.titan-embed-text-v2:0 as part of prompt understanding
  • Maximize the feedback value, which is in the range [0, 1]
  • Compute the reward 100% from feedback, ignoring cost factors
  • Apply a cosine similarity threshold of 0.95 as part of the sampling strategy
  • At each routing step, train for 5 epochs on 16 samples
curl -X POST http://${EC2_IP_ADDR}:5525/v1/routers \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "advanced-reasoning",
    "base_models": [
      "anthropic.claude-3-5-sonnet-20240620-v1:0",
      "meta.llama3-1-405b-instruct-v1:0"
    ],
    "embedding": "amazon.titan-embed-text-v2:0",
    "feedback": {
      "goal": "max",
      "max_value": 1.0,
      "min_value": 0
    },
    "feedback_cost_weights": [
      1,
      0
    ],
    "training": {
      "batch_size": 16,
      "context_cache_similarity": 0.95,
      "num_of_steps": 5
    }
  }'
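The context_cache_similarity threshold compares prompt embeddings by cosine similarity: prompts scoring at or above 0.95 are treated as near-duplicates for caching purposes. A standalone illustration of the metric itself (pure Python; no assumptions about the router's internals):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# two toy "embeddings": nearly parallel vectors score close to 1.0
v1 = [0.9, 0.1, 0.4]
v2 = [0.91, 0.12, 0.38]
sim = cosine_similarity(v1, v2)
print(f"similarity: {sim:.4f}, cache hit: {sim >= 0.95}")
```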

Ready!

Send prompts to the OpenAI-compatible endpoint, and Arfniia Router will automatically learn and serve the router using a reinforcement learning algorithm.

from openai import OpenAI
import requests

router_name = "advanced-reasoning"
# Replace ${EC2_IP_ADDR} with the public IP of your Arfniia Router instance
base_url = "http://${EC2_IP_ADDR}:27149/v1"
client = OpenAI(api_key="any-text-would-work", base_url=base_url)

resp = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "How many Rs in Strawberrrry?",
        }
    ],
    # NOTE: router_name is now the model name
    model=router_name,
)
resp_id = resp.id
answer = resp.choices[0].message.content
# NOTE: the model may answer in a full sentence, so check for the digit
feedback = 1.0 if "5" in answer else 0.0

accuracy = 0.84  # NOTE: the business KPI to measure app performance
# NOTE: submit as "sparse" feedback, since the KPI pipeline could be delayed
requests.put(f"{base_url}/feedbacks/{router_name}/sparse/{accuracy}")
# NOTE: submit feedback for each prompt as well, to speed up learning
requests.put(f"{base_url}/feedbacks/{router_name}/{resp_id}/{feedback}")
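With feedback_cost_weights set to [1, 0], the reward at each routing step reduces to the feedback alone. If your business KPI lives on a different scale, it needs to be mapped into the configured [min_value, max_value] range before submission. A minimal sketch of that idea; the normalization helper is hypothetical, not part of the router API:

```python
def normalize_kpi(value, lo, hi):
    """Hypothetical helper: clamp and rescale a raw KPI into [0, 1]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(max((value - lo) / (hi - lo), 0.0), 1.0)

def reward(feedback, cost, weights=(1, 0)):
    """Weighted combination mirroring feedback_cost_weights = [1, 0] above."""
    w_feedback, w_cost = weights
    return w_feedback * feedback + w_cost * cost

# e.g. a latency-style KPI on a 0..200 ms scale, where lower is better
latency_ms = 32.0
feedback = 1.0 - normalize_kpi(latency_ms, 0.0, 200.0)
print(reward(feedback, cost=0.7))
```

With the default weights, cost drops out entirely; setting a nonzero cost weight would trade feedback quality against inference cost in the same formula.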