Getting Started

Launch the AMI on AWS

To request free trial access and get started with launching the AMI, please drop us a message or email connect@arfniia.com.

We recommend 4th-generation Intel Xeon instances, such as m7i.2xlarge or larger, for optimal efficiency. Arfniia Router is specifically optimized for these architectures and delivers up to 2x better performance per routing step with default configurations, compared to 3rd-generation instances.

Configure IAM

Arfniia Router uses IAM roles exclusively, rather than access keys, for enhanced security.
After launching the Arfniia Router EC2 instance, make sure an IAM role is attached to it.

Steps to configure IAM:

  1. Grant the EC2 role self-assuming permission.

# ACCOUNT_ID is your account ID running Arfniia Router instances
# EC2_ROLE_NAME is the EC2 role attached to Arfniia Router instances
# NOTE: the shell does not expand variables inside single quotes, so the
# policy document is built with a double-quoted heredoc instead.
aws iam update-assume-role-policy \
  --role-name "${EC2_ROLE_NAME}" \
  --policy-document "$(cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::${ACCOUNT_ID}:role/${EC2_ROLE_NAME}",
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
)"
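When scripting this step, the trust-policy document can also be built in code before handing it to the CLI or an SDK, which sidesteps shell-quoting pitfalls entirely. A minimal sketch, assuming placeholder account and role values (substitute your own):

```python
import json

# NOTE: placeholder values for illustration; use your real account ID and role name
account_id = "123456789012"
role_name = "ArfniiaRouterRole"

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                # the role itself (self-assuming) plus the EC2 service
                "AWS": f"arn:aws:iam::{account_id}:role/{role_name}",
                "Service": "ec2.amazonaws.com",
            },
            "Action": "sts:AssumeRole",
        }
    ],
}

# serialized form, ready for `aws iam update-assume-role-policy --policy-document`
policy_json = json.dumps(trust_policy, indent=2)
print(policy_json)
```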
  2. Grant the EC2 role permission to invoke Amazon Bedrock models.

# EC2_ROLE_NAME is the EC2 role attached to Arfniia Router instances
aws iam put-role-policy \
  --role-name "${EC2_ROLE_NAME}" \
  --policy-name AllowModelInference \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "AllowModelInference",
        "Effect": "Allow",
        "Action": [
          "bedrock:InvokeModel"
        ],
        "Resource": "*"
      }
    ]
  }'
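"Resource": "*" is fine for a quick start; if you prefer least privilege, the policy can be scoped to the specific foundation models the router will use. A sketch that generates such a policy document, assuming us-east-1 and two example model IDs (adjust both to your deployment):

```python
import json

# NOTE: region and model IDs are assumptions for illustration
region = "us-east-1"
model_ids = [
    "anthropic.claude-3-5-sonnet-20240620-v1:0",
    "meta.llama3-1-405b-instruct-v1:0",
]

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowModelInference",
            "Effect": "Allow",
            "Action": ["bedrock:InvokeModel"],
            # Bedrock foundation-model ARNs have an empty account ID component
            "Resource": [
                f"arn:aws:bedrock:{region}::foundation-model/{m}" for m in model_ids
            ],
        }
    ],
}
print(json.dumps(policy, indent=2))
```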

Create a Router

Send a POST request to the /v1/routers endpooint, e.g. creating a router with the following configuration:

  • Choose between two base models: Llama 3.1 405B or Claude 3.5 Sonnet
  • Use amazon.titan-embed-text-v2:0 as part of prompt understanding
  • Maximize the feedback value, which is in the range [0, 1]
  • Compute the reward 100% from feedback, ignoring cost factors
  • Apply a cosine similarity threshold of 0.95 as part of the sampling strategy
  • At each routing step, train for 5 epochs on 16 samples
curl -X POST http://${EC2_IP_ADDR}:5525/v1/routers \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "advanced-reasoning",
    "base_models": [
      "anthropic.claude-3-5-sonnet-20240620-v1:0",
      "meta.llama3-1-405b-instruct-v1:0"
    ],
    "embedding": "amazon.titan-embed-text-v2:0",
    "feedback": {
      "goal": "max",
      "max_value": 1.0,
      "min_value": 0
    },
    "feedback_cost_weights": [
      1,
      0
    ],
    "training": {
      "batch_size": 16,
      "context_cache_similarity": 0.95,
      "num_of_steps": 5
    }
  }'
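The context_cache_similarity threshold compares prompt embeddings by cosine similarity: prompts scoring at or above 0.95 are treated as near-duplicates for caching purposes. A standalone illustration of the metric itself (pure Python; no assumptions about the router's internals):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# two toy "embeddings": nearly parallel vectors score close to 1.0
v1 = [0.9, 0.1, 0.4]
v2 = [0.91, 0.12, 0.38]
sim = cosine_similarity(v1, v2)
print(f"similarity: {sim:.4f}, cache hit: {sim >= 0.95}")
```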

Ready!

Send prompts to the OpenAI-compatible endpoint, and Arfniia Router will automatically learn and serve the router using a reinforcement learning algorithm.

from openai import OpenAI
import requests

router_name = "advanced-reasoning"
# Replace ${EC2_IP_ADDR} with the public IP of your Arfniia Router instance
base_url = "http://${EC2_IP_ADDR}:27149/v1"
client = OpenAI(api_key="any-text-would-work", base_url=base_url)

resp = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "How many Rs in Strawberrrry?",
        }
    ],
    # NOTE: router_name is now the model name
    model=router_name,
)
resp_id = resp.id
answer = resp.choices[0].message.content
# NOTE: the model may answer in a full sentence, so check for the digit
feedback = 1.0 if "5" in answer else 0.0

accuracy = 0.84  # NOTE: the business KPI to measure app performance
# NOTE: submit as "sparse" feedback, since the KPI pipeline could be delayed
requests.put(f"{base_url}/feedbacks/{router_name}/sparse/{accuracy}")
# NOTE: submit feedback for each prompt as well, to speed up learning
requests.put(f"{base_url}/feedbacks/{router_name}/{resp_id}/{feedback}")
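With feedback_cost_weights set to [1, 0], the reward at each routing step reduces to the feedback alone. If your business KPI lives on a different scale, it needs to be mapped into the configured [min_value, max_value] range before submission. A minimal sketch of that idea; the normalization helper is hypothetical, not part of the router API:

```python
def normalize_kpi(value, lo, hi):
    """Hypothetical helper: clamp and rescale a raw KPI into [0, 1]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(max((value - lo) / (hi - lo), 0.0), 1.0)

def reward(feedback, cost, weights=(1, 0)):
    """Weighted combination mirroring feedback_cost_weights = [1, 0] above."""
    w_feedback, w_cost = weights
    return w_feedback * feedback + w_cost * cost

# e.g. a latency-style KPI on a 0..200 ms scale, where lower is better
latency_ms = 32.0
feedback = 1.0 - normalize_kpi(latency_ms, 0.0, 200.0)
print(reward(feedback, cost=0.7))
```

With the default weights, cost drops out entirely; setting a nonzero cost weight would trade feedback quality against inference cost in the same formula.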