Rakam Eval SDK Documentation
This document provides a comprehensive guide to using the rakam-system-tools, a Python client for interacting with the DeepEval API. This SDK allows you to run evaluations on your text and schema data, either synchronously or in the background.
Installation
To get started, install the rakam-system-tools package using pip:
pip install rakam-system-tools
Configuration
The DeepEvalClient is the main entry point for using the SDK. To use it, you need to configure the client with your API endpoint and token.
Initializing the Client
You can initialize the DeepEvalClient by providing the base_url and api_token directly:
from rakam_systems_tools.evaluation import DeepEvalClient
client = DeepEvalClient(
base_url="http://your-deepeval-api-url.com",
api_token="your-api-token"
)
Configuration Options
The client can be configured in three ways, in the following order of precedence:
-
Directly in the constructor: As shown in the example above.
-
Using a settings module: You can pass a settings module to the client. The client will look for
EVALFRAMEWORK_URLandEVALFRAMEWORK_API_KEYattributes in the module.# settings.py
EVALFRAMEWORK_URL = "http://your-deepeval-api-url.com"
EVALFRAMEWORK_API_KEY = "your-api-token"
# your_app.py
from rakam_systems_tools.evaluation import DeepEvalClient
import settings # for django can be something like: from django.conf import settings
client = DeepEvalClient(settings_module=settings) -
Using environment variables: The client will automatically pick up the
EVALFRAMEWORK_URLandEVALFRAMEWORK_API_KEYenvironment variables if they are set.export EVALFRAMEWORK_URL="http://your-deepeval-api-url.com"
export EVALFRAMEWORK_API_KEY="your-api-token"from rakam_systems_tools.evaluation import DeepEvalClient
client = DeepEvalClient()
If no base_url is provided, it defaults to http://localhost:8080.
Usage
The SDK provides methods for evaluating text and schema data. For each type of evaluation, there are synchronous and background (asynchronous) methods.
Text Evaluation
Text evaluation is used for tasks like checking correctness, relevancy, faithfulness, and toxicity of text data.
Data and Metrics
TextInputItem: Represents a single item for text evaluation. It includesid,input,output,expected_output(optional), andretrieval_context(optional).MetricConfig: Defines the metrics to be used for evaluation. Available text evaluation metrics are:CorrectnessConfigAnswerRelevancyConfigFaithfulnessConfigToxicityConfig
Methods
text_eval(): Runs a synchronous text evaluation.text_eval_background(): Runs a background text evaluation.maybe_text_eval(): Randomly runstext_evalbased on a given probability.maybe_text_eval_background(): Randomly runstext_eval_backgroundbased on a given probability.
Example
from rakam_systems_tools.evaluation import DeepEvalClient, TextInputItem, CorrectnessConfig
client = DeepEvalClient()
data = [
TextInputItem(
input="What is the capital of France?",
output="Paris",
expected_output="The capital of France is Paris."
)
]
metrics = [
CorrectnessConfig(model="gpt-4.1", steps=["Check if the output correctly answers the input."])
]
result = client.text_eval(data=data, metrics=metrics, component="faq-bot")
print(result)
Schema Evaluation
Schema evaluation is used for tasks that involve structured data, such as JSON. It can be used to check for JSON correctness and the presence of specific fields.
Data and Metrics
SchemaInputItem: Represents a single item for schema evaluation. It includesinputandoutput.SchemaMetricConfig: Defines the metrics for schema evaluation. Available metrics are:JsonCorrectnessConfigFieldsPresenceConfig
Methods
schema_eval(): Runs a synchronous schema evaluation.schema_eval_background(): Runs a background schema evaluation.maybe_schema_eval(): Randomly runsschema_evalbased on a given probability.maybe_schema_eval_background(): Randomly runsschema_eval_backgroundbased on a given probability.
Example
from rakam_systems_tools.evaluation import DeepEvalClient, SchemaInputItem, JsonCorrectnessConfig
client = DeepEvalClient()
data = [
SchemaInputItem(
input="Generate a JSON object with name and age.",
output='{"name": "John", "age": 30}'
)
]
metrics = [
JsonCorrectnessConfig(
excpected_schema={"type": "object", "properties": {"name": {"type": "string"}, "age": {"type": "number"}}}
)
]
result = client.schema_eval(data=data, metrics=metrics, component="json-generator")
print(result)
Probabilistic Evaluation
The maybe_* methods provide a way to run evaluations probabilistically. This can be useful for reducing the load on the evaluation service or for sampling evaluations. The chance parameter is a float between 0 and 1 that determines the probability of the evaluation running.
# This evaluation will run approximately 10% of the time.
client.maybe_text_eval(data=data, metrics=metrics, chance=0.1)
Error Handling
By default, the client methods will not raise exceptions on network errors or non-2xx HTTP responses. Instead, they will return a dictionary with an "error" key. To change this behavior and have exceptions raised, you can set raise_exception=True.
try:
result = client.text_eval(data=data, metrics=metrics, raise_exception=True)
except requests.RequestException as e:
print(f"An error occurred: {e}")
Client-Side Metrics
In addition to server-side evaluations, the SDK supports logging metrics that are calculated on the client side. This is useful when you have your own evaluation methods or want to log scores from other systems. These client-side metrics are sent along with the input data and are logged without any server-side evaluation.
ClientSideMetricConfig
The ClientSideMetricConfig class allows you to define your own metrics. The fields are:
name(str): The name of the metric.score(float): The score of the metric.success(Optional[int]): Whether the evaluation was successful (1 for success, 0 for failure). Defaults to 1.evaluation_cost(Optional[float]): The cost of the evaluation. Defaults to 0.reason(Optional[str]): An optional reason or explanation for the score.threshold(Optional[float]): An optional threshold for the metric. Defaults to 0.
Example
You can include a list of ClientSideMetricConfig objects in the metrics field of each TextInputItem or SchemaInputItem.
from rakam_systems_tools.evaluation import DeepEvalClient, TextInputItem, ClientSideMetricConfig
client = DeepEvalClient()
# Assume you have a custom function to evaluate sentiment
def calculate_sentiment_score(text):
# Your custom logic here
if "happy" in text:
return 1.0
return 0.2
output_text = "I am happy with this product."
sentiment_score = calculate_sentiment_score(output_text)
data = [
TextInputItem(
input="User review",
output=output_text,
metrics=[
ClientSideMetricConfig(
name="sentiment",
score=sentiment_score,
reason="The user expressed a positive sentiment."
)
]
)
]
# You can send client-side metrics without any server-side metrics
result = client.text_eval(data=data, metrics=[], component="review-analyzer")
print(result)
Note: When you send client-side metrics, you can pass an empty list to the `metrics` parameter of the `text_eval` or `schema_eval` methods if you don't want to run any server-side evaluations.