Experiments can be run either synchronously or asynchronously. We recommend:
Synchronous: slower but easier to debug. While you are building your tests, synchronous runs are inherently easier to debug. Start with synchronous runs and then switch to asynchronous.
Asynchronous: faster. Use this when the timing and speed of your tests matter. Making your tasks and/or evals asynchronous can speed up your runs by 10x.
Code errors in synchronous tasks break at the exact line of the error, which makes them easier to debug; we recommend using them while developing your tasks and evals.
A synchronous experiment runs its tasks one after another, while an asynchronous experiment runs them in parallel.
Synchronous vs Asynchronous Task and Eval
Here are the code differences between the two. You just need to add the async keyword before your function's def, add async_ to the front of the function name, and run nest_asyncio.apply(). Asynchronous runs rely on the concurrency parameter in run_experiment, so if you'd like them to run faster, set it to a higher number.
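Below is a minimal sketch of the same task written both ways. The dataset column name (attributes.my_number) and the commented run_experiment call shape are illustrative assumptions, not the exact API.

```python
import nest_asyncio

# Required so the async event loop can run inside notebooks
nest_asyncio.apply()

# Synchronous task: processes one dataset row at a time; errors surface at the failing line
def task_add_1(dataset_row):
    return dataset_row["attributes.my_number"] + 1

# Asynchronous version: same logic, with `async` before def and `async_` prefixed to the name
async def async_task_add_1(dataset_row):
    return dataset_row["attributes.my_number"] + 1

# Illustrative call shape (placeholder args): a higher `concurrency` runs more rows in parallel
# arize_client.run_experiment(..., task=async_task_add_1, concurrency=10)
```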
Running a test on a dataset sometimes requires running on random or stratified samples of the dataset. Arize supports this by allowing teams to download the dataset as a dataframe, which can be sampled before running the experiment.
```python
# Get dataset as a dataframe
dataset_df = arize_client.get_dataset(space_id=SPACE_ID, dataset_name=dataset_name)

# Apply any sampling method you want to the dataframe
sampled_df = dataset_df.sample(n=100)  # Sample 100 rows randomly

# Sample 10% of rows randomly
sampled_df = dataset_df.sample(frac=0.1)

# Proportional sampling based on the original dataset's class label distribution
stratified_sampled_df = dataset_df.groupby('class_label', group_keys=False).apply(
    lambda x: x.sample(frac=0.1)
)

# Select every 10th row
systematic_sampled_df = dataset_df.iloc[::10, :]

# Run the experiment on sampled_df
arize_client.run_experiment(space_id, dataset_name, sampled_df, taskfn, evaluators)
```
An experiment is only matched up with the data that was run against it. You can run experiments on different samples of the same dataset, and the platform will take care of tracking and visualization. Any sampling method that can be applied to a dataframe, however complex, can be used here.
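As a sketch, reusing the call shape from the snippet above (taskfn and evaluators are placeholders), two experiments can be run against different samples of the same dataset:

```python
# Two different samples of the same dataset
random_sample_df = dataset_df.sample(frac=0.1)
stratified_sample_df = dataset_df.groupby('class_label', group_keys=False).apply(
    lambda x: x.sample(frac=0.1)
)

# Each call is tracked and visualized as its own experiment against the same dataset
arize_client.run_experiment(space_id, dataset_name, random_sample_df, taskfn, evaluators)
arize_client.run_experiment(space_id, dataset_name, stratified_sample_df, taskfn, evaluators)
```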
When running experiments, arize_client.run_experiment() produces a task span attached to the experiment. If you want to add more traces to the experimental run, you can instrument any part of that experiment and the resulting spans will be attached below the task span.
```python
from opentelemetry import trace

# Outer function will be traced by Arize with a span
def task_add_1(dataset_row):
    tracer = trace.get_tracer(__name__)
    # Start the span for the function
    with tracer.start_as_current_span("test_function") as span:
        # Extract the number from the dataset row
        num = dataset_row['attributes.my_number']
        # Set 'num' as a span attribute
        span.set_attribute("dataset.my_number", num)
        # Return the incremented number
        return num + 1
```
Tracing Using Auto-Instrumentor
```python
# Import the automatic instrumentor from OpenInference
from openinference.instrumentation.openai import OpenAIInstrumentor
from openai import OpenAI

# Automatic instrumentation --- this will trace all tasks below that make LLM calls
OpenAIInstrumentor().instrument()

task_prompt_template = "Answer in a few words: {question}"
openai_client = OpenAI()

def task(dataset_row) -> str:
    question = dataset_row["question"]
    message_content = task_prompt_template.format(question=question)
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": message_content}]
    )
    return response.choices[0].message.content
```
Experiments SDK differences in Arize AX vs Phoenix OSS
There are subtle differences between the experiments SDK in Arize AX and Phoenix, but the base concepts are the same. You can check out a full notebook example of each. The example below runs an experiment that asks an LLM to identify inventors and evaluates the tone of its responses using an LLM eval.
Arize uses the ArizeDatasetsClient. arize_client.create_dataset returns a dataset_id instead of a dataset object, so if you want to print or manipulate the dataset, you will need to fetch it with arize_client.get_dataset. Phoenix uses px.Client().upload_dataset, which returns the dataset object directly.
```python
# Example dataframe
import pandas as pd

inventions_dataset = pd.DataFrame({
    "attributes.input.value": ["Telephone", "Light Bulb"],
    "attributes.output.value": ["Alexander Graham Bell", "Thomas Edison"],
})

#############
# FOR ARIZE
#############
# Setup imports
from arize.experimental.datasets import ArizeDatasetsClient
from arize.experimental.datasets.utils.constants import GENERATIVE
from uuid import uuid1

# Setup Arize datasets connection
ARIZE_SPACE_ID = ""
ARIZE_API_KEY = ""
arize_client = ArizeDatasetsClient(api_key=ARIZE_API_KEY)

# Create dataset in Arize (returns a dataset_id)
dataset_id = arize_client.create_dataset(
    dataset_name="inventions" + str(uuid1())[:5],
    data=inventions_dataset,
    space_id=ARIZE_SPACE_ID,
    dataset_type=GENERATIVE
)

# Get dataset from Arize
dataset = arize_client.get_dataset(space_id=ARIZE_SPACE_ID, dataset_id=dataset_id)

#############
# FOR PHOENIX
#############
import phoenix as px
from uuid import uuid1

# Upload dataset to Phoenix (returns a dataset object)
dataset = px.Client().upload_dataset(
    dataset_name="inventions" + str(uuid1())[:5],
    dataframe=inventions_dataset,
    input_keys=["attributes.input.value"],
    output_keys=["attributes.output.value"]
)
```
Task definition
In Arize, we use data from the dataset_row as prompt template variables. The possible variables to pass in are:
input, expected, dataset_row, metadata.
In Phoenix, you can do this using example. The possible variables to pass in are:
input, expected, reference, example, metadata.
```python
#############
# FOR ARIZE
#############
def find_inventor(dataset_row) -> str:
    # dataset_row uses the dataframe from above
    invention = dataset_row.get("attributes.input.value")
    # send invention topic to LLM generation

#############
# FOR PHOENIX
#############
def find_inventor(example) -> str:
    invention = example.get("attributes.input.value")
    # send invention topic to LLM generation
```
Evaluator definition
For both Arize and Phoenix, you can often use the exact same function as your evaluator. Phoenix does have a slightly different way of accessing metadata from your dataset. Arize uses input, output, dataset_row, metadata as the optional arguments to the function. Phoenix uses input, expected, reference, example, metadata as the arguments to the function.
```python
# FOR ARIZE IMPORT
from arize.experimental.datasets.experiments.evaluators.base import EvaluationResult

# FOR PHOENIX IMPORT
from phoenix.experiments.types import EvaluationResult

#############
# FOR ARIZE AND PHOENIX
#############
from phoenix.evals import (
    OpenAIModel,
    llm_classify,
)

CUSTOM_TEMPLATE = """You are evaluating whether tone is positive, neutral, or negative

[Message]: {output}

Respond with either "positive", "neutral", or "negative"
"""

def is_positive(output):
    df_in = pd.DataFrame({"output": output}, index=[0])
    eval_df = llm_classify(
        dataframe=df_in,  # evaluate the single output produced by the task
        template=CUSTOM_TEMPLATE,
        model=OpenAIModel(model="gpt-4o"),
        rails=["positive", "neutral", "negative"],
        provide_explanation=True
    )
    # return score, label, explanation
    return EvaluationResult(
        score=1,
        label=eval_df['label'][0],
        explanation=eval_df['explanation'][0]
    )
```
Run the experiment
Arize and Phoenix use slightly different functions to run an experiment due to the permissioning available in Arize.
In Arize, you must pass in the dataset_id and space_id.
In Phoenix, you must pass in the dataset object itself.
```python
#############
# FOR ARIZE
#############
# Uses the ArizeDatasetsClient from above
arize_client.run_experiment(
    space_id=ARIZE_SPACE_ID,
    dataset_id=dataset_id,
    task=find_inventor,
    evaluators=[is_positive],  # include your evaluation functions here
    experiment_name="inventions-experiment"
)

#############
# FOR PHOENIX
#############
from phoenix.experiments import run_experiment

experiment_results = run_experiment(
    dataset=dataset,
    task=find_inventor,
    evaluators=[is_positive],
    experiment_name="inventions-experiment"
)
```