from phoenix.evals import (
    USER_FRUSTRATION_PROMPT_RAILS_MAP,
    USER_FRUSTRATION_PROMPT_TEMPLATE,
    OpenAIModel,
    download_benchmark_dataset,
    llm_classify,
)
# The LLM used as the judge for this eval
model = OpenAIModel(
    model_name="gpt-4",
    temperature=0.0,
)
# The rails constrain the eval output to the specific label values defined by
# the template: extraneous text such as ",,," or "..." is stripped, ensuring
# one of the expected binary labels is returned.
rails = list(USER_FRUSTRATION_PROMPT_RAILS_MAP.values())
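# Sanity check: rails is the list of allowed class labels for this template
# (e.g. "frustrated" / "not_frustrated" in recent Phoenix versions; print to
# confirm for your installed version).
print(rails)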
frustration_classifications = llm_classify(
    dataframe=df,
    template=USER_FRUSTRATION_PROMPT_TEMPLATE,
    model=model,
    rails=rails,
    provide_explanation=True,  # optional: have the judge LLM explain each label
)
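# Inspect the output: llm_classify returns a dataframe aligned row-for-row with
# the input, containing a "label" column and, because provide_explanation=True,
# an "explanation" column with the judge's reasoning.
print(frustration_classifications[["label", "explanation"]].head())

# A simple aggregate sketch, assuming the "frustrated" rail label from
# USER_FRUSTRATION_PROMPT_RAILS_MAP:
frustration_rate = (frustration_classifications["label"] == "frustrated").mean()
print(f"Frustration rate: {frustration_rate:.2%}")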