Audio Transcription And Evaluation With Gemini Flash
This notebook is adapted from Google's "Gemini API: Audio Quickstart" notebook and shows how to prompt Gemini Flash with an audio file.
To run the following cell, your API key must be stored in a Colab Secret named GEMINI_API_KEY. If you don't already have an API key, or you're not sure how to create a Colab Secret, see Authentication for an example.
```python
import getpass
# from google.colab import userdata

GEMINI_API_KEY = getpass.getpass(prompt="Enter your Gemini API Key: ")
```
```python
# Audio file URL --> allows you to play the audio in the UI
URL = "https://storage.googleapis.com/generativeai-downloads/data/State_of_the_Union_Address_30_January_1961.mp3"
```
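The export and evaluation steps below assume a transcript has already been produced, but the transcription call itself is worth seeing. Here is a minimal sketch, assuming the `google-generativeai` SDK and the `GEMINI_API_KEY` and `URL` variables from the cells above; the prompt text and local filename are illustrative, not part of the original notebook:

```python
import urllib.request
import google.generativeai as genai

genai.configure(api_key=GEMINI_API_KEY)

# Download the recording defined in URL above to a local file
urllib.request.urlretrieve(URL, "speech.mp3")

# Upload it via the File API, then prompt Gemini Flash with the audio
audio_file = genai.upload_file("speech.mp3")
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content(["Transcribe this audio.", audio_file])
print(response.text)
```

The File API upload keeps the request small; the audio itself is referenced by the uploaded file handle rather than inlined in the prompt.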
You'll need to set the Arize AX variables (Space ID, API key, and Developer Key) below to send traces to the Arize AX platform. Sign up for free here.
```python
from opentelemetry import trace
from arize.otel import register
from opentelemetry.trace import Status, StatusCode
from opentelemetry.semconv.trace import SpanAttributes

ARIZE_SPACE_ID = getpass.getpass(prompt="Enter your ARIZE SPACE ID: ")
ARIZE_API_KEY = getpass.getpass(prompt="Enter your ARIZE API Key: ")
PROJECT_NAME = "gemini-audio"  # Set this to any name you'd like for your app

# Set up OTel via Arize's convenience function
tracer_provider = register(
    space_id=ARIZE_SPACE_ID,  # in the app's space settings page
    api_key=ARIZE_API_KEY,    # in the app's space settings page
    project_name=PROJECT_NAME,
)
trace.set_tracer_provider(tracer_provider)
tracer = trace.get_tracer(__name__)
```
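The export step later in this notebook filters on spans named "process_audio", so the transcription call needs to be wrapped in a span with that exact name. A hedged sketch, assuming the `tracer` from the cell above; the `output.value` attribute key and the placeholder transcript are assumptions matching the `attributes.output.value` column used later:

```python
# Wrap the Gemini audio call in a span named "process_audio" so it can be
# exported and evaluated later. The actual Gemini call is elided here.
with tracer.start_as_current_span("process_audio") as span:
    transcript = "..."  # replace with the Gemini transcription response text
    span.set_attribute("output.value", transcript)
    span.set_status(Status(StatusCode.OK))
```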
Evaluate Gemini’s output transcript for sentiment analysis
First, export the spans that contain the transcript output from Arize AX.
```python
# Note: This example uses Python SDK v7
print('#### Installing arize SDK')
!pip install "arize[Tracing]>=7.1.0"
print('#### arize SDK installed!')

import os
os.environ['ARIZE_API_KEY'] = ARIZE_API_KEY

from datetime import datetime, timezone, timedelta
from arize.exporter import ArizeExportClient
from arize.utils.types import Environments

client = ArizeExportClient()
print('#### Exporting your primary dataset into a dataframe.')
primary_df = client.export_model_to_df(
    space_id=ARIZE_SPACE_ID,  # collected above; no need to prompt again
    model_id=PROJECT_NAME,
    where="name = 'process_audio'",  # pull only the spans named "process_audio"
    environment=Environments.TRACING,
    start_time=datetime.now(timezone.utc) - timedelta(days=1),
    end_time=datetime.now(timezone.utc),  # pull traces for the last 24 hours
)

# Set the column in the dataframe to match the variable name used in our eval template
primary_df["output"] = primary_df["attributes.output.value"]
```
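To make the filtering concrete, here is a hypothetical miniature of the exported dataframe (column values invented for illustration), showing why the `where="name = 'process_audio'"` clause matters: only spans with that name carry the transcript in `attributes.output.value`.

```python
import pandas as pd

# Toy stand-in for the exported spans dataframe
spans = pd.DataFrame({
    "name": ["process_audio", "some_other_span"],
    "attributes.output.value": ["Transcript text...", None],
})

# Keep only the transcription spans, then mirror the column rename above
audio_spans = spans[spans["name"] == "process_audio"].copy()
audio_spans["output"] = audio_spans["attributes.output.value"]
print(audio_spans["output"].tolist())  # → ['Transcript text...']
```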
```python
SENTIMENT_EVAL_TEMPLATE = """You are a helpful AI bot that checks for the sentiment in the output text. Your task is to evaluate the sentiment of the given output and categorize it as positive, neutral, or negative.

Here is the data:
[BEGIN DATA]
============
[Output]: {attributes.output.value}
============
[END DATA]

Determine the sentiment of the output based on the content and context provided. Your response should be ONLY a single word, either "positive", "neutral", or "negative", and should not contain any text or characters aside from that word.

Then write out in a step by step manner an EXPLANATION to show how you determined the sentiment of the output. Do not include any text or characters aside from the EXPLANATION.

Your response should follow the format of the example response below. Provide a single LABEL and a single EXPLANATION. Do not include any special characters, such as "#", in your response.

Example response:
EXPLANATION: An explanation of your reasoning for why the label is "positive", "neutral", or "negative"
LABEL: "positive" or "neutral" or "negative"
"""
```
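To make the placeholder mechanics concrete, here is a small illustration of the substitution the eval harness performs for each dataframe row. The template and column name are shortened, hypothetical stand-ins, not the template above:

```python
# Toy template with a single placeholder named after a dataframe column
TEMPLATE = "Determine the sentiment of the output.\n[Output]: {text}\nLABEL:"
row = {"text": "The speech was inspiring and hopeful."}

# Per-row substitution: the column value is injected into the placeholder
prompt = TEMPLATE.format(**row)
print(prompt)
```

This is why `primary_df` must contain a column whose name matches the placeholder; a missing column produces a KeyError before any model call is made.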
Evaluate transcriptions using Gemini as an LLM-as-a-Judge
```python
# Gemini as LLM-as-a-Judge via llm_classify
# Authenticate with Google to access the Gemini model
!gcloud auth application-default login
!gcloud config set project audioevals  # you must have a valid project ID in your Google Cloud account first

import pandas as pd
from phoenix.evals import GeminiModel, llm_classify

# We will use Gemini 1.5 Pro to evaluate the text transcription
project_id = "audioevals"  # Set this to your Google Cloud project ID
gemini_model = GeminiModel(model="gemini-1.5-pro", project=project_id)

rails = ["positive", "neutral", "negative"]
evals_df = llm_classify(
    data=primary_df,
    template=SENTIMENT_EVAL_TEMPLATE,
    model=gemini_model,
    rails=rails,
    provide_explanation=True,
)

# Set eval labels
evals_df["eval.sentiment.label"] = evals_df["label"]
evals_df["eval.sentiment.explanation"] = evals_df["explanation"]
evals_df["context.span_id"] = primary_df["context.span_id"]
evals_df.head()
```
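Before logging judge results anywhere, it can be worth a quick sanity check that every returned label actually fell inside the rails. A small sketch with hypothetical labels standing in for real judge output:

```python
import pandas as pd

rails = ["positive", "neutral", "negative"]

# Hypothetical judge output standing in for evals_df["label"]
judge_output = pd.DataFrame({"label": ["positive", "neutral", "positive", "negative"]})

# Any label outside the rails indicates the judge ignored the format instructions
out_of_rails = judge_output[~judge_output["label"].isin(rails)]
label_counts = judge_output["label"].value_counts().to_dict()
print(len(out_of_rails), label_counts)  # → 0 {'positive': 2, 'neutral': 1, 'negative': 1}
```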
```python
# Note: This example uses Python SDK v7
from arize.pandas.logger import Client

# Initialize the Arize AX client using the same space ID and API key as before
arize_client = Client(
    space_id=ARIZE_SPACE_ID,
    api_key=ARIZE_API_KEY,
)

# Send the evaluation results to Arize AX
arize_client.log_evaluations_sync(evals_df, "gemini-audio")
```
More details about the Gemini API's vision capabilities are available in the documentation. If you want to know more about the File API, check its API reference or the File API quickstart.
Have a look at the Audio quickstart to learn about another type of media file, then learn more about prompting with media files in the docs, including the supported formats and maximum length for audio files.