Let's implement prompting techniques and then evaluate the results to see the improvement in the prompt results.
We use the following code:
prompt_10 = """Your task is to extract and categorize messages. Here are some examples:
---
{{?few_shot_examples}}
---
Use the examples when extracting and categorizing the following message:
---
{{?input}}
---
Extract and return a json with the following keys and values:
- "urgency" as one of {{?urgency}}
- "sentiment" as one of {{?sentiment}}
- "categories" list of the best matching support category tags from: {{?categories}}
Your complete message should be a valid json string that can be read directly and only contain the keys mentioned in the list above. Never enclose it in ```json...```, no newlines, no unnecessary whitespace.
"""
import json
import random
from functools import partial

random.seed(42)  # make the example sampling reproducible
k = 3
examples = random.sample(dev_set, k)
example_template = """<example>
{example_input}
## Output
{example_output}
</example>"""
examples = '\n---\n'.join(
    example_template.format(
        example_input=example["message"],
        example_output=json.dumps(example["ground_truth"]),
    )
    for example in examples
)
f_10 = partial(send_request, prompt=prompt_10, few_shot_examples=examples, **option_lists)
response = f_10(input=mail["message"])
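Because the prompt forbids code fences, newlines, and extra whitespace, the model's reply should be a bare JSON string that can be parsed directly. A minimal sketch, using a hypothetical response string in place of the real model output:

```python
import json

# Hypothetical raw model output; in practice this is the text returned by f_10.
response = '{"urgency": "high", "sentiment": "negative", "categories": ["billing"]}'

# The prompt restricts the output to exactly these keys, so the parsed
# dictionary can be consumed without further cleanup.
result = json.loads(response)
```

If the model occasionally wraps its answer in a fence anyway, stripping leading/trailing backticks before parsing is a common defensive step.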
The code builds a prompt template that extracts and categorizes messages by urgency, sentiment, and support category tags. It assembles a few-shot learning prompt from randomly selected examples in the development set, sends it together with an input message to a language model, and the model's overall performance is then evaluated and displayed in a table.
Here’s an expanded explanation of a few parts of the code:
- Setting the Random Seed: It sets a random seed using "random.seed(42)" to ensure that the random sampling of the examples is reproducible. This helps in maintaining consistency in experiments and evaluations.
- Sampling Examples: The variable "k" is set to 3, indicating the number of examples to sample from the "dev_set" dataset. The "random.sample(dev_set, k)" function selects three random examples from the development set.
- Formatting Examples: The selected examples are formatted into a template "example_template". Each example includes the input message and the expected output in JSON format. This formatted string is then joined using "\n---\n" to create a cohesive set of examples.
- Partial Function Application: The "partial" function is used to bind the generated prompt and examples to the "send_request" function, creating a function "f_10" that can be called with just the input message. This streamlines the process of sending requests to the model with the necessary context.
- Sending Request and Evaluating: The script sends a request with "f_10(input=mail["message"])", passing the message to classify. The result is then evaluated against a small test dataset, "test_set_small", and the scores are stored in "overall_result["few_shot--llama3-70b"]".
- Output Display: Finally, the "pretty_print_table(overall_result)" function is used to display the evaluation results in a formatted table, making it easier to interpret the results.
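The evaluation step described above is not shown in the listing. A minimal sketch of what it might look like, assuming field-level exact-match scoring and the names "test_set_small", "f_10", and "overall_result" from the surrounding description (the scoring logic itself is an assumption, not the book's exact code):

```python
import json

def evaluate(predict, test_set):
    """Score predictions by exact match on each extracted field (assumed metric)."""
    scores = {"urgency": 0, "sentiment": 0, "categories": 0}
    for item in test_set:
        pred = json.loads(predict(input=item["message"]))
        truth = item["ground_truth"]
        for key in scores:
            scores[key] += pred.get(key) == truth[key]
    # Return per-field accuracy over the test set
    return {key: count / len(test_set) for key, count in scores.items()}

# Tiny stub predictor to illustrate the call shape (hypothetical data):
fake_item = {
    "message": "My refund is overdue!",
    "ground_truth": {"urgency": "high", "sentiment": "negative", "categories": ["billing"]},
}
stub = lambda input: json.dumps(fake_item["ground_truth"])
scores = evaluate(stub, [fake_item])

# In the book's setup this would be something like:
# overall_result["few_shot--llama3-70b"] = evaluate(f_10, test_set_small)
# pretty_print_table(overall_result)
```

Per-field accuracy makes it easy to see which aspect of the extraction (urgency, sentiment, or categories) a prompt change actually improved.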
The following screenshots show the rendered example prompts and a sample model response.

This is the output for evaluation after implementing few-shot prompting.
You can see improvement in sentiment and urgency assignment.
We established a baseline earlier, and now we can evaluate and compare the results of the refined prompts with the baseline using the test data.
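Comparing against the baseline can be as simple as diffing the per-field scores. A sketch with illustrative numbers (the values and metric names are hypothetical, not the book's actual results):

```python
# Hypothetical per-field accuracy scores for illustration only.
baseline = {"urgency": 0.60, "sentiment": 0.55, "categories": 0.50}
few_shot = {"urgency": 0.75, "sentiment": 0.70, "categories": 0.52}

# Per-field improvement of the few-shot prompt over the baseline
delta = {key: round(few_shot[key] - baseline[key], 2) for key in baseline}
for key, diff in delta.items():
    print(f"{key}: {diff:+.2f}")
```

Reporting deltas per field, rather than a single aggregate score, shows where a prompting technique helps and where it does not.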