Let's now evaluate different models for the Facility Solutions problem that we're solving.
mistralai--mixtral-8x7b-instruct-v01
We begin with mistralai--mixtral-8x7b-instruct-v01 and the basic prompt. This model is an example of the cheapest open-source, SAP-hosted models available on the generative AI hub.
123overall_result["basic--mixtral-8x7b"] = evalulation_full_dataset(test_set_small, f_8, _model='mistralai--mixtral-8x7b-instruct-v01')
pretty_print_table(overall_result)
This code evaluates the small test set with the basic prompt (f_8) for the Mixtral model and stores the result under the key "basic--mixtral-8x7b" in the overall_result dictionary. The pretty_print_table function then formats and prints the accumulated results, making the evaluation data clear and easy to read.
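For orientation, here is a minimal sketch of what these two helpers might look like. The learning journey's notebook already defines them for you, and its actual field names, prompt-function signature, and metrics may differ from the assumptions below.

# Hypothetical sketch only -- not the notebook's actual implementation.
def evalulation_full_dataset(test_set, prompt_fn, _model):
    # Run the prompt against every example with the chosen model and score the answers.
    correct = 0
    for example in test_set:
        prediction = prompt_fn(example["input"], model=_model)   # assumed call signature
        if prediction == example["expected"]:                    # assumed exact-match scoring
            correct += 1
    return {"accuracy": correct / len(test_set), "n": len(test_set)}

def pretty_print_table(results):
    # Print one row per evaluated prompt/model combination.
    for name, metrics in results.items():
        print(f"{name:<50} accuracy={metrics['accuracy']:.2%}  (n={metrics['n']})")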
You should see output similar to the following:

Similarly, let's evaluate the results using a combination of few-shot prompting and metaprompting (f_13) for the same model.
123overall_result["metaprompting_and_few_shot--mixtral-8x7b"] = evalulation_full_dataset(test_set_small, f_13, _model='mistralai--mixtral-8x7b-instruct-v01')
pretty_print_table(overall_result)
You should see output similar to the following:

You can see the evaluation results.
gpt-4o
We perform the same steps with gpt-4o. This model is an example of the best proprietary OpenAI models available on the generative AI hub.
123overall_result["basic--gpt4o"] = evalulation_full_dataset(test_set_small, f_8, _model='gpt-4o')
pretty_print_table(overall_result)

You can see the evaluation results for the basic prompt in the output.
Similarly, let's evaluate the results using a combination of few-shot prompting and metaprompting for the same model.
123overall_result["metaprompting_and_few_shot--gpt4o"] = evalulation_full_dataset(test_set_small, f_13, _model='gpt-4o')
pretty_print_table(overall_result)
You should see output similar to the following:

You can see the evaluation results.
gemini-1.5-flash
We perform the same steps with gemini-1.5-flash. This model is the cheapest and fastest Google model available on the generative AI hub.
123overall_result["basic--gemini-1.5-flash"] = evalulation_full_dataset(test_set_small, f_8, _model='gemini-1.5-flash')
pretty_print_table(overall_result)
You should see output similar to the following:

You can see the evaluation results for the basic prompt in the output.
Similarly, let's evaluate the results using a combination of few-shot prompting and metaprompting for the same model.
123overall_result["metaprompting_and_few_shot--gemini-1.5-flash"] = evalulation_full_dataset(test_set_small, f_13, _model='gemini-1.5-flash')
pretty_print_table(overall_result)
You should see output similar to the following:

You can see the evaluation results.
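With all six runs collected in overall_result, you can also rank the prompt/model combinations yourself. The snippet below is a hypothetical illustration: it assumes each entry exposes an "accuracy" field, as in the sketch earlier, so adapt the key to whatever metrics evalulation_full_dataset actually returns.

# Rank the evaluated configurations, best first (assumes an "accuracy" field).
ranking = sorted(overall_result.items(), key=lambda kv: kv[1]["accuracy"], reverse=True)
for name, metrics in ranking:
    print(f"{name:<50} {metrics['accuracy']:.2%}")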
Note
You may get a slightly different response from the one shown here, and the same applies to all the remaining model responses shown in this learning journey.
When you execute the same prompt on your machine, an LLM produces varying outputs because of its probabilistic nature, temperature setting, and non-deterministic architecture, leading to different responses even with slight setting changes or internal state shifts.
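To see why the temperature setting matters, here is a self-contained toy sketch (not part of the learning journey's code, with made-up tokens and scores): temperature rescales the model's token scores before sampling, so a low temperature concentrates probability on the top token and gives more repeatable outputs, while a higher temperature spreads probability out and increases variation.

import math
import random

def sample_with_temperature(logits, temperature):
    # Greedy decoding: temperature 0 always picks the highest-scoring token.
    if temperature == 0:
        return max(logits, key=logits.get)
    # Rescale the logits by the temperature, then convert them to probabilities (softmax).
    scaled = {tok: value / temperature for tok, value in logits.items()}
    norm = sum(math.exp(v) for v in scaled.values())
    probs = {tok: math.exp(v) / norm for tok, v in scaled.items()}
    # Draw one token according to those probabilities.
    return random.choices(list(probs), weights=probs.values(), k=1)[0]

# Toy next-token scores for a hypothetical facility-issue category.
logits = {"plumbing": 2.0, "electrical": 1.0, "hvac": 0.5}
print([sample_with_temperature(logits, 0.2) for _ in range(5)])  # almost always "plumbing"
print([sample_with_temperature(logits, 1.5) for _ in range(5)])  # noticeably more varied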