I enrolled in an Open SAP course and was curious whether a language model like ChatGPT could pass an exam like this. So after taking the exam I put a few of its questions to ChatGPT and documented the results. This is not a complete or deep analysis of the subject, just a new perspective on courses and exams.
First, I don’t want you or anyone else to cheat on exams like this. This whole course and certification process is a gift from SAP Walldorf and I am grateful for it. Still, I know that some people will try using ChatGPT in similar exams.
When this exam was created in 2020, there were no public general-purpose GPTs or similar models. The first public version of ChatGPT appeared only in November 2022. At that time, course and exam creators couldn’t even have thought about dealing with intelligent language models. Meanwhile, in November 2023 this course was “retired”, so I suppose I can publish exam questions without blurring them.
First a few words about the course “SAP Fiori Overview: Design, Develop and Deploy / Year 2020”
I consider it an intensive and general overview of web development using SAP technologies. It starts with general service design concepts. Instead of paying for professional actors, the management of SAP Walldorf made the videos themselves. It may not look as polished as a Meta promotional video, but it is certainly more authentic and personal to me. Think of it as “We made this product and now I present it, even if I am not an actress in an LA studio.”
I did the short “Open Design” course from Duke University, then this SAP course, and I am now doing the “Design Thinking Specialization” from the University of Virginia. They agree on many points, for example:
- Users come first
- Design leads development
- Designers should listen 80% and talk 20% in interviews and tests with users
The Open SAP course is very SAP specific and pushes the Fiori 3 technology. It is usable in other environments, although I would not learn general UX / service design this way.
This is my course overview. After each section/week there is a self-test. I even found something that I think is an error in one of the tests:
This is a contradiction. I think “Design should be throughout the entire development process” and not only at the beginning of development. Doing design only at the beginning is called “waterfall development”, and although it may work in simple, well-known situations, I would not advise it for new, innovative transactions.
ChatGPT vs Open SAP course: The Final Exam
You have 120 minutes for 40 questions, which seems like a lot of time. Then you see the questions and realize you really need time to think about them. So for me at least it was not too much time. I took the exam and only then, of course 😉, did I try to use ChatGPT to answer some questions.
Let’s choose three questions and see how far ChatGPT gets answering them. Question one:
Not a clear answer. Actually it is a wrong answer, and it takes minutes to type in the question and read the reply. ChatGPT just lost three points, so it is 3:0 for the Open SAP exam vs ChatGPT.
Another question. This time let’s give ChatGPT the predefined answer options:
So ChatGPT selected three out of five correctly, but also selected one incorrectly. The incorrect selection costs one correct answer, so ChatGPT gets one point out of three here. That makes it 6:1 for Open SAP vs ChatGPT.
And our last question is:
As you can see, ChatGPT 3.5 chose two incorrect answers and one correct one, again getting zero of the three points. So the final score is:
9:1 to Open SAP exam vs ChatGPT3.5
This last question I tested with ChatGPT 4 and it gave me a different, partially wrong answer 🙂
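To make the running score easier to follow, here is a minimal sketch of the tally for ChatGPT 3.5. It assumes what the scores above imply: each of the three questions is worth three points, the left number is the total points available and the right number is what ChatGPT earned. The variable names are mine, not part of the exam.

```python
# Each question is worth 3 points (as implied by the running score above).
MAX_POINTS_PER_QUESTION = 3

# Points ChatGPT 3.5 earned on the three sample questions, in order.
chatgpt_points = [0, 1, 0]

total_possible = MAX_POINTS_PER_QUESTION * len(chatgpt_points)
total_earned = sum(chatgpt_points)
success_rate = total_earned / total_possible

print(f"{total_possible}:{total_earned} -> {success_rate:.0%}")  # 9:1 -> 11%
```

One point out of nine is roughly 11%, on a sample of only three questions.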
As of October 2023, ChatGPT versions 3.5 and 4 just aren’t there yet when it comes to passing an Open SAP exam. Sure, they get the gist of the questions and sound pretty confident in their answers, but they still mess up big time. We’re seeing constant improvements in these language models, but let’s be real: jumping from a success rate of just 10-20% to the 80-90% needed to ace an exam like this is a huge leap. Language models are not yet properly trained for exams like this.
On the ChatGPT side of things, it’d be really helpful if they could add some kind of ‘certainty rating’ to their answers. That way, users would have a heads-up about how sure the model is about its responses.
What we can learn from this, future possibilities
ChatGPT and similar large language models (LLMs) are becoming part of our lives. We use them even when we are not aware of it. Google uses them in search results, Meta/TikTok/LinkedIn use them to build people’s feeds, and advertisers use them to create and target ads.
I bet SAP Walldorf is already creating an SAP-specific language model, or a collection of models, to better serve its customers. This process starts with a test question catalog, which is a kind of exam for a language model. For example, such a language model has to pass all of the hundreds of tests SAP has.
How could the development of such an LLM proceed? A simplified list of steps could look like this:
- Collect all relevant texts, video transcripts and a huge collection of already answered questions.
- Take an already working LLM and train it further using this material and huge amounts of computing power. Think thousands of CPUs/GPUs using more than five megawatts of power and costing more than 10 000 USD per hour, for weeks.
- While training, run the control tests continuously and hopefully arrive at an answer quality above 90%.
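The steps above can be sketched as a simple loop: train a bit, run the control tests, stop when the pass rate is high enough. This is only a toy illustration, not how SAP or anyone else actually does it. `ToyModel`, its starting pass rate of 15%, the 10 points gained per round and the 24 hours per round are all made-up assumptions; only the 10 000 USD per hour, the five megawatts and the 90% target come from the list above.

```python
from typing import Callable

HOURLY_COST_USD = 10_000   # lower bound quoted in the text
POWER_MW = 5               # lower bound quoted in the text
TARGET_PASS_RATE = 0.90    # answer quality target from the text


def train_until_target(
    train_one_round: Callable[[], None],
    run_control_tests: Callable[[], float],
    hours_per_round: float,
    max_rounds: int,
) -> dict:
    """Alternate training rounds and control tests until the target is met."""
    hours = 0.0
    rounds = 0
    pass_rate = run_control_tests()
    while pass_rate < TARGET_PASS_RATE and rounds < max_rounds:
        train_one_round()
        rounds += 1
        hours += hours_per_round
        pass_rate = run_control_tests()
    return {
        "rounds": rounds,
        "pass_rate": pass_rate,
        "cost_usd": hours * HOURLY_COST_USD,
        "energy_mwh": hours * POWER_MW,
    }


class ToyModel:
    """Hypothetical stand-in for an LLM: each round adds 10 points."""

    def __init__(self) -> None:
        self.pass_percent = 15  # assumed starting point, not measured

    def train_one_round(self) -> None:
        self.pass_percent = min(100, self.pass_percent + 10)

    def run_control_tests(self) -> float:
        return self.pass_percent / 100


model = ToyModel()
result = train_until_target(
    model.train_one_round,
    model.run_control_tests,
    hours_per_round=24,  # assumption: one training round per day
    max_rounds=30,
)
print(result)
```

With these made-up numbers the loop stops after 8 rounds (192 hours), which at 10 000 USD per hour is already 1.92 million USD. Real training will not improve this smoothly, so treat the figures as an order-of-magnitude feeling only.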
So soon we will have an SAP prompt saying “How can we help you?” that is capable of answering almost any related question. Still, I believe you will need superusers and consultants to ask those questions and help you find your way forward.