How can we harness the potential of generative AI (GenAI) in the realm of digital commerce? And is it possible to use GenAI to streamline the implementation of a composable commerce solution? To answer these questions, our engineering team ran an experiment to test the boundaries of generative AI using GPT-4 for a potential commercetools implementation. Here’s what we learned from this exploration.
2023 has emerged as the year of generative AI, capturing the imagination of not only the media but also businesses — and individuals like you and me. While the hype is certainly here, it raises the question: Is this excitement justified by tangible results? Can generative AI accomplish more than helping school kids write Shakespeare-like sonnets inspired by Dolly Parton?
The answer is a resounding YES. Undoubtedly, GenAI tools such as GPT-4 and Jasper are taking the business world by storm with numerous applications, from the lightning-fast creation of product descriptions to virtual assistants for customer service.
At commercetools, we’re actively exploring the potential of this technology for all things commerce. In a recent experiment, we looked at how GPT-4 can help customers implement commercetools Composable Commerce for a typical retailer’s site. Using prompt engineering throughout the process, we wanted to answer the following questions:
Can GenAI simplify the implementation of commercetools solutions?
Is there any advantage in terms of time, quality and/or reliability?
What is the developer experience like when interacting with GenAI to create, edit and improve code for a commercetools implementation?
Let’s delve deeper into what we learned after six hours of prompting on GPT-4.
Here’s how it started
First things first: the scope. We made the deliberate choice to generate a basic, static eCommerce application replicating a state-of-the-art retailer that provides exceptional customer experiences. As our reference, we randomly chose adidas.com.
(Note that Adidas is not a customer of commercetools, which aligns with the nature of our experiment of exploring potential new implementations.)
Not only does this approach give an idea of a potential implementation, but it also raises our expectations of what generative AI can help us achieve.
Going a step further, this eCommerce site would emulate a standard customer journey, from the homepage to the final destination: the “order complete” page. To bring this scope to life, we used TypeScript and React to build a custom frontend.
Here’s an overview of the scope and approach we took along the way:
1. A clone of the adidas.com homepage with a navigation bar. Categories and subcategories from a commercetools project would be displayed in the navigation bar, including:
   i. Generate client from commercetools TypeScript SDK.
   ii. Generate category drafts.
   iii. Manually add generated drafts to commercetools project.
2. Product description pages (PDPs), including images.
   i. Generate a single product type and a single product draft for an Adidas sneaker referencing a previously created category.
   ii. Display the product in the list (image, name, price) when the product’s category is selected from the navigation bar.
   iii. Display the product in detail (image, name, price, variants) when the product is selected from a category page.
3. A login interface.
   i. Generate an anonymous session for cart creation.
4. A cart workflow with “Add to Cart” and a “Cart” interface.
   i. Add a line item to an anonymous cart when the “Add to Cart” button is clicked from a product detail page.
5. A basic checkout (no integration with taxes or payment service providers).
6. An “Order Complete” page.
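To make the category step in the scope above more concrete, here is a minimal sketch of how category drafts for a navigation bar could be built. It is an illustration under stated assumptions, not the code from the experiment: `CategoryDraftSketch` is a trimmed local stand-in for the commercetools category draft (localized `name` and `slug`, plus an optional `parent` reference by key), and `NavNode`, `slugify` and `buildCategoryDrafts` are hypothetical helper names.

```typescript
// Simplified local stand-ins for the commercetools category draft shape.
// Assumption: only the fields used here; the real SDK types carry more.
type LocalizedString = Record<string, string>;

interface CategoryDraftSketch {
  key: string;
  name: LocalizedString;
  slug: LocalizedString;
  parent?: { typeId: "category"; key: string };
}

// Hypothetical navigation tree, e.g. transcribed by hand from a nav bar.
interface NavNode {
  name: string;
  children?: NavNode[];
}

// Lowercase, replace runs of non-alphanumerics with hyphens, trim stray hyphens.
function slugify(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Flatten the tree depth-first so every parent precedes its children,
// which is the order in which the drafts would need to be created.
function buildCategoryDrafts(
  nodes: NavNode[],
  locale = "en",
  parentKey?: string
): CategoryDraftSketch[] {
  const drafts: CategoryDraftSketch[] = [];
  for (const node of nodes) {
    const slug = slugify(node.name);
    // Prefix with the parent key so sibling names stay unique across branches.
    const key = parentKey ? `${parentKey}-${slug}` : slug;
    const draft: CategoryDraftSketch = {
      key,
      name: { [locale]: node.name },
      slug: { [locale]: key },
    };
    if (parentKey) {
      draft.parent = { typeId: "category", key: parentKey };
    }
    drafts.push(draft);
    drafts.push(...buildCategoryDrafts(node.children ?? [], locale, key));
  }
  return drafts;
}

const navDrafts = buildCategoryDrafts([
  { name: "Men", children: [{ name: "Shoes" }, { name: "Clothing" }] },
  { name: "Women" },
]);
console.log(JSON.stringify(navDrafts, null, 2));
```

Referencing parents by key rather than by ID means the drafts can be prepared before any category exists in the project, which fits the manual import step described in the list.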
Our prompts, which you can find here, ranged from a generic “Create a boilerplate React commerce application based on Adidas’s website” to more precise requirements for reworking the code wherever applicable and troubleshooting along the way.
Here’s how it went
After three engineers spent six hours prompting and refining their interactions with GPT-4, we actually accomplished a great deal of what we initially set our sights on.
It’s important to note that all code, except the anonymous session flow, was in fact generated by GPT-4. However, half the time was spent on debugging and resolving issues during the exercise, and therefore the “checkout” and “order complete” pages weren’t completed during the preset timeframe of six hours.
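For readers unfamiliar with the cart step, the sketch below shows roughly what the request bodies for an anonymous cart and an “Add to Cart” click could look like. It is a simplified illustration, not the code from the experiment: the interfaces are trimmed local stand-ins for the commercetools cart draft and the `addLineItem` update action, and the function names `newAnonymousCartDraft` and `addToCartUpdate` are hypothetical.

```typescript
// Local stand-ins for the request bodies.
// Assumption: trimmed to the fields used here; the real API accepts more.
interface CartDraftSketch {
  currency: string;
  anonymousId?: string;
}

interface AddLineItemAction {
  action: "addLineItem";
  sku: string;
  quantity: number;
}

interface CartUpdateSketch {
  version: number; // current cart version, used for optimistic concurrency
  actions: AddLineItemAction[];
}

// Body for creating a cart tied to an anonymous session instead of a customer.
function newAnonymousCartDraft(
  anonymousId: string,
  currency: string
): CartDraftSketch {
  return { currency, anonymousId };
}

// Body for adding one product variant (identified by SKU) to an existing cart.
function addToCartUpdate(
  cartVersion: number,
  sku: string,
  quantity = 1
): CartUpdateSketch {
  if (!Number.isInteger(quantity) || quantity < 1) {
    throw new RangeError("quantity must be a positive integer");
  }
  return {
    version: cartVersion,
    actions: [{ action: "addLineItem", sku, quantity }],
  };
}

// Hypothetical values for illustration only.
const cartDraft = newAnonymousCartDraft("anon-session-123", "EUR");
const cartUpdate = addToCartUpdate(1, "SNEAKER-42-WHITE", 2);
console.log(JSON.stringify({ cartDraft, cartUpdate }, null, 2));
```

Keeping these bodies as small typed builders makes them easy to paste into a prompt when asking for revisions, and easy to check against the API documentation.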
Overall, we were happy with the preliminary results when it came to the creation and integration of commerce functions. However, it’s clear that GPT-4 lacks the visual capabilities to actually replicate a retailer’s site.
The takeaways: The good and the bad of GenAI
Let’s start with the good stuff: We were impressed by GPT-4’s proficiency in enhancing and expanding existing code patterns, and its knack for introducing commendable programming patterns and syntax over time.
Furthermore, GPT-4 responds correctly to specific instructions and is capable of generating boilerplate code. This helped us quickly create simple React applications from the ground up.
However, achieving suitable results wasn’t without its challenges. Our engineers found that the tool often required constant code references and hand-holding to get the code right, suggesting that it’s better suited to editing and refining code than to creating it from scratch. This was the case even when using the comprehensive commercetools API documentation as a reference.
As already pointed out, GPT-4 completely lacks visual thinking abilities at this stage. The generated styles are plainly inconsistent with the well-known Adidas look and feel. When tasked with creating styles for commercetools’ components, the tool inadvertently created class name collisions, which we had to correct with CSS modules.
In a nutshell, while we got decent boilerplate code, it comes as no surprise that a GenAI tool needs very clear and precise requirements to generate meaningful code. Our engineers also spent a fair amount of time copying in the component or block of code that had just been created in order to ask for revisions, refactors and/or bug fixes. GPT-4 also generated calls to commercetools SDK methods that don’t exist, which made troubleshooting even more time-consuming.
So, what have we learned from this exercise? Here are our four key takeaways:
Use GenAI to bootstrap a basic application
Generative AI is a strong tool to enhance code patterns and generate boilerplate code, specifically HTML layouts and general views, as well as attach mock handlers to React components. GPT-4 does provide the activation energy to bootstrap a basic application and was able to produce general references that could be used in a potential commercetools implementation.
Expect a lack of context, consistency and the occasional inaccurate code
From our point of view, the weaknesses include a limited grasp of the context needed to generate appropriate code, difficulty maintaining visual consistency and inaccurate code suggestions that kept popping up along the way. At the same time, it’s hardly surprising that those issues arose, since we were working with a general-purpose foundation model that hasn’t been trained specifically for this task.
What we also learned is that validation tools like TypeScript are invaluable for catching and correcting inaccurate code.
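As a small illustration of why type checking helps here, consider the sketch below. Everything in it is hypothetical: `CartsClientSketch` is a deliberately tiny stand-in, not the real commercetools SDK surface, and `fetchCartWithItems` is an invented method name of the kind a model might produce. The point is only that a call to an undeclared method is flagged at compile time rather than discovered at runtime.

```typescript
// Hypothetical, heavily trimmed client surface used only for illustration.
// Assumption: these names are NOT the real commercetools SDK API.
interface CartSummary {
  id: string;
  version: number;
}

interface CartsClientSketch {
  getById(id: string): Promise<CartSummary>;
}

// In-memory fake so the sketch runs without network access.
const cartsClient: CartsClientSketch = {
  async getById(id: string): Promise<CartSummary> {
    return { id, version: 1 };
  },
};

// A plausible-sounding but undeclared method. Without the directive below,
// the compiler rejects this line; if invoked anyway, it throws a TypeError
// at runtime because the property is undefined.
function hallucinatedCall(id: string) {
  // @ts-expect-error Property 'fetchCartWithItems' does not exist on type 'CartsClientSketch'.
  return cartsClient.fetchCartWithItems(id);
}

// The declared method type-checks and runs.
async function loadCart(id: string): Promise<CartSummary> {
  return cartsClient.getById(id);
}

loadCart("cart-1").then((cart) => console.log(cart.id, cart.version));
```

The `@ts-expect-error` directive is there only to keep the sketch compilable while documenting the error; in a real project you would leave it out and let the build fail on the invented method.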
Combine AI-powered tools for improved results
As mentioned above, GPT-4’s strength doesn’t exactly lie in visual capabilities. What we tried post-experiment was to create a clone version of adidas.com with fit-for-purpose AI-based tools, such as Midjourney to create the design and Figma to adapt the design for the code. That way, we were able to replicate the look and feel of adidas.com with surprisingly good results.
Significant human oversight is needed
GPT-4, and probably any other generative AI tool, shouldn’t be solely relied upon to generate out-of-the-box boilerplate code just yet. As you probably guessed, AI tools can be powerful, but they need human expertise, clear communication and precise requirements, especially when complex documentation is involved.
For instance, if we ran this experiment a second time, we’d likely integrate the commercetools SDK and set up the React application manually instead of using GenAI for those steps.
Here’s how it’ll go: What do these lessons mean for you?
It’s clear that what we’re experiencing today is a technology on the rise, and a lot of improvements will certainly pop up in the next months and years. And while AI-powered commerce is definitely written in the stars — especially at commercetools — we see it as a companion that helps us get more efficient rather than a replacement for human ingenuity.
This experiment showed us that it’s already possible to leverage GenAI to kickstart a commerce implementation. This can be particularly relevant for teams that aren’t yet familiar with the world of composable commerce and MACH®, and need a shortcut to accelerate development. At the same time, our partners can use it as a tool to accelerate implementations and test cycles for their own clients.
In the near future, we expect that you’ll be able to do even more. By moving from prompt engineering (manually crafted prompts) to prompt tuning (learning a small set of task-specific prompt parameters while the pre-trained model itself stays frozen), it will be possible to optimize LLMs (large language models) for specific tasks. When that happens, the power of generative AI to create commerce implementations will be even more transformative.
And, to answer the questions we stated at the beginning of this article, we do believe that GenAI will be able to simplify implementations at a certain point in time while also improving the developer experience as the technology matures and trained models arise.
In any case, you can already start trying out how generative AI and commerce converge with composability. A composable architecture, championed by commercetools, is crucial to experiment, implement and adapt any AI applications with ease — and without disruption to your business operations or tech stack. That’s really the first step in this journey!
If you’re ready to experiment with generative AI on commercetools, use our comprehensive API documentation as a baseline and start your own prompt engineering/tuning.