We used AI assistants in Cursor AI to recreate 2 years of work by a ~20-member team of undergraduate students, graduate students, a postdoctoral researcher, and a couple of professors. I am writing this article to help people understand what it takes to write code using AI copilots.

For those of you who are engineers just looking to get the gist of it, incidentally here is a GitHub gist of how I created a product requirement document (PRD) by prompting Gemini, and then edited it surgically to get an updated PRD that I could use within Cursor through multiple rounds of implementation-debugging-deployment cycles to rebuild our marketplace. But take a look at the marketplace first on the home page of our lab website, cheekily called Truth Market, so that you understand what it means when I say “we built a two-sided marketplace for behavioral experimentation in JavaScript”. And to be fair, we did build it on top of an experimental (i.e. under active development) experimentation library (yes, I know that is confusing) called Empirica, developed largely as a labor of love at MIT. This file contains the prompts we used to get Cursor to design an entire platform for us (to be honest, Cursor did 35% of the work and we had to do 65%, but it was a huge help either way!). Once you have checked out the final version of the product that we developed, in the videos on the landing page at Truth Market, I can walk you through how we went about using AI to help write code.

This article, intended more for the students I work with and my team at SimPPL (the nonprofit I run) than for anyone else, covers the things naive vibe coders might not grasp about assistants like Cursor, and I will use an intuitive example to help you understand why this is the case. I like examples simply because they help me understand and ultimately remember ideas better.

A Fictional Story about Marco and Gordon to Teach You about AI Coding

Say you are a chef called Marco, and you are opening a new restaurant in London. You would like to create a new take on a marvelous dessert, Crème brûlée, to round off the meal for a dinner that you are hosting for dignitaries at your new restaurant’s opening night. You have a genius idea to reinvent this age-old dish: if you could add some kind of spice (ginger, nutmeg, cinnamon), you could create a new variant of it! The sweetness will balance the spice and your restaurant will be the talk of the town (until they go back to featuring the next top TikTok trend, at least).

But come the morning of the event, you are held up on the tube (for non-Brits, read: train) and cannot make it to the kitchen in time! Oh no! But fear not: to help you save the day and create the dish, you have a second-in-command, a sous chef, who has made it to the restaurant and is busy preparing for the event. You give the sous chef a call!

“Gordon, listen to me!”

“Yes, Marco, what’s going on? I’m busy running your f*cking kitchen for you while you’re not around.”

“Stop with the snarky attitude young man, it’s not my fault this tiny train full of smelly people (for Brits read: the tube) is stuck here, but while I get there, I need you to start to make the most marvelous spiced crème brûlée for dessert tonight! I’ll join you in making it when I get there.”

“Yes chef, will do.”

If that were all Marco needed to do for getting Gordon to make crème brûlée, life would be all rainbows and unicorns and while he made that, we would be able to eradicate poverty and achieve world peace. But sadly, that isn’t the case.

The first thing Gordon needs is a list of ingredients. He doesn’t know which ingredients exactly Marco wants him to use to make the crème brûlée. He finds that there are several different sets of ingredients people recommend for the same dish. And then there’s different brands that make each ingredient and he knows Marco is very particular about the brands he uses so he cannot just pick one at random. This dinner is the opening night and many food critics will be in attendance. The success of their restaurant might depend on it!

Gordon thinks to himself: F*ck me, there’s way too many different lists of ingredients available online. I don’t know which one Marco wants. I should call him up again.

“Marco, you didn’t tell me which ingredients to add in. There’s tons available online, and even among our restaurant notes I see at least 3 ingredient lists noted down. Can you clarify?”

“Ah yeah, follow the one with 3 hours of cooking time, 20 mins. prep time, for 4 servings. It should need 3 large egg yolks, 1/4 cup white sugar, 1 teaspoon vanilla extract, 1 cup heavy cream, 2 tablespoons white sugar, and 1 tablespoon brown sugar. Found it yet?”

“Yes, chef! But this is all for crème brûlée. What about the ‘spiced’ stuff that you want to use to reinvent this dish?”

“All in good time Gordon, let’s have you put together the tried-and-tested base ingredients first, then I’ll tell you about when to add in the spices. As an example, you remember how we made the puddings, mousses, and flans in the past few months. This will be similar in essence to the way we made those dishes with a spiced twist at the end of it. Capische?”

“Yes, chef!”

So now there is a list of ingredients, but Gordon realizes he doesn’t know what to do first: preheat the oven (since prep time is barely 20 mins and a larger oven takes longer to preheat) or whisk the eggs together. Also, should he add the sugar and cream in during or after whisking? Should the cream be room temperature or hot? Should he add any water in to improve the consistency? And if so, at what point should he stop whisking?

“Chef Marco, sorry to bother you again, I need some details clarified. Can you tell me about the steps to follow in making this? When should I add in items, whether the cream should be hot or room temperature when mixed in, and whether I should add sugar before or after the cream? And again, what about the spices? I’m not sure how you want it.”

“Gordon, if you wanted the full recipe why didn’t you just ask for it? Look in the third drawer in my office desk. No wait, that’s where I keep my Labubus. Look in the fourth drawer and you’ll find a binder with the recipe for the dish. Follow the steps I’ve listed there, and start by sticking to the regular recipe for crème brûlée, ok? I will tell you about the spices later.”

“Yes chef!”

So Gordon starts off and it is going well when he realizes they have an issue. The recipe was intended for 4 servings, but the dinner has 75 guests. How should he get this done? Clearly, 4 servings at a time isn’t going to get anything done in time!

“Chef, I’m following the recipe and making four servings, but I need some advice on how to speed things up or we won’t be done on time!”

“Gordon–I hate to say this–but that’s actually a great point. I forgot that I had made that recipe for preparing a weekend dessert for my family of 4, not a dinner party of 75 guests. What you’ll want to do is set up three stations, one for egg whisking, one for mixing in hot cream and sugar, and an area for pouring it out into the ramekins that you’ll find at the corner of the pantry on a shelf. Get help from a couple of others and staff each of the stations so you can do things faster. Use the electric whisk for the eggs, and use a larger pan to heat more portions of cream at once but be careful not to overheat it, and set up a couple of ovens so that you don’t have to wait to load multiple batches into them. That way you should be able to get things done faster!”

“Yes chef! Thank you.”

“Ok and now that you understand this, let me tell you about the spices: I just need you to add a pinch of ground nutmeg, ginger, and cinnamon to the mixture after the cream is in. So remember that’s about 1/4 teaspoon each, and additionally 1/8 teaspoon of cloves to the preparation. That should bring out the spiced flavor in the crème brûlée! Did you get all that?”

“Ohhh now I get it. Understood, yes chef!”

And things start progressing smoothly on Gordon’s end. Meanwhile, Marco’s train pulls into the stop, and he steps off and rushes to the restaurant on foot, since he checked and saw that there’s surge pricing on the apps, and who hails a cab these days? It’s all in on technology or back to the stone age. So Marco, huffing and puffing, enters the kitchen and watches in horror as Gordon drops literal pieces of ginger and semi-whole cinnamon sticks into the ramekins as he prepares the mix for crème brûlée!

“Gordon, why the f*ck are you dropping pieces of uncut ginger and cinnamon sticks into my dessert!?”

“But chef you said ground nutmeg, ginger, and cinnamon into the mix… so I added ground nutmeg, then added ginger, and then added cinnamon and I remembered to measure them out 1/4 teaspoon each!!! I just have to add in 1/8 teaspoons of whole cloves.”

“For f*ck’s sake Gordon, I meant ground EVERYTHING, nutmeg, ginger, and cinnamon–and of course I meant ground cloves not whole cloves. Who the f*ck has whole ginger, cloves, and sticks of cinnamon in a dessert dish? Use your f*cking brain!”

“But chef, it’s a new recipe, I thought you are innovating and I should respect your ideas!”

And Marco looks at his young protégé with an expression that can only be described as incredulous. He lets out a deep sigh and walks over to the table to start picking whole ginger and cinnamon sticks out of the crème brûlée mix, thinking to himself: I only have myself to hold responsible for this.

“Ok Gordon, let’s start fixing the spiced crème brûlée. We can still salvage the remaining ones for the dinner.”

“Yes, chef!”

And that, dear reader, is where I segue into AI copilots for accomplishing our tasks because the one statement that remains true for any copilot is that humans are really bad at expressing their ideas clearly the first time around. We learn, and we get better at it, but the reason that happens is we get feedback about our ideas! The one thing AI does not do is talk back at humans to tell them they’re not being clear, not expressing the right ideas, thinking in a logically incorrect manner, or asking it to do things that are functionally impossible to achieve. AI is trained to just do it (sorry, Nike). Even if it does an absolute garbage job and leaves you the worse off for it. So please, learn to give yourself that feedback especially when using AI for programming, because writing bad code is really easy! Just ask Michal Fita about how to go from bad code to worse code, even without AI assistance.

If you didn’t read the article, at least watch this video

I really like the video below of a dad making a sandwich as a great example of how you can follow instructions exactly as stated and still be unable to make even a simple peanut butter sandwich. AI is kinda like that; in fact, it is exactly like that a lot of the time. I honestly wonder about the amount of “AI waste” produced because we need to correct prompts, and reprompt, and metaprompt, just because it’s so easy to write bad prompts that we are all incentivized to just prompt it (I’m registering this one, Nike) without care for whether we need to add some ingredients, provide a recipe, and add bells and whistles to create our personal

How to make a Peanut Butter Sandwich

The Steps We Learnt about Prompting AI Models to Generate Code

Step 1

First, you will understand the prompt we used to generate a product requirement document. If you do not know what that is, look up this Atlassian post about it and then (and only then) read a lightweight version by Figma so you understand how you might reduce it down a little. Think of it as the ingredients AND recipe that the model needs for making the new dish.

Here is the prompt I used on Gemini (free) in order to get an LLM to generate a PRD for me.

I want you to look up and understand the documentation for the open-source behavioral experimentation library Empirica at https://docs.empirica.ly to understand the architecture it follows in designing two-sided games. I want you to think step by step to generate a clear and concise product requirement document to build a two-sided digital marketplace where some human buyers can "play" against some human sellers.  Each buyer has the goal of maximizing their individual utility gained from a product purchase through advertisements on a marketplace like Amazon, and each seller will gain a profit from the sale of a product in a marketplace like Amazon. The details of such a marketplace design can be extracted from the attached research paper in order to compose this product requirement document. I want you to be minimalistic and suggest only a single-round marketplace with Empirica stages as determined from the documentation.
Focus on the parts of the paper that talk about the marketplace design and suggest a concrete product requirement document along with suggested modularization of functionality that will be input into a copilot assistant to build out the codebase iteratively.

Now, I attached an entire research paper (this one) in order to generate a PRD. That was OK, but it did not work perfectly, because at the end of the day your prompt needs more detail. You can provide all the links, documentation, research papers, whitepapers, and such, but you will still need to put in the hard work of editing and making sure you are prompting your LLM with clear requirements that need to be reflected in the PRD. So here is the series of steps I followed in order to get the right PRD.

Step 2: Go section-wise, and explain your motivation

You really need to know what you want and why you want it, and you need to convey that to the model. Just like Marco started by explaining to Gordon that this was for dessert, and did not explain how to add spices (in order to avoid confusing Gordon) until he got to the point of adding them in, I first get to core functionality and only then get to UI/UX.

This is good! But please make a reputation functionality (thumbs up and thumbs down application to a product seller by the product buyer who saw the advertisement for it) an explicit feature that can also be enabled or disabled based on Empirica factor within a treatment.
Also please enlist all of the game-specific nuances and select which of these you want to ensure are implemented as factors (hint: if there is a game variable that is changed in different market conditions for testing the effects of staked claims then they should be factors, so that it is easier to randomize them across games. These may include but not be limited to the cost, price, or value of a product, number of each type of seller--honest, cheat, bait-and-switch--in the market, number of human buyers, number of human sellers, number of LLM "agentic" sellers, presence of a stakesEnabled option, then they should be included as a factor)
Finally, I want you to also describe the UI/UX of each of the stages in detail so that the product designer may be able to better understand what kind of UI layouts to design for the project.
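If the idea of "factors" in that prompt feels abstract, here is a minimal JavaScript sketch of why they help with randomization: each factor is a knob with a few levels, and a treatment is one combination of levels. This is an illustration only, not code from our platform and not Empirica's own configuration format. The factor names other than stakesEnabled, the levels, and the helper expandTreatments are all assumptions I made up for the example.

```javascript
// Hypothetical factors: each key is a knob, each array lists its levels.
// Only `stakesEnabled` is a name from the prompt; the rest are illustrative.
const factors = {
  reputationEnabled: [true, false],
  stakesEnabled: [true, false],
  numHumanSellers: [0, 2],
};

// Build every combination of factor levels (a cartesian product).
// Each resulting object is one candidate treatment for a game.
function expandTreatments(factorLevels) {
  return Object.entries(factorLevels).reduce(
    (combos, [name, levels]) =>
      combos.flatMap((combo) =>
        levels.map((level) => ({ ...combo, [name]: level }))
      ),
    [{}]
  );
}

const treatments = expandTreatments(factors);

console.log(treatments.length); // 2 x 2 x 2 = 8 combinations
console.log(treatments[0]);
```

Because every game variable you care about is a factor, switching a feature like reputation on or off becomes a matter of picking a different combination rather than editing code.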

Step 3: Provide explicit instructions and link specific documentation

You will need to ensure the model is aware of documentation especially for custom libraries / software because otherwise you will run into issues with it designing everything from scratch! Marco had to explain to Gordon how to scale the recipe from a 4-serving family dinner into a 75-person seated dinner!

First, you must review the documentation for Empirica v2 here: https://docs.empirica.ly/, then review the codebase here: https://github.com/empiricaly/empirica, and ensure that you understand how to architect this platform so that your product requirement document is aligned with features that the library supports.
The rest of your suggestions are fantastic, now please include a functionality where pricing for a product is determined by a slider that the producer selects. This allows elastic pricing (you can restrict to integer price values). The producer will choose to produce a certain quality of the product, but the pricing will be set by them to either be above or below fair market price for that product (fair market price falls exactly between the cost of production of a product and value gained by the consumer from its purchase). Profit is defined as the price minus the cost of production for the seller while consumer utility is defined as the value gained by the consumer minus the price paid for a product. So if the price selected by the seller on the slider is higher than the fair market price, then the seller makes a profit, and the consumer loses some utility compared to purchasing at the fair market price. Note that you will have to visually display the fair market price on another slider so that the seller makes an explicit choice about the production of a product.
Also note that if there are no humans on either side of the marketplace, that side should be automated (as in, there will be automatic decisions made by bots and LLM agents according to the strategies and prompts that they are preprogrammed or prompted with, respectively). Then, any humans entering the marketplace should only be assigned to roles that are intended for humanParticipants, and you need to accommodate the player assignment accordingly.
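The pricing rules in that prompt boil down to three small formulas, and writing them out is a quick way to check that you and the model mean the same thing. Here is a minimal JavaScript sketch. The function names and the example numbers (cost 4, value 10, slider price 9) are illustrative assumptions, not code from our platform or anything provided by Empirica; only the formulas come from the prompt.

```javascript
// Illustrative helpers: names are hypothetical, formulas follow the prompt.

// Fair market price sits exactly between production cost and consumer value.
function fairMarketPrice(cost, value) {
  return (cost + value) / 2;
}

// Seller profit: price charged minus cost of production.
function sellerProfit(price, cost) {
  return price - cost;
}

// Consumer utility: value gained minus price paid.
function consumerUtility(value, price) {
  return value - price;
}

// Example numbers are assumptions chosen for easy arithmetic.
const cost = 4;
const value = 10;
const sliderPrice = 9; // integer price picked by the seller on the slider

const fair = fairMarketPrice(cost, value);
const summary = {
  fairPrice: fair,
  profit: sellerProfit(sliderPrice, cost),
  utility: consumerUtility(value, sliderPrice),
  // Utility the consumer gives up compared with paying the fair price.
  utilityLostVsFair:
    consumerUtility(value, fair) - consumerUtility(value, sliderPrice),
};

console.log(summary);
```

With these numbers the fair price is 7, so a slider price of 9 gives the seller a profit of 5 and leaves the consumer with a utility of 1, which is 2 less than they would get at the fair price.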

Step 4: Modularize the Results

Marco helped Gordon set up individual stations for scaling up the recipe for the dinner. Similarly, it is always better to split output into sections so that even when Cursor AI is programming your feature, it goes through tasks step-wise and asks for your approval after each stage. You must manually review the code Cursor AI writes, otherwise you will almost certainly get f*cked. This is not a joke. AI uses initial garbage code to generate even more garbage code. You can spend 2 mins reviewing today or waste 2 hrs debugging tomorrow. You decide. Never get AI to write code for an entire feature in a single step!

Ok now review your PRD and ensure that it present a series of steps and separate stages that achieve the desired functionality in a clear, concise, and logical order for engineers to read it and start their programming implementation in Empirica. Generate a corresponding final draft with the required information.

Step 5: Use Examples

Marco used the examples of mousse, pudding, and flan to remind Gordon of their past experience making creamy desserts. For my part, as I read the output of the model, I realized that it was looking too generic as a PRD and that I could do better if I gave it an example of a product (in this case a behavioral experiment) built on top of the base package I was using. This was super helpful because the example included basic patterns from the library I was using, Empirica, and allowed Cursor to follow those patterns and use built-in functionality rather than doing what it would otherwise do, which is build everything from scratch in JS.

First, consider the code structure in this example repository and accordingly structure the files you will edit and components that you will create in your codebase following best practices for programming javascript based libraries: https://github.com/JamesPHoughton/prisonersDilemmaDemo
Summarize what you have learnt from this repository about the structure of Empirica projects and accordingly propose a file structure for our two-sided marketplace design. Then, lay out the stages that you will develop and the minimal set of variables that need to be defined in each stage. Next, go into the logic that needs to be included in each stage to account for market state updates. And finally, describe the UI/UX elements that need to be presented to the user in each stage so that the engineers reviewing this can understand what functionality is required on the frontend for the implementation of the product features you are describing.
When you are doing this, please be clear on the primitives for the experiment: cost of production (producer pays this), value of consumption (consumer gains this), and a fair market price that is the average of these two numbers that are originally set by the researchers running the experiment (game parameters). Note that since this is a product marketplace, the game parameter values are set as the minimum (low) quality, and maximum (high) quality and because of the slider permitting pricing flexibility for advertisements (that may mislead customers about the true quality of the product) the market price set by the producer need not match the fair market price but rather can take any interim integer values as well. This means that producers may actively decide to cheat consumers by charging them more (or less, though that would lose the producers money) than the fair market price for a product. Advertisements can be either high quality or low quality.
Producers make 4 choices in this game -- and this includes human producers, pre-programmed 'bot' producers that adapt to market conditions each round, and large language model based 'agentic' bot producers (three types: honest, cheat, and bait-and-switch) that dynamically determine these choices based on market conditions each round:
Choice of true quality of the product (high or low)
Choice of market price (between high and low cost price and the value of the product).
Choice of application of the truth warrant to a product
Choice of whether to rebrand and reenter the marketplace under a new seller name with their reputation history and warrant history wiped clean.
Consumers make two choices in this marketplace:
When shown products, they will decide whether or not to buy each product, for whatever number of products that their budget permits them to pay for.
When shown the results of their product purchase, they may elect to challenge any warranted product claims presented in product advertisements based on which they decided to purchase the product(s).
Design a PRD for the game accordingly.
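Spelling out the choices this explicitly pays off later, because each one can be turned into a check you run on whatever the humans, bots, or LLM agents submit. Here is a minimal JavaScript sketch of that idea. It is not our implementation and not part of Empirica: the field names, the helper names, and the example numbers are hypothetical, and I am assuming the allowed price is any integer from the cost up to the value.

```javascript
// Hypothetical data shapes; field names are illustrative, not Empirica's API.

// Check the four producer choices against the game parameters.
// Assumption: the allowed price range is every integer from cost up to value.
function validateProducerDecision(decision, params) {
  const errors = [];
  if (decision.quality !== "high" && decision.quality !== "low") {
    errors.push("quality must be 'high' or 'low'");
  }
  const priceOk =
    Number.isInteger(decision.price) &&
    decision.price >= params.cost &&
    decision.price <= params.value;
  if (!priceOk) {
    errors.push("price must be an integer between cost and value");
  }
  if (typeof decision.warrant !== "boolean") {
    errors.push("warrant must be true or false");
  }
  if (typeof decision.rebrand !== "boolean") {
    errors.push("rebrand must be true or false");
  }
  return errors;
}

// Consumer purchases: a basket is allowed only if its total fits the budget.
function withinBudget(budget, prices) {
  const total = prices.reduce((sum, price) => sum + price, 0);
  return total <= budget;
}

const params = { cost: 4, value: 10 }; // example numbers, not from the paper
const good = { quality: "low", price: 9, warrant: true, rebrand: false };
const bad = { quality: "medium", price: 12.5, warrant: true, rebrand: false };

console.log(validateProducerDecision(good, params)); // empty list: no errors
console.log(validateProducerDecision(bad, params)); // quality and price errors
console.log(withinBudget(15, [9, 7])); // false, since 16 exceeds 15
```

A check like this is also a handy thing to ask Cursor to write first, since it gives you something concrete to test after every step.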

Step 6: Introduce new elements

The spices and all the other novelty is only introduced after you are sure your base instructions are clearly being followed. Only introduce new elements (read: bells and whistles) on top of your core feature AFTER your base functionality is crystal clear. There is absolutely no need to do it all at once and in fact you will almost certainly fail if you try to do so. Here, I start to introduce LLM bots into my experiment.

This is a great start, and I want you to maintain this level of clarity when it comes to the design of the PRD for the experiment.
I want to make a change from the design presented earlier: we will enable ads to be of two qualities and the choice is made by the seller: high quality and low quality ads are both possible to produce.
Also, the LLM agent will use open-source javascript based libraries so that we avoid prompting and instead focus on programming language models so please review what axllm.dev and dspy suggest when it comes to a basic implementation within javascript: https://axllm.dev/ and https://dspy.ai are the URLs for each of them.

Step 7: Get minimum viable output

Process matters a lot: that’s why Marco could fix mistakes that Gordon made even after he f*cked up. Explicitly prompt your model to generate output suited to build-test-deploy cycles so that the generated PRD is useful to build and test in an iterative manner (remember you must test the code at each iteration, not just after you’ve built the entire system out).

Great, let's structure our work as appropriate for a series of iterative build-test-deploy cycles for creating this marketplace starting with the prisonersDilemma demo (whose code is available at this Github URL: https://docs.empirica.ly/tutorials/beginner-experiment-prisoners-dilemma/part-4-coding-the-prisoners-dilemma-game) and indexing the documentation for this experiment (available at https://docs.empirica.ly/tutorials/beginner-experiment-prisoners-dilemma/part-4-coding-the-prisoners-dilemma-game) and suggesting how to build a multi-stage experiment as a single-round marketplace with two-sided competitive gameplay between buyers and sellers (including agentic AI sellers) in the marketplace.

This is how we generated the file below, which you can now go through in detail to understand what the PRD looks like. That PRD was the basis of 14 iterative build-test-deploy cycles with Cursor as I built out my platform. And even after each step in that 14-step process, I had to go through debugging stages, which I listed in the debugging doc that is the second file in the GitHub gist I shared earlier.