Imagine you ask an AI/LLM how to do a particular task by describing what you want to accomplish in code comments.
// this is a Cypress end-to-end test
You fire off Cursor or GitHub Copilot... and it might give you a good code suggestion or it might suggest absolute nonsense. Let's take this answer generated by the Claude 3.5 Sonnet model.
For easy reference, the suggestion used an alias to the old text:
it('changes the label after the click', () => {
Notice two things: the solution adopts the variable names and data from my prompt; it used the #foo and #bar selectors. Good. But the solution has a subtle bug in how it uses the aliased value, a bug that might make this test flaky depending on the timing; see Text Changes and the video below for details.
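To make the problem concrete, here is a sketch of the kind of aliased, non-retrying assertion that causes this flakiness. This is my reconstruction of the pattern, not the exact suggestion from the model:

it('changes the label after the click', () => {
  // store the current label text under an alias
  cy.get('#foo').invoke('text').as('initialText')
  cy.get('#bar').click()
  cy.get('@initialText').then((initialText) => {
    // flaky: this callback grabs the new text once, right after the click,
    // and never retries, so it can run before the app updates the label
    cy.get('#foo')
      .invoke('text')
      .then((newText) => {
        expect(newText).to.not.equal(initialText)
      })
  })
})

Because the inner .then callback does not retry, the assertion can run before the application finishes updating the label. Compare it with the retrying .should('not.have.text', ...) assertion from the Text Changes recipe shown later in this post.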
Can we improve the answer given by the LLM?
Give good examples
Humans can do a lot by following an example. This is why 11 years ago I suggested putting example comments in your source code. A similar approach works with AI generation: prefix your question (prompt) with good examples showing how to achieve the desired outcome. For example, you can manually paste the above Text Changes recipe and then ask the LLM:
Ok, what does the LLM do now? It follows the good example!
Even older models can generate high-quality code when provided good, well-tested, trustworthy examples and information.
Trustworthy information
Unfortunately, the world wild web cannot be trusted to have accurate information. As someone who reads a lot of blog posts, I notice plenty of incorrect examples and solutions that are missing important context, have hidden bugs, and so on. LLMs trained on wider and wider swaths of the Internet do not know which information is accurate and which is just some JavaScript snippet posted on a page.
Sources of tested, accurate, up-to-date coding knowledge should be at a premium. I maintain a few such knowledge databases for Cypress end-to-end tests. For example, Cypress Examples has almost 1000 constantly tested Cypress tests covering all cy commands and various testing situations.
Similarly, I have example repos that are constantly tested for my online courses like Cypress Network Testing Examples and Cypress Plugins, etc. Each course has hundreds of lessons, thus the total number of high-quality Cypress tests is close to another 1000. How do we use them to answer any current prompts?
Retrieval-augmented Generation
Easy. Before generating an answer to our current prompt, we find a similar example using semantic code and text search. Then we include that example in the full prompt we send to an LLM. This is what Retrieval (search for an example) Augmented (include it with your prompt) Generation (adapt the example to your current situation), aka RAG, is.
We could use a regular Algolia / full-text search to find examples matching the current prompt. Or we could use semantic meaning to quickly find similar examples. Here is one RAG implementation that I played with: it uses ChromaDB to store Markdown documents I prepared and can quickly find examples close to new code fragments.
Prepare documents
I use Markdown to store Cypress code examples and even to run them as tests. For retrieval, I extract blocks of examples from the Markdown docs using the markdown-search-scraper CLI tool and store them in ChromaDB running locally.
import { parseForAi } from 'markdown-search-scraper/src/parse-for-ai.js'
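The rest of the ingestion script, after the import above, follows roughly this shape. Treat it as a minimal sketch: the collection name, the file path, and the shape of the blocks returned by parseForAi are my assumptions, and I let the local ChromaDB server compute the embeddings with its default embedding function.

import { ChromaClient } from 'chromadb'

// assumes a ChromaDB server running locally with its default settings
const client = new ChromaClient()
const collection = await client.getOrCreateCollection({
  name: 'cypress-examples', // hypothetical collection name
})

// assumption: parseForAi yields blocks with an id, the AI-friendly text
// (code stripped, comments kept), and the original Markdown source
const blocks = await parseForAi('./docs/recipes/text-changes.md')

await collection.add({
  ids: blocks.map((b) => b.id),
  // ChromaDB computes an embedding vector from each document text
  documents: blocks.map((b) => b.text),
  // keep the original Markdown in the DB as metadata
  metadatas: blocks.map((b) => ({ markdown: b.markdown })),
})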
A few implementation notes:
- parsing Markdown for AI strips the code, but keeps the code comments
- I let ChromaDB prepare an embedding vector from the Markdown text. It produces a long array of numbers like [0.2, 0.89, ...]
- I store the original Markdown in the DB as metadata. An alternative implementation could store a link to the original Markdown stored in another database
- it takes a while to insert all 1000 Markdown examples into ChromaDB. I suggest using text hashes to only update the examples that changed to save time (see the sketch after this list)
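One possible way to implement that incremental update, continuing the ingestion sketch above (the hash metadata field and the hashText helper are my own names, not part of the original scripts):

import { createHash } from 'node:crypto'

const hashText = (text) => createHash('sha256').update(text).digest('hex')

// fetch the previously stored metadata for these ids (unknown ids are simply absent)
const existing = await collection.get({ ids: blocks.map((b) => b.id) })
const storedHashes = new Map(
  existing.ids.map((id, k) => [id, existing.metadatas[k]?.hash]),
)

// only (re)insert the blocks whose text has changed since the last run
const changed = blocks.filter(
  (b) => storedHashes.get(b.id) !== hashText(b.text),
)
if (changed.length) {
  await collection.upsert({
    ids: changed.map((b) => b.id),
    documents: changed.map((b) => b.text),
    metadatas: changed.map((b) => ({
      markdown: b.markdown,
      hash: hashText(b.text),
    })),
  })
}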
Retrieval
Once we want to ask the LLM a question, we query ChromaDB to see if it has any documents close to the query text.
import { ChromaClient } from 'chromadb'
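The rest of query.mjs can follow this shape; it is again a sketch that reuses the hypothetical collection name and metadata field from my ingestion example above:

const client = new ChromaClient()
const collection = await client.getOrCreateCollection({
  name: 'cypress-examples', // same hypothetical collection as above
})

// the search text comes from the command line
const [query] = process.argv.slice(2)

// ChromaDB embeds the query text and returns the closest stored documents
const results = await collection.query({
  queryTexts: [query],
  nResults: 2,
})

// query results are grouped per query text, thus the [0] indexing
results.documents[0].forEach((doc, k) => {
  console.log(`Result ${k + 1} with distance ${results.distances[0][k]}`)
  // print the original Markdown stored as metadata
  console.log(results.metadatas[0][k].markdown)
})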
Let's pretend we want to find a code example before asking the LLM to implement the test:
$ node ./query.mjs "text updated to something else after click"
Result 1 with distance 0.6559537
In this example, we want to confirm that the text on the page changes after the user clicks the button. We do not know the initial text, we just know that it changes in response to the click.
<div id="output">Original text</div>
<button id="change">Do it</button>
document
.getElementById('change')
.addEventListener('click', () => {
// change the text, but do it after a random delay,
// almost like the application is loading something from the backend
setTimeout(() => {
document.getElementById('output').innerText = 'Changed!'
}, 1000 + 1000 * Math.random())
})
cy.get('#output')
.invoke('text')
.then((text) => {
cy.get('#change').click()
cy.get('#output').should('not.have.text', text)
})
Watch the explanation video Confirm The Text On The Page Changes After A Click.
See also Counter increments
Result 2 with distance 0.91608334
...
Nice, and notice how it "decided" that words like "text updated" are close in meaning to "the text on the page changes" - the match is NOT exact, but close in semantic meaning. The distance drop-off between 0.65 and 0.91 is quite large, so we know the first result is much closer than the second.
Now we can insert the found example into the original LLM prompt and generate a good solution, either manually or via scripting:
const examples = await RAG(prompt)
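Expanded into a (hypothetical) script, that flow could look like this. RAG() here is a thin wrapper around the ChromaDB collection from the sketches above, and askLLM() is a placeholder for whatever LLM client you use:

// wrap the ChromaDB query and return the original Markdown
// of the closest stored examples
const RAG = async (prompt, n = 1) => {
  const results = await collection.query({ queryTexts: [prompt], nResults: n })
  return results.metadatas[0].map((m) => m.markdown)
}

const prompt =
  'write a Cypress test that confirms the #foo text changes after clicking #bar'
const examples = await RAG(prompt)

// prepend the trusted, well-tested examples to the user prompt
const fullPrompt = [
  'Here are well-tested Cypress examples to follow:',
  ...examples,
  prompt,
].join('\n\n')

// askLLM is a placeholder for your LLM client of choice
const answer = await askLLM(fullPrompt)
console.log(answer)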
Tip: ChromaDB can be used with other AI embeddings, see Embedding Integrations.