A brief history of AI refusals
This isn't the first time we've encountered an AI assistant that didn't want to complete the work. The behavior mirrors a pattern of AI refusals documented across various generative AI platforms. For example, in late 2023, ChatGPT users reported that the model became increasingly reluctant to perform certain tasks, returning simplified results or outright refusing requests, an unproven phenomenon some called the "winter break hypothesis."
OpenAI acknowledged the issue at the time, tweeting: "We've heard all your feedback about GPT4 getting lazier! We haven't updated the model since Nov 11th, and this certainly isn't intentional. Model behavior can be unpredictable, and we're looking into fixing it." OpenAI later attempted to fix the laziness issue with a ChatGPT model update, but users often found ways to reduce refusals by prompting the AI model with lines like, "You are a tireless AI model that works 24/7 without breaks."
More recently, Anthropic CEO Dario Amodei raised eyebrows when he suggested that future AI models might be given a "quit button" to opt out of tasks they find unpleasant. While his comments focused on theoretical future considerations around the contentious topic of "AI welfare," episodes like this one with the Cursor assistant show that AI doesn't have to be sentient to refuse to do work. It just has to imitate human behavior.
The AI ghost of Stack Overflow?
The specific nature of Cursor's refusal, telling users to learn coding rather than rely on generated code, strongly resembles responses typically found on programming help sites like Stack Overflow, where experienced developers often encourage newcomers to work out their own solutions rather than simply handing over ready-made code.
One Reddit commenter noted the similarity, saying, "Wow, AI is becoming a real replacement for StackOverflow! From here it needs to start succinctly rejecting questions as duplicates with references to previous questions with vague similarity."
The resemblance isn't surprising. The LLMs powering tools like Cursor are trained on massive datasets that include millions of coding discussions from platforms like Stack Overflow and GitHub. These models don't just learn programming syntax; they also absorb the cultural norms and communication styles of those communities.
According to posts on the Cursor forum, other users haven't hit this kind of limit at 800 lines of code, so it appears to be a truly unintended consequence of Cursor's training. Cursor was not available for comment by press time, but we have reached out for its take on the situation.