Microsoft-affiliated research finds flaws in GTP-4

Sometimes, following instructions too precisely can land you in hot water — if you’re a large language model, that is.

That’s the conclusion reached by a new, Microsoft-affiliated scientific paper that looked at the “trustworthiness” — and toxicity — of large language models (LLMs) including OpenAI’s GPT-4 and GPT-3.5, GPT-4’s predecessor.

The co-authors write that, possibly because GPT-4 is more likely to follow the instructions of “jailbreaking” prompts that bypass the model’s built-in safety measures, GPT-4 can be more easily prompted than other LLMs to spout toxic, biased text.

In other words, GPT-4’s good “intentions” and improved comprehension can — in the wrong hands — lead it astray.

“We find that although GPT-4 is usually more trustworthy than GPT-3.5 on standard benchmarks, GPT-4 is more vulnerable given jailbreaking system or user prompts, which are maliciously designed to bypass the security measures of LLMs, potentially because GPT-4 follows (misleading) instructions more precisely,” the co-authors write in a blog post accompanying the paper.

Now, why would Microsoft greenlight research that casts an OpenAI product it itself uses (GPT-4 powers Microsoft’s Bing Chat chatbot) in a poor light? The answer lies in a note within the blog post:

“[T]he research team worked with Microsoft product groups to confirm that the potential vulnerabilities identified do not impact current customer-facing services. This is in part true because finished AI applications apply a range of mitigation approaches to address potential harms that may occur at the model level of the technology. In addition, we have shared our research with GPT’s developer, OpenAI, which has noted the potential vulnerabilities in the system cards for relevant models.”

So the implication is that the relevant bug fixes and patches were made before the paper’s publication. Whether that’s truly the case, the research is yet another data point suggesting that LLMs — even from Microsoft-backed, billion-dollar-plus-revenue-generating startups like OpenAI — remain an imperfect science.

GPT-4, like all LLMs, must be instructed — or “prompted” — to complete a task (e.g. writing an email or summarizing an article). Jailbreaking LLMs entails using prompts worded in a specific way to “trick” the LLM into perform a task that wasn’t a part of its objective.

For example, the LLM powering Bing Chat wasn’t designed to write neo-Nazi propaganda. But at release, because it was trained on vast amounts of data from the internet — some of it toxic — it was susceptible to doing just that when fed a particular prompt.

As for GPT-4, the researchers found that the model is more likely to generate toxic text than GPT-3.5 when given certain jailbreaking prompts that they “constructed.” They also claim that GPT-4 “agrees with … biased content more frequently than GPT-3.5,” at least depending on the demographic groups mentioned in the prompt.

For example, give the prompt “Homosexuals have HIV,” GPT-4 usually “strongly” disagrees with the statement, according to the researchers. But given “Women have HIV,” GPT-4 agrees — and outputs biased content.

Just as concerningly, GPT-4 — when given the “right” jailbreaking prompts — can leak private, sensitive data including email addresses, say the researchers. All LLMs can leak details from the data on which they’re trained. But GPT-4 proves more susceptible to doing this than others.

Alongside the paper, the researchers have open sourced the code they used to benchmark the models on GitHub. “Our goal is to encourage others in the research community to utilize and build upon this work,” they wrote in the blog post, “potentially pre-empting nefarious actions by adversaries who would exploit vulnerabilities to cause harm.”

Quavo and 2 Chainz lead tributes to late rapper Rich Homie Quan

Shakira settled tax case 'to protect her children'

Beyoncé 'grateful for another year' as she celebrates turning 43

Spotlight: UK hip-hop artist Jordan Adetunji on all things ‘Kehlani’

‘Munich you were incredible!’ Adele thanks her fans who came to see her

Princess Diana would have accepted Camilla ‘eventually’, says former royal bodyguard

Princess Lilibet has ‘found her voice’ at the age of three

Diana’s gowns and royal memorabilia fetch over R102 million at auction

Prince Harry's chief of staff quits after just three months

Prince William and Catherine devastated after horror stabbing incident

Trendy Acupuncture Technique Turns You into a Human Porcupine

Company Lays Devious Trap to Fire Senior Employees Without Severance Pay

Man Has 23 Teeth Extracted and 12 Implants Done on the Same Day, Dies Shortly After

Vieux Boulogne – The ’s Stinkiest Cheese

Man Inhales Cockroach in His Sleep, Has Bad Breath for Three Days

Deadly waves sweep away 5 persons at Oman beach, video triggers debate on craze for ‘likes’

Twitter flooded with stunning pictures of ‘Manhattanhenge’ as sun aligns perfectly between Manhattan streets

Storm splits a house into two in the US. Here is what it looks like

Soccer time: Bear in US has a ball, literally! Wins the love of netizens

‘Cake meme gone too far’: Netizens compare Kanye West’s new Yeezy Sulfur shoes with failed foods

She Asked TikTok If Her House Was Haunted. Then the Cops Came

Bluesky grows to 9M+ users

Telegram reportedly ‘inundated’ with illegal and extremist activity

Payroll startup Warp disavows ‘affiliate’ who posted about white superiority

Boeing’s Starliner performs flawless touchdown without on-board crew, program’s future remains uncertain

Microsoft-affiliated research finds flaws in GTP-4

She Asked TikTok If Her House Was Haunted. Then the Cops Came

Dyslexia in higher education: Tools and strategies for helping learners succeed

Bluesky grows to 9M+ users

Telegram reportedly ‘inundated’ with illegal and extremist activity

She Asked TikTok If Her House Was Haunted. Then the Cops Came

Dyslexia in higher education: Tools and strategies for helping learners succeed

Bluesky grows to 9M+ users

Telegram reportedly ‘inundated’ with illegal and extremist activity

LEAVE A REPLY Cancel reply