OpenAI has had a big year, leading the generative AI race with ChatGPT. That success means all eyes are on the company to set the right precedent for future AI development, and OpenAI has taken a step forward with a new safety plan.
This week, OpenAI published the initial beta version of its Preparedness Framework, a safety plan delineating the different precautions the company has put in place to ensure the safety of its frontier AI models.
In the first element of the framework, the company commits to running consistent evaluations on its frontier models that push the models to their limits. OpenAI claims that these findings will help the company assess the risk of the models and measure the effectiveness of proposed mitigations.
The evaluations’ findings will then be shown in risk “scorecards” for OpenAI’s frontier models, which will be continually updated to track risk across four categories: cybersecurity, persuasion, model autonomy, and CBRN (chemical, biological, radiological, and nuclear) threats.
The risk thresholds will be classified into four risk safety levels: low, medium, high, and critical. That score will then determine how the company should proceed with the model.
Models that earn a post-mitigation score of “medium” or below can be deployed, while only models with a post-mitigation score of “high” or below can be developed further, according to the post.
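The gating rule described above can be sketched in a few lines of code. This is an illustrative reading of the published thresholds, not OpenAI's implementation; all function and variable names here are hypothetical.

```python
# Hypothetical sketch of the Preparedness Framework's gating rule.
# The four risk levels and the "medium"/"high" cutoffs come from the
# framework post; everything else is illustrative.

RISK_LEVELS = ["low", "medium", "high", "critical"]

def can_deploy(post_mitigation_score: str) -> bool:
    # Only models with a post-mitigation score of "medium" or below
    # can be deployed.
    return RISK_LEVELS.index(post_mitigation_score) <= RISK_LEVELS.index("medium")

def can_develop_further(post_mitigation_score: str) -> bool:
    # Only models with a post-mitigation score of "high" or below
    # can be developed further.
    return RISK_LEVELS.index(post_mitigation_score) <= RISK_LEVELS.index("high")

print(can_deploy("medium"))             # True
print(can_deploy("high"))               # False
print(can_develop_further("high"))      # True
print(can_develop_further("critical"))  # False
```

Note that a model scoring “high” falls into the gap the framework creates: it can keep being developed (presumably so mitigations can bring the score down), but it cannot ship.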
OpenAI is also restructuring how its internal teams make these safety decisions.
A dedicated Preparedness team will drive technical work to evaluate the frontier model’s capabilities, such as running evaluations and synthesizing reports. Then, a cross-functional Safety Advisory Group will review all the reports and send them to Leadership and the Board of Directors.
Lastly, leadership will remain the decision-maker; however, the Board of Directors will hold the right to reverse its decisions.
This addition is particularly noteworthy because it follows the turmoil that ensued early last month when Sam Altman was briefly ousted by the Board of Directors, only to be promptly reinstated as CEO with a new board.
Other framework elements include developing a protocol for added safety and outside accountability, collaborating with external parties and internal teams to track real-world misuse, and pioneering new research in measuring how risk evolves as models scale, according to the release.