OpenAI's New OPEN Models - GPT-OSS 120B & 20B

Sam Witteveen
6 Aug 2025 · 21:52

TLDR: OpenAI has released two new open-weights models, GPT-OSS-120B and GPT-OSS-20B, licensed under Apache 2.0. The 120B model is designed for cloud deployment and the 20B for local use, and both were post-trained with reinforcement learning and instruction tuning in the same style as OpenAI's proprietary reasoning models. They support three levels of reasoning effort, trading latency against performance. While the models are English-only and not fully open source, they offer promising capabilities for agentic workflows, including tool use and web search. The 20B model can run locally with tools like Ollama, and both models show strong performance in function-calling benchmarks. The release puts pressure on other labs to open up more of their models.

Takeaways

  • 😀 OpenAI has released two new open weights models: GPT-OSS 120B and GPT-OSS 20B, licensed under Apache 2.0.
  • 😀 The models are not truly open source since they only provide the instruction-tuned models, not the base models, training code, checkpoints, or data.
  • 😀 The 120B model is designed for cloud deployment with significant GPU power, while the 20B model can be run locally on personal computers.
  • 😀 Both models support three levels of reasoning effort (low, medium, high), allowing a trade-off between latency and performance.
  • 😀 The models are trained using reinforcement learning techniques similar to those used for OpenAI's proprietary reasoning models (o4-mini and o3-mini), and are designed for agentic workflows including instruction following and tool use.
  • 😀 The models use a mixture-of-experts (MoE) architecture: the 120B model activates about 5.1 billion parameters per token and the 20B model about 3.6 billion.
  • 😀 The models support a context length of up to 128K tokens, which is a significant improvement over previous models.
  • 😀 The models are primarily English-focused, similar to many initial models from other labs.
  • 😀 The models show strong performance in function calling benchmarks and reasoning tasks, with the 20B model outperforming some larger proprietary models.
  • 😀 The models can be accessed through OpenRouter, and can be run locally using frameworks like Ollama with appropriate quantization support.
  • 😀 The release of these models puts pressure on other labs to release more open models, and it will be interesting to see how they compare to upcoming proprietary models like GPT-5.

Q & A

  • What are the two new OpenAI models mentioned in the script?

    -The two new OpenAI models mentioned are the GPT-OSS 120B and the GPT-OSS 20B models.

  • What is the license under which these OpenAI OSS models are released?

    -The OpenAI OSS models are released under the Apache 2.0 license.

  • Why does the speaker question the use of the term 'OSS' for these models?

    -The speaker questions the use of the term 'OSS' (commonly read as 'open-source software') because these models are only open in the sense that their weights are available. They do not come with the base models, training code, checkpoints, or data that would make them truly reproducible.

  • What is the significance of the 120B and 20B model sizes?

    -The 120B model is designed for cloud-based or GPU-intensive use cases, while the 20B model is intended for local use on personal computers with tools like Ollama and LM Studio. This allows users to run the models in different environments based on their needs.

  • What reasoning capabilities do the GPT-OSS models support?

    -Both the 20B and 120B models support three levels of reasoning effort: low, medium, and high. This allows users to trade off between latency and performance based on their specific requirements.

  • How do these models compare to other models like GPT-3 and GPT-4?

    -The 120B model is compared to o4-mini, while the 20B model is compared to o3-mini. These comparisons show that the new models perform impressively, especially in terms of instruction following and tool use.

  • What is the knowledge cutoff date for these models?

    -The knowledge cutoff date for the models is June 2024, indicating that they were trained on data up to that point.

  • What are some potential use cases for these models?

    -The models are designed for agentic workflows, including instruction following, tool use, web search, Python code execution, and reasoning abilities. They can be used for a variety of applications, from local agents to cloud-based services.

  • How can users interact with these models?

    -Users can interact with the models through OpenRouter, using the native API or the chat completions endpoint. They can also run the models locally using tools like Ollama, provided they have sufficient computational resources.

  • What are some limitations of these models?

    -Limitations include the models being primarily English-only and having a June 2024 knowledge cutoff. Additionally, the models do not come with full open-source access, such as training code or data, which limits their reproducibility.

  • What is the speaker's overall opinion on the release of these models?

    -The speaker views the release of these models as a positive step for OpenAI, especially after a long gap since GPT-2. However, they criticize the naming and the lack of full open-source access. They also express curiosity about how these models will compare to upcoming proprietary models like GPT-5.

Outlines

00:00

😀 Introduction to OpenAI's New Open Weight Models

The video script begins with an introduction to OpenAI's newly released open weight models. The speaker expresses a balanced approach to reviewing the models, avoiding hyperbole and instead focusing on a critical analysis of their strengths and weaknesses. The script highlights the release of two models: a 120 billion parameter model and a 20 billion parameter model. The speaker notes discrepancies in the parameter counts and discusses the significance of these models being released under an Apache 2.0 license, emphasizing their openness compared to previous models. The script also touches on the models' compatibility with agentic workflows and their potential for local deployment, contrasting them with other large models that are too resource-intensive for local use. The speaker critiques the naming of the models as 'open source,' arguing that they are more accurately described as 'open weight' models since they do not include full source code or training data.

05:01

😀 Technical Details and Model Architecture

The second paragraph delves into the technical aspects of the models. The script explains that the models have been trained using reinforcement learning techniques and instruction tuning, similar to OpenAI's proprietary models. The speaker highlights the deliberate choice of model sizes to cater to both cloud-based and local deployment scenarios. The 120B model is compared to o4-mini, while the 20B model is compared to o3-mini. The script discusses the models' ability to support different levels of reasoning effort, which can be adjusted through system prompts. The speaker also mentions the models' use of rotary positional embeddings and their context length capabilities. The paragraph concludes with a critique of the models' English-only nature and a discussion of the post-training process, noting the lack of detailed information about the training techniques used.
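To make the rotary-positional-embedding idea concrete, here is a minimal NumPy sketch of standard RoPE applied to query and key vectors. It illustrates the general technique only; the head dimension, base frequency, and any long-context tricks used in the actual gpt-oss models are assumptions.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary positional embeddings to x of shape (seq_len, dim).
    dim must be even; channel pairs are rotated by position-dependent angles."""
    seq_len, dim = x.shape
    half = dim // 2
    # One frequency per channel pair, following the standard RoPE schedule.
    freqs = base ** (-np.arange(half) / half)            # (half,)
    angles = np.arange(seq_len)[:, None] * freqs[None]   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1, x2) channel pair by its angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

# The attention score between a rotated query and key depends only on their
# relative offset, which is the property that supports long context windows.
q = rope(np.random.randn(8, 64))
k = rope(np.random.randn(8, 64))
scores = q @ k.T
```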

10:01

😀 Model Performance and Benchmark Analysis

The third paragraph focuses on the performance and benchmark results of the models. The script notes that the models have been compared only to OpenAI's own models, suggesting that future comparisons with other models will provide a clearer picture of their capabilities. The speaker highlights the models' performance in various benchmarks, including Humanity's Last Exam and function-calling benchmarks, where they show promising results. The script also discusses the models' reasoning abilities and the impact of different reasoning efforts on their performance. The speaker raises questions about potential overfitting on benchmarks and emphasizes the importance of generalizability over specific benchmark scores.

15:03

😀 Practical Usage and Deployment Options

The fourth paragraph explores practical ways to use and deploy the models. The script outlines different methods for accessing the models, including through OpenRouter and using the native API with reasoning capabilities. The speaker demonstrates how to set up and use the models in a local environment, emphasizing the importance of using Triton for efficient quantization. The script also discusses the models' compatibility with the Harmony SDK and their ability to handle different roles and prompts. The speaker notes the models' knowledge cutoff date and its implications for up-to-date information. The paragraph concludes with a discussion of the models' performance in local deployment scenarios, highlighting their capabilities and limitations.
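As a concrete illustration of the OpenRouter route, the sketch below uses the OpenAI Python SDK against OpenRouter's OpenAI-compatible chat completions endpoint. The model slug and API key placeholder are assumptions; check the provider listing before running it.

```python
from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint; the model slug
# "openai/gpt-oss-120b" is assumed and should be verified on the listing page.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

resp = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Summarise the Apache 2.0 license in two sentences."}],
)
print(resp.choices[0].message.content)
```

Swapping the base URL for a local server that speaks the same API gives an identical calling convention for local deployment.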

20:04

😀 Overall Impression and Future Outlook

The final paragraph provides an overall assessment of the models and their implications for the future. The speaker acknowledges the models' strengths, particularly in agentic workflows and function calling, while also noting areas for improvement. The script reflects on the models' release as a step in the right direction for OpenAI, putting pressure on other labs to release more open models. The speaker speculates about the upcoming GPT-5 launch and its potential impact on the adoption of these open models. The paragraph concludes with an invitation for viewers to share their thoughts and experiences with the models, highlighting the ongoing nature of testing and evaluation.

Keywords

💡OpenAI

OpenAI is an artificial intelligence research laboratory that focuses on developing advanced AI models. In the context of this video, OpenAI is the organization that has released new open weights models, which is a significant event in the AI community. The video discusses OpenAI's history of releasing models, such as GPT-2, and how these new models fit into their overall strategy. For example, the script mentions that OpenAI has been criticized for not releasing open LLMs (Large Language Models) after GPT-2, highlighting the importance of this new release.

💡Open Weights Models

Open weights models refer to AI models whose weights (the parameters learned during training) are made publicly available. In the video, the release of OpenAI's open weights models is a key topic. These models allow developers to use the pre-trained weights for various applications, which is a significant step towards more open and accessible AI. The script mentions the 120 billion parameter model and the 20 billion parameter model, both of which are open weights models that can be used freely under the Apache 2.0 license.

💡Apache 2.0 License

The Apache 2.0 license is a permissive open-source software license that allows users to use, modify, and distribute the licensed software with minimal restrictions. In the context of the video, the new OpenAI models are released under this license, which means that developers can use these models for a wide range of applications without worrying about restrictive licensing conditions. The script highlights this as a positive aspect of the release, emphasizing the freedom it provides to the users.

💡GPT-OSS

GPT-OSS is the name given to the new open weights models released by OpenAI. 'OSS' is usually read as 'open-source software,' although the video script questions whether these are truly open-source models since only the weights are provided, not the full source code or training data. The script mentions that the name might be a bit misleading; even OpenAI's own models seemed unsure what 'OSS' meant initially, suggesting it might stand for 'One-Stop Shop' instead.

💡Reinforcement Learning (RL)

Reinforcement learning is a type of machine learning where an AI agent learns to make decisions by performing actions in an environment to maximize some notion of cumulative reward. In the context of the video, the script mentions that the new OpenAI models have been trained using reinforcement learning techniques similar to those used for OpenAI's proprietary reasoning models such as o4-mini. This training method helps the models perform better at tasks such as instruction following and tool use, which are important for agentic workflows.

💡Agentic Workflows

Agentic workflows refer to the use of AI models in a way that they can act as agents, performing tasks autonomously or semi-autonomously. In the video, the new OpenAI models are designed to support agentic workflows, meaning they can be used in applications where the model needs to interact with its environment, follow instructions, and use tools. The script mentions that these models are specifically post-trained for tasks like web search, Python code execution, and reasoning abilities, which are essential for agentic applications.
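A minimal sketch of the tool-use side of an agentic workflow, assuming an OpenAI-compatible endpoint serving one of these models (a local Ollama instance is used here). The tool definition and the model tag are illustrative assumptions, not part of the release.

```python
import json
from openai import OpenAI

# Assumes a local Ollama server exposing its OpenAI-compatible endpoint;
# the model tag "gpt-oss:20b" is an assumption -- check `ollama list`.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration only
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "What's the weather in Singapore right now?"}],
    tools=tools,
)

# If the model decides to call the tool, the name and JSON arguments come back
# in tool_calls; an agent loop would execute the tool and send the result back.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```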

💡Reasoning Effort

Reasoning effort refers to the level of cognitive processing a model uses to generate a response. In the context of the video, the new OpenAI models support three levels of reasoning effort: low, medium, and high. This allows users to trade off between latency (how quickly the model responds) and performance (how thorough the response is). The script mentions that users can set the reasoning effort using the system prompt, which will affect how much the model thinks before providing an answer.
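A minimal sketch of setting the reasoning level through the system prompt, assuming a local Ollama instance serving the 20B model. The exact wording "Reasoning: high" follows the pattern described for these models but should be treated as an assumption, as should the endpoint and model tag.

```python
from openai import OpenAI

# Assumes Ollama is running locally with the 20B model already pulled;
# both the endpoint and the "gpt-oss:20b" tag are assumptions.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[
        # low / medium / high trade response latency against answer quality.
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "How many prime numbers are there below 50?"},
    ],
)
print(resp.choices[0].message.content)
```

Dropping the level to low should return faster, shallower answers, which is the latency-versus-performance trade-off the video describes.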

💡Mixture of Experts (MoE)

A Mixture of Experts (MoE) model is a type of neural network architecture that routes each input through a small subset of specialized 'expert' sub-networks rather than through all of its parameters at once. In the video, the script mentions that both the 120B and 20B models are MoE models, which is a common approach in modern large language models. This architecture means only a fraction of the total parameters are active for each token, which keeps inference cost low relative to the models' overall size.
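The toy NumPy sketch below shows top-k expert routing, which is why only a few billion of the total parameters are "active" per token. The sizes, gating function, and expert structure are illustrative assumptions, not the gpt-oss architecture.

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Route one token through the top_k highest-scoring experts.

    x:       (d,) token representation
    gate_w:  (d, n_experts) router weights
    experts: list of (w_in, w_out) weight pairs, one simple MLP per expert
    """
    logits = x @ gate_w
    top = np.argsort(logits)[-top_k:]                  # indices of chosen experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()
    out = np.zeros_like(x)
    for w, idx in zip(weights, top):
        w_in, w_out = experts[idx]
        out += w * (np.maximum(x @ w_in, 0) @ w_out)   # ReLU MLP expert
    return out

d, n_experts = 64, 8
rng = np.random.default_rng(0)
experts = [(rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d)))
           for _ in range(n_experts)]
gate_w = rng.normal(size=(d, n_experts))
y = moe_forward(rng.normal(size=d), gate_w, experts)
```

Only the two selected experts do any work for this token, even though all eight sets of expert weights exist in memory.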

💡Quantization

Quantization is a technique used in machine learning to reduce the precision of the numbers used to represent the model's weights, which can significantly reduce the model's size and improve its speed. In the context of the video, the script mentions that the new OpenAI models use 4-bit floating-point quantization, which allows them to be run more efficiently on hardware like GPUs. This is important for deploying large models locally or in the cloud without requiring excessive computational resources.
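For intuition, here is a simplified blockwise 4-bit quantization sketch in NumPy (one shared scale per block, 16 levels per weight). It is an integer-style toy scheme for illustration, not the MXFP4 floating-point format these models actually ship with.

```python
import numpy as np

def quantize_4bit(weights, block=32):
    """Toy blockwise 4-bit quantization: each block shares one scale,
    and values are rounded to 16 signed levels (-8..7)."""
    w = weights.reshape(-1, block)
    scale = np.maximum(np.abs(w).max(axis=1, keepdims=True) / 7.0, 1e-12)
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q, scale, shape):
    return (q * scale).reshape(shape).astype(np.float32)

w = np.random.randn(1024, 64).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s, w.shape)
print("mean abs error:", np.abs(w - w_hat).mean())
```

Packing each weight into roughly four bits is what shrinks a 100B-plus-parameter model enough to run on a single high-end GPU.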

💡Knowledge Cutoff

Knowledge cutoff refers to the date up to which a model's training data includes information. In the video, the script mentions that the knowledge cutoff for the new OpenAI models is June 2024. This means that the models do not have knowledge of events or information that occurred after this date. The script highlights this by noting that the model incorrectly lists Joe Biden as the current president in 2025, indicating that its knowledge is outdated.

Highlights

OpenAI has released two new open weights models: GPT-OSS 120B and 20B.

The models are licensed under Apache 2.0, allowing broad usage without restrictive conditions.

Despite being called 'open source,' the models are more accurately described as 'open weight' models.

The models are designed for agentic workflows, supporting instruction following, tool use, web search, and Python code execution.

Both models support three levels of reasoning effort (low, medium, high) to balance latency and performance.

The 120B model runs with about 5.1 billion active parameters, while the 20B model runs with about 3.6 billion active parameters.

The models use rotary positional embeddings, allowing for a context length of up to 128K tokens.

The models are primarily English-only, similar to many labs' initial releases.

The models' post-training process is similar to o4-mini's, including supervised fine-tuning and reinforcement learning.

Benchmarks show the models performing well, especially in function calling and reasoning tasks.

The models can be accessed via OpenRouter, with options for different providers and reasoning settings.

Running the models locally requires tools like Ollama and Triton for efficient quantization.

The models support the Harmony SDK, which simplifies interaction with the chat API.

The knowledge cutoff for the models is June 2024, indicating a year-old dataset.

The release puts pressure on other labs to release more open models, especially in the West.

The models' release is timely, just days before the anticipated GPT-5 launch.