OpenAI's New OPEN Models - GPT-OSS 120B & 20B
TLDR
OpenAI has released two new open-weights models, GPT-OSS-120B and GPT-OSS-20B, licensed under Apache 2.0. The models are designed for cloud and local use, respectively, and are post-trained with reinforcement learning and instruction tuning, much like OpenAI's proprietary reasoning models. They support three levels of reasoning effort, trading latency against performance. While the models are primarily English-only and not fully open source, they offer promising capabilities for agentic workflows, including tool use and web search. The 20B model can run locally with tools like Ollama, and both models show strong performance on function calling benchmarks. The release puts pressure on other labs to open up more of their models.
Takeaways
- 😀 OpenAI has released two new open weights models: GPT-OSS 120B and GPT-OSS 20B, licensed under Apache 2.0.
- 😀 The models are not truly open source since they only provide the instruction-tuned models, not the base models, training code, checkpoints, or data.
- 😀 The 120B model is designed for cloud deployment with significant GPU power, while the 20B model can be run locally on personal computers.
- 😀 Both models support three levels of reasoning effort (low, medium, high), allowing a trade-off between latency and performance.
- 😀 The models are post-trained with reinforcement learning techniques similar to those used for OpenAI's proprietary reasoning models, and are designed for agentic workflows including instruction following and tool use.
- 😀 The models use a mixture-of-experts (MoE) architecture: the 120B model activates roughly 5.1 billion parameters per token, and the 20B model roughly 3.6 billion.
- 😀 The models support a context length of up to 128K tokens, which is a significant improvement over previous models.
- 😀 The models are primarily English-focused, similar to many initial models from other labs.
- 😀 The models show strong performance in function calling benchmarks and reasoning tasks, with the 20B model outperforming some larger proprietary models.
- 😀 The models can be accessed through OpenRouter, and can be run locally using frameworks like Ollama with appropriate quantization support (a minimal local example follows this list).
- 😀 The release of these models puts pressure on other labs to release more open models, and it will be interesting to see how they compare to upcoming proprietary models like GPT-5.
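As a concrete illustration of the local path mentioned above, here is a minimal sketch that talks to a locally running Ollama server through its OpenAI-compatible endpoint. The model tag `gpt-oss:20b` and the default port are assumptions to verify against your own install; this is a sketch, not an official example.

```python
# Minimal local sketch: query GPT-OSS-20B served by Ollama through its
# OpenAI-compatible endpoint. Assumes Ollama is running locally and the
# model has been pulled under the tag "gpt-oss:20b" (tag name may differ).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible API
    api_key="ollama",                      # placeholder; Ollama does not check it
)

response = client.chat.completions.create(
    model="gpt-oss:20b",  # assumed local tag for the 20B open-weights model
    messages=[
        {"role": "user", "content": "Summarize the Apache 2.0 license in two sentences."},
    ],
)

print(response.choices[0].message.content)
```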
Q & A
What are the two new OpenAI models mentioned in the script?
-The two new OpenAI models mentioned are the GPT-OSS 120B and the GPT-OSS 20B models.
What is the license under which these OpenAI OSS models are released?
-The OpenAI OSS models are released under the Apache 2.0 license.
Why does the speaker question the use of the term 'OSS' for these models?
-The speaker questions the use of the term 'OSS' (open source) because the models are only open in the sense that their weights are available. They do not provide the full open-source package, such as access to the base models, training code, checkpoints, or data, which would make them truly reproducible.
What is the significance of the 120B and 20B model sizes?
-The 120B model is designed for cloud-based or GPU-intensive use cases, while the 20B model is intended for local use on personal computers with tools like Ollama and LM Studio. This allows users to run the models in different environments based on their needs.
What reasoning capabilities do the GPT-OSS models support?
-Both the 20B and 120B models support three levels of reasoning effort: low, medium, and high. This allows users to trade off between latency and performance based on their specific requirements.
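To make the three reasoning levels concrete, here is a hedged sketch of how the effort is typically selected: the gpt-oss documentation describes setting it in the system prompt (e.g. "Reasoning: high"), and some hosted providers expose it as a request parameter instead. The endpoint, model tag, and exact prompt wording below are assumptions to verify against your runtime.

```python
# Sketch: request different reasoning efforts from a GPT-OSS model behind an
# OpenAI-compatible endpoint. The "Reasoning: <level>" system-prompt convention
# follows the gpt-oss documentation; confirm the exact wording for your setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # assumed local Ollama

def ask(question: str, effort: str = "medium") -> str:
    """Send one question with a low/medium/high reasoning-effort hint."""
    response = client.chat.completions.create(
        model="gpt-oss:20b",  # assumed model tag
        messages=[
            {"role": "system", "content": f"Reasoning: {effort}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask("How many r's are in 'strawberry'?", effort="high"))
```

Higher effort generally produces more hidden reasoning tokens, so expect better results on hard problems at the cost of latency, which is exactly the trade-off described above.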
How do these models compare to OpenAI's proprietary models like o3 and o4?
-In OpenAI's own comparisons, the 120B model is positioned against o4-mini, while the 20B model is positioned against o3-mini. These comparisons show the new models performing impressively, especially in instruction following and tool use.
What is the knowledge cutoff date for these models?
-The knowledge cutoff date for the models is June 2024, indicating that they were trained on data up to that point.
What are some potential use cases for these models?
-The models are designed for agentic workflows, including instruction following, tool use, web search, Python code execution, and reasoning abilities. They can be used for a variety of applications, from local agents to cloud-based services.
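Since agentic workflows and tool use come up repeatedly, here is a minimal function-calling sketch using the standard OpenAI-compatible `tools` parameter. The weather tool, its schema, and the model name are illustrative assumptions; the point is only the request and response shape.

```python
# Sketch: function calling against a GPT-OSS model via an OpenAI-compatible API.
# The tool definition below is a made-up example to show the request shape.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # assumed local Ollama

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-oss:20b",  # assumed model tag
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)

# If the model decides to call the tool, the call arrives as structured JSON
# that your agent loop would execute and feed back as a tool message.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```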
How can users interact with these models?
-Users can interact with the models through OpenRouter, using either the native API or the chat completions endpoint. They can also run the models locally using tools like Ollama, provided they have sufficient computational resources.
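For the hosted route, here is a minimal sketch against OpenRouter's OpenAI-compatible chat completions endpoint. The model slug `openai/gpt-oss-120b` and the environment variable name are assumptions to check against OpenRouter's model listing.

```python
# Sketch: call GPT-OSS-120B through OpenRouter's OpenAI-compatible endpoint.
# Requires an OpenRouter API key; the model slug is an assumption to verify.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],  # assumed environment variable name
)

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",  # assumed OpenRouter slug
    messages=[{"role": "user", "content": "Give me three uses for an open-weights 120B model."}],
)

print(response.choices[0].message.content)
```

Because OpenRouter routes a single model slug to multiple hosting providers, quantization, pricing, and latency can differ per provider, which is worth checking before benchmarking.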
What are some limitations of these models?
-Some limitations include the models being primarily English-only and having a knowledge cutoff date from June 2024. Additionally, the models do not provide full open-source access, such as training code or data, which limits their reproducibility.
What is the speaker's overall opinion on the release of these models?
-The speaker views the release of these models as a positive step for OpenAI, especially after a long gap since GPT-2. However, they criticize the naming and the lack of full open-source access. They also express curiosity about how these models will compare to upcoming proprietary models like GPT-5.
Outlines
😀 Introduction to OpenAI's New Open Weight Models
The video script begins with an introduction to OpenAI's newly released open weight models. The speaker expresses a balanced approach to reviewing the models, avoiding hyperbole and instead focusing on a critical analysis of their strengths and weaknesses. The script highlights the release of two models: a 120 billion parameter model and a 20 billion parameter model. The speaker notes discrepancies in the parameter counts and discusses the significance of these models being released under an Apache 2.0 license, emphasizing their openness compared to previous models. The script also touches on the models' compatibility with agentic workflows and their potential for local deployment, contrasting them with other large models that are too resource-intensive for local use. The speaker critiques the naming of the models as 'open source,' arguing that they are more accurately described as 'open weight' models since they do not include full source code or training data.
😀 Technical Details and Model Architecture
The second paragraph delves into the technical aspects of the models. The script explains that the models have been post-trained using reinforcement learning and instruction tuning, similar to OpenAI's proprietary models. The speaker highlights the deliberate choice of model sizes to cater to both cloud-based and local deployment scenarios. The 120B model is compared to o4-mini, while the 20B model is compared to o3-mini. The script discusses the models' ability to support different levels of reasoning effort, which can be adjusted through system prompts. The speaker also mentions the models' use of rotary positional embeddings and their context length capabilities. The paragraph concludes with a critique of the models' English-only nature and a discussion of the post-training process, noting the lack of detailed information about the training techniques used.
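To make the mixture-of-experts point from the takeaways concrete (only a few billion of the total parameters are "active" per token), here is a toy routing sketch. The layer sizes, number of experts, and top-k value are illustrative assumptions, not the actual GPT-OSS configuration.

```python
# Toy mixture-of-experts routing: each token is processed by only the top-k
# experts chosen by a router, so most expert parameters stay idle per token.
# Sizes and k below are illustrative, not the real GPT-OSS configuration.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2

router_w = rng.standard_normal((d_model, n_experts))          # router projection
experts = rng.standard_normal((n_experts, d_model, d_model))  # one weight matrix per expert

def moe_layer(x: np.ndarray) -> np.ndarray:
    """x: (tokens, d_model) -> (tokens, d_model), using only top_k experts per token."""
    logits = x @ router_w                                      # (tokens, n_experts) routing scores
    top = np.argsort(logits, axis=-1)[:, -top_k:]              # indices of the chosen experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = logits[t, top[t]]
        gates = np.exp(scores - scores.max())                  # softmax over selected experts only
        gates /= gates.sum()
        for gate, e in zip(gates, top[t]):
            out[t] += gate * (x[t] @ experts[e])
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_layer(tokens).shape)                                 # (4, 64)
print(f"experts touched per token: {top_k}/{n_experts}")
```

This is the mechanism behind the gap between total and active parameter counts quoted in the takeaways: the full expert bank sits in memory, but each token only pays the compute of its routed subset.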
😀 Model Performance and Benchmark Analysis
The third paragraph focuses on the performance and benchmark results of the models. The script notes that the models have been compared only to OpenAI's own models, suggesting that future comparisons with other models will provide a clearer picture of their capabilities. The speaker highlights the models' performance in various benchmarks, including Humanity's Last Exam and function-calling benchmarks, where they show promising results. The script also discusses the models' reasoning abilities and the impact of different reasoning efforts on their performance. The speaker raises questions about potential overfitting on benchmarks and emphasizes the importance of generalizability over specific benchmark scores.
😀 Practical Usage and Deployment Options
The fourth paragraph explores practical ways to use and deploy the models. The script outlines different methods for accessing the models, including through OpenRouter and using the native API with reasoning capabilities. The speaker demonstrates how to set up and use the models in a local environment, emphasizing the importance of Triton support for efficient quantization. The script also discusses the models' compatibility with the Harmony SDK and their ability to handle different roles and prompts. The speaker notes the models' knowledge cutoff date and its implications for up-to-date information. The paragraph concludes with a discussion of the models' performance in local deployment scenarios, highlighting their capabilities and limitations.
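Pulling the deployment options in this section together, here is a small hedged helper that switches between the hosted OpenRouter route and a local Ollama server just by changing the base URL and model identifier. The endpoints, environment variable, and model names are assumptions to verify against your own setup.

```python
# Sketch: one helper, two deployment targets. The same OpenAI-compatible call
# works against OpenRouter (hosted) or a local Ollama server; only the base
# URL, credentials, and model identifier change. Identifiers are assumptions.
import os
from openai import OpenAI

TARGETS = {
    "openrouter": {
        "base_url": "https://openrouter.ai/api/v1",
        "api_key": os.environ.get("OPENROUTER_API_KEY", ""),  # assumed env var name
        "model": "openai/gpt-oss-120b",                        # assumed slug
    },
    "ollama": {
        "base_url": "http://localhost:11434/v1",               # Ollama's OpenAI-compatible API
        "api_key": "ollama",                                   # placeholder; not checked locally
        "model": "gpt-oss:20b",                                # assumed local tag
    },
}

def chat(prompt: str, target: str = "ollama", effort: str = "medium") -> str:
    """Send one prompt to the chosen deployment target with a reasoning hint."""
    cfg = TARGETS[target]
    client = OpenAI(base_url=cfg["base_url"], api_key=cfg["api_key"])
    response = client.chat.completions.create(
        model=cfg["model"],
        messages=[
            {"role": "system", "content": f"Reasoning: {effort}"},  # gpt-oss reasoning hint
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content

print(chat("What is the most recent event you know about?"))  # probes the June 2024 cutoff
```

Keeping the call shape identical across targets is the practical payoff of these models speaking the standard chat completions protocol: local prototyping and hosted deployment differ only in configuration.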
😀 Overall Impression and Future Outlook
The final paragraph provides an overall assessment of the models and their implications for the future. The speaker acknowledges the models' strengths, particularly in agentic workflows and function calling, while also noting areas for improvement. The script reflects on the models' release as a step in the right direction for OpenAI, putting pressure on other labs to release more open models. The speaker speculates about the upcoming GPT-5 launch and its potential impact on the adoption of these open models. The paragraph concludes with an invitation for viewers to share their thoughts and experiences with the models, highlighting the ongoing nature of testing and evaluation.
Keywords
💡OpenAI
💡Open Weights Models
💡Apache 2.0 License
💡GPT-OSS
💡Reinforcement Learning (RL)
💡Agentic Workflows
💡Reasoning Effort
💡Mixture of Experts (MoE)
💡Quantization
💡Knowledge Cutoff
Highlights
OpenAI has released two new open weights models: GPT-OSS 120B and 20B.
The models are licensed under Apache 2.0, allowing broad usage without restrictive conditions.
Despite being called 'open source,' the models are more accurately described as 'open weight' models.
The models are designed for agentic workflows, supporting instruction following, tool use, web search, and Python code execution.
Both models support three levels of reasoning effort (low, medium, high) to balance latency and performance.
The 120B model runs with roughly 5.1 billion active parameters per token, while the 20B model runs with roughly 3.6 billion.
The models use rotary positional embeddings, allowing for a context length of up to 128K tokens.
The models are primarily English-only, similar to other initial releases from labs.
The models' post-training process is similar to o4-mini, including supervised fine-tuning and reinforcement learning.
Benchmarks show the models performing well, especially in function calling and reasoning tasks.
The models can be accessed via OpenRouter, with options for different providers and reasoning settings.
Running the models locally requires tools like Ollama and Triton support for efficient quantization.
The models support the Harmony SDK, which simplifies interaction with the chat API.
The knowledge cutoff for the models is June 2024, indicating a year-old dataset.
The release puts pressure on other labs to release more open models, especially in the West.
The models' release is timely, just days before the anticipated GPT-5 launch.