"Watermarking Language Models" paper and GPTZero EXPLAINED | How to detect text by ChatGPT?

AI Coffee Break with Letitia
31 Jan 2023 | 16:05

TLDR: This video discusses methods to detect AI-generated text, focusing on GPTZero and watermarking. GPTZero measures perplexity and burstiness to identify AI text, but it can be fooled. Watermarking involves embedding a unique fingerprint in AI-generated text, detectable through statistical analysis. While watermarking is promising, it requires model creators' cooperation and can be bypassed by certain attacks. The video leaves viewers to ponder the necessity and effectiveness of watermarking in the future of AI-generated content.


  • 👩‍🏫 Ms. Coffee Bean teaches machine learning at a university and is curious about the origins of a student's project proposal.
  • 🤖 ChatGPT's capabilities raise questions about the authenticity of text, leading to the need for methods to detect AI-generated content.
  • 🔍 Two methods for detecting AI-generated text are discussed: GPTZero and watermarking.
  • 🛠️ GPTZero measures perplexity and burstiness to differentiate between human and AI-written text.
  • 📈 Perplexity is computed from the probability a language model assigns to each word given its context; lower probabilities (that is, higher perplexity) point towards human writing.
  • 💥 Burstiness measures how much sentence complexity varies across a text, with human text showing more variation than AI-generated text.
  • 🚫 GPTZero can be fooled by introducing minor errors like spelling mistakes or grammar errors.
  • 💡 Watermarking involves embedding a unique fingerprint in the text output of language models, detectable through statistical analysis.
  • 🔒 The watermarking process involves blacklisting a random subset of words during the language model's decoding mechanism.
  • 🤖 Attacks on watermarking include word substitutions and the 'emoji attack', which can randomize the blacklist and fool detection.
  • 📋 The effectiveness of watermarking depends on the willingness of model creators to implement it, unlike tools like GPTZero which can be applied universally.

Q & A

  • What is Ms. Coffee Bean's profession outside of YouTube?

    -Ms. Coffee Bean teaches machine learning at her university.

  • What concerns does Ms. Coffee Bean have about a student's project proposal?

    -Ms. Coffee Bean is concerned whether the project proposal was written by the student themselves or if ChatGPT was behind it.

  • What is the main topic of the video?

    -The main topic of the video is explaining two ways of detecting AI-generated text: GPTZero and watermarking.

  • What is Cohere and how does it relate to the video's topic?

    -Cohere is a platform that allows users to utilize advanced language models for text classification and document generation, which is relevant to the video's discussion on AI-generated text.

  • How does GPTZero determine if a text is AI-generated?

    -GPTZero measures perplexity and burstiness of the text, which vary between machine-written and human-written content.

  • What is the significance of perplexity in language models?

    -Perplexity measures how unfamiliar a produced text is for a model; a high probability of a sentence being generated indicates low perplexity, suggesting AI-generated text.

  • What is 'burstiness' and how does it relate to AI-generated text detection?

    -Burstiness measures sentence complexity and variation in word usage; human writing tends to have more burstiness than AI-generated text, which is more constant in sentence structure.

  • What is the concept of watermarking in the context of AI-generated text?

    -Watermarking involves embedding a unique fingerprint into the text output of a language model, which is unnoticeable to humans but detectable with statistical methods.

  • How can watermarking be attacked or fooled?

    -Watermarking can be attacked by brute-forcing the blacklist, reconstructing it with the same random seed, or by using word substitutions and 'emoji attacks' to randomize the blacklisted words.

  • What are the limitations of using watermarking to detect AI-generated text?

    -Watermarking is only effective if the creators of language models choose to implement it, and it may not be applicable to all models, especially without strict regulation.

  • How might the general public feel about the implementation of watermarking in AI language models?

    -The public's opinion on watermarking may vary; some might feel safer knowing their AI interactions are clearly labeled, while others may see it as unnecessary or intrusive.



🤖 Detecting AI-Generated Text

This paragraph introduces the problem of distinguishing between human-written and AI-generated text, specifically mentioning the use of ChatGPT. It presents two methods for detecting AI text: GPTZero, a tool that measures perplexity and burstiness, and the concept of watermarking, which involves embedding a unique fingerprint in AI-generated text. The paragraph also acknowledges the sponsorship of Cohere, a platform for utilizing advanced language models in applications.


📊 GPTZero: Measuring Perplexity and Burstiness

The second paragraph delves into the workings of GPTZero, explaining its reliance on perplexity and burstiness to identify AI-generated content. Perplexity measures how surprised a language model is by a text, while burstiness evaluates how much sentence complexity varies. The paragraph also discusses GPTZero's limitations: it can be fooled by deliberately inserted errors, and human writing of low complexity can be mistaken for AI output.


💧 Watermarking: A Statistical Fingerprint

This paragraph explains the watermarking method, which involves subtly altering the language model's decoding process to embed a detectable pattern. It describes how watermarking works by blacklisting a random subset of words during text generation, so that AI-generated text can be detected by counting how many blacklisted words appear. The paragraph also addresses the method's limitations and the attacks that can fool the detector, such as word substitutions or the 'emoji attack'.


🚫 Watermarking's Limitations and Future

The final paragraph discusses the limitations of watermarking, emphasizing that its effectiveness relies on the willingness of companies to implement it in their language models. It raises the question of whether strict regulations might be necessary for widespread adoption of watermarking. The paragraph concludes by inviting viewers to share their opinions on the necessity of watermarking and closes the video with a call to action for comments and a farewell.



💡AI-generated text

AI-generated text refers to written content that is created by artificial intelligence, specifically language models like ChatGPT. These models can produce human-like text based on patterns and structures learned from vast amounts of data. In the video, the concern is whether a project proposal submitted by a student might have been generated by AI, highlighting the challenge of distinguishing between human and AI authorship.


💡GPTZero

GPTZero is a tool designed to detect AI-generated text by analyzing properties such as perplexity and burstiness. Perplexity measures how surprising the text is to the AI, with lower perplexity indicating content likely generated by AI. Burstiness assesses sentence complexity and variation, which tends to be more consistent in AI-generated text compared to human writing. GPTZero is used as an example of existing technology aimed at differentiating between human and AI writing.


💡Watermarking

Watermarking, in the context of AI-generated text, refers to a method where a unique, statistically detectable fingerprint is embedded into the text produced by a language model. This watermark is imperceptible to humans but can be identified through statistical analysis. The goal is to provide a reliable way to ascertain whether text was generated by AI, offering a higher degree of confidence compared to tools like GPTZero.


💡Cohere

Cohere is a platform that allows users to integrate advanced language models into their applications for tasks such as text classification and document generation. It simplifies the use of transformer-based models like GPT and BERT, enabling users without machine learning expertise to utilize these powerful tools through easy-to-use APIs.

💡Natural Language Processing (NLP)

Natural Language Processing, or NLP, is a subfield of artificial intelligence that focuses on the interaction between computers and human language. It involves the development of algorithms and computational models that can understand, interpret, and generate human language in a way that is both meaningful and useful. The video touches on the incredible advancements in NLP, which have led to the creation of sophisticated language models capable of generating text that is nearly indistinguishable from that written by humans.


💡Perplexity

In the context of language modeling, perplexity is a measure of how well a model predicts a sample of text. It quantifies the uncertainty or surprise of the model when presented with a sequence of words. A lower perplexity indicates that the text is more predictable for the model, suggesting it could be AI-generated, while a higher perplexity suggests the text is less predictable and more likely human-written.
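
As a rough illustration of the definition above, perplexity can be computed as the exponential of the average negative log-probability per token. The sketch below assumes the per-token probabilities have already been obtained from some language model; the function name `perplexity` and the example numbers are made up for illustration and are not taken from the video or from GPTZero.

```python
import math

def perplexity(token_probs):
    """Perplexity of a text, given the model's probability for each token.

    token_probs[i] is assumed to be p(token_i | preceding tokens), each > 0.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token, then exponentiate.
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# Hypothetical numbers: a text the model finds likely vs. one it finds unlikely.
predictable = [0.9, 0.8, 0.95, 0.85]
surprising = [0.2, 0.05, 0.1, 0.3]

print(perplexity(predictable) < perplexity(surprising))  # True
```

Under this definition, a text in which every token has probability 0.5 has a perplexity of 2: the model is, on average, as uncertain as if it were choosing between two equally likely words.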


💡Burstiness

Burstiness is a measure of sentence complexity and variation in the use of words. It reflects the tendency of humans to use a mix of common and rare words in their writing, leading to varying sentence lengths and complexity. AI-generated text, on the other hand, often exhibits less variation in sentence structure and word choice, resulting in lower burstiness.
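
GPTZero's exact burstiness formula is not given in the video, so the following is only one plausible way to turn the idea into a number: score each sentence by its perplexity and take the spread of those scores. The use of the population standard deviation, the function names, and the example probabilities are all assumptions made for this sketch.

```python
import math
import statistics

def sentence_perplexity(token_probs):
    """Perplexity of one sentence from its per-token probabilities (each > 0)."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))

def burstiness(sentences_token_probs):
    """Spread of per-sentence perplexity (assumption: population std. dev.)."""
    scores = [sentence_perplexity(s) for s in sentences_token_probs]
    return statistics.pstdev(scores)

# Hypothetical texts: three sentences each, given as per-token probabilities.
uniform_text = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]   # every sentence alike
varied_text = [[0.9, 0.9], [0.1, 0.1], [0.5, 0.5]]    # easy, hard, medium

print(burstiness(uniform_text) < burstiness(varied_text))  # True
```

A text whose sentences are all equally predictable scores zero here, while a mix of easy and hard sentences scores higher, which matches the intuition that human writing is "burstier".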

💡Decoding mechanism

Decoding mechanism refers to the process by which a language model generates text from a probability distribution of words. It involves selecting words to form coherent sentences based on the likelihood of each word following the previous ones. Modern language models use more sophisticated decoding strategies than simply choosing the most probable word at each step, aiming to produce varied and interesting text.
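
To make the difference between "always pick the most probable word" and a more varied strategy concrete, here is a minimal sketch comparing greedy decoding with temperature sampling over a toy next-word distribution. The distribution, the function names, and the choice of temperature sampling as the "more sophisticated" strategy are assumptions for illustration; the video does not specify which decoding method a given model uses.

```python
import math
import random

def greedy(probs):
    """Always return the single most probable word."""
    return max(probs, key=probs.get)

def sample(probs, temperature=1.0, rng=random):
    """Draw a word at random, with probabilities reshaped by the temperature.

    Low temperature sharpens the distribution (close to greedy); high
    temperature flattens it. All probabilities are assumed to be > 0.
    """
    words = list(probs)
    # Scale log-probabilities by 1/temperature; choices() normalises the weights.
    weights = [math.exp(math.log(probs[w]) / temperature) for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

# Hypothetical next-word distribution after some prompt.
next_word_probs = {"cat": 0.6, "dog": 0.3, "emu": 0.1}

print(greedy(next_word_probs))                    # cat
print(sample(next_word_probs, temperature=1.0))   # varies from run to run
```

Watermarking hooks into exactly this step: it changes which words are available before one of them is picked.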

💡Language model

A language model is a computational model that predicts the probability of a sequence of words occurring in a text. It is trained on large datasets of text to learn patterns and structures of human language, and can be used to generate text, translate languages, or perform other natural language processing tasks. The video discusses the capabilities of language models, particularly in relation to their ability to produce text that can be challenging to distinguish from human writing.

💡Random seed

A random seed is a starting point used by a random number generator to produce a sequence of numbers. In the context of watermarking, the random seed determines which words are blacklisted during the text generation process. By using a consistent seed, the watermark can be reconstructed and used to verify the authenticity of the generated text.
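
The role of the seed can be sketched as follows. The example assumes that the seed is derived from the preceding word, that half of the vocabulary is blacklisted at each step, and that SHA-256 is used to turn a word into a seed; the vocabulary, the function names, and the greedy word choice are likewise hypothetical and chosen only to keep the example short.

```python
import hashlib
import random

VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "under", "mat", "rug"]

def blacklist_for(prev_token, vocab=VOCAB, fraction=0.5):
    """Pseudo-randomly choose the words that are banned right after prev_token."""
    # Derive a stable integer seed from the previous token (assumed scheme).
    digest = hashlib.sha256(prev_token.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    rng = random.Random(seed)
    return set(rng.sample(vocab, int(len(vocab) * fraction)))

def next_word(prev_token, probs):
    """Pick the most likely word that is not blacklisted (greedy, for brevity)."""
    banned = blacklist_for(prev_token, vocab=sorted(probs))
    allowed = {w: p for w, p in probs.items() if w not in banned}
    return max(allowed, key=allowed.get)

# Same previous token, same seed, same blacklist: this is what lets a detector
# rebuild the list later without access to the model itself.
print(blacklist_for("the") == blacklist_for("the"))  # True

toy_probs = {"cat": 0.5, "dog": 0.3, "emu": 0.2}  # hypothetical model output
print(next_word("the", toy_probs))
```

Because the blacklist depends on the preceding word, anything that changes that word after generation also changes which blacklist the detector reconstructs.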


💡Blacklist

In the context of watermarking, a blacklist is a list of words that are intentionally avoided by the AI language model during the text generation process. This creates a unique pattern that can be detected statistically to identify AI-generated text. The blacklist is constructed based on the random seed and the decoding mechanism of the language model.
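
The statistical detection step can be illustrated with a simple one-sided z-test. The sketch assumes that half of the vocabulary is blacklisted at every position, so unwatermarked text should land on a blacklisted word about half the time, whereas watermarked text should almost never do so. The function name and the token counts are hypothetical.

```python
import math

def watermark_z_score(blacklisted_hits, total_tokens, fraction=0.5):
    """How far below chance the number of blacklisted tokens is, in std. devs.

    Under the 'no watermark' hypothesis each token is assumed to hit the
    blacklist independently with probability `fraction`.
    """
    expected = fraction * total_tokens
    std = math.sqrt(total_tokens * fraction * (1 - fraction))
    return (expected - blacklisted_hits) / std

# Hypothetical counts for two 200-token texts.
print(watermark_z_score(100, 200))  # 0.0: consistent with ordinary text
print(watermark_z_score(0, 200))    # about 14.1: very strong evidence of a watermark
```

A detector would obtain `blacklisted_hits` by rebuilding the blacklist for each position and counting how many tokens fall into it; a large positive score means far fewer blacklisted words than chance would produce.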


Ms. Coffee Bean teaches machine learning at her university.

The concern of AI-generated content in academic submissions post-ChatGPT.

Introducing two methods for detecting AI-generated text: GPTZero and watermarking.

Cohere's sponsorship of the video, showcasing their natural language processing capabilities.

Cohere's ease of use, requiring no machine learning skills for text classification or document generation.

GPTZero's method of detecting AI text based on perplexity and burstiness.

Perplexity as a measure of how surprising a text is to a language model.

Burstiness as a measure of sentence complexity and variation in human versus AI writing.

GPTZero's vulnerability to being fooled by intentional errors in AI-generated text.

Watermarking as a more reliable method for detecting AI-generated text with a unique fingerprint.

The process of watermarking, involving a random blacklist of words during the language model's decoding mechanism.

Watermarking's potential to be implemented in ChatGPT and other language models.

The possibility of fooling watermarking through word substitutions and other attacks.

The "emoji attack" as a powerful method to bypass watermarking by randomizing the blacklist.

Watermarking's reliance on the willingness of companies to apply it to their language models.

The debate on whether watermarking is necessary or an unnecessary complication.