AI Seminar Series: Devika Subramanian, Big Data & ML - Challenges and Opportunities (June 25)

28 Jun 2021 | 66:14

TLDR: The transcript highlights a talk on big data and machine learning challenges and opportunities, focusing on the speaker's journey from engineering to AI and machine learning. It discusses the importance of data selection, model selection, model validation, and algorithms in machine learning. The speaker shares experiences from various projects, including medical diagnostics, social media analytics, and engineering, emphasizing the need for collaboration and the creative application of machine learning to solve real-world problems. The talk also touches on the detection of Twitter bots and their impact on public opinion, particularly in the context of gun control in the United States.


  • 🎤 The speaker has a long history of collaboration and has given many invited presentations at prestigious institutions and conferences.
  • 🏆 The speaker has received numerous teaching awards from institutions such as Stanford, Cornell, and Rice.
  • 🤖 The focus of the talk is on the challenges and opportunities in big data and machine learning, with a particular interest in how machines can design their own representations.
  • 🧠 The speaker's approach to problem-solving involves finding interesting problems that others care about and then diving in headfirst.
  • 🔍 The speaker emphasizes the importance of data selection, model selection, model validation, and algorithm selection in machine learning projects.
  • 🏥 There is a significant amount of work in clinical data analytics, with projects related to predicting esophageal cancer and complications in type 1 diabetic patients.
  • 🧬 The speaker's work also extends to social sciences and social media analytics, including understanding attitudes towards gun control in America from Twitter data.
  • 🌪️ Projects in engineering and natural sciences involve weather forecasting and predicting wind and flood damage risk.
  • 🤖 The speaker believes that application is not a secondary activity but a creative task that requires good modeling and handling of data.
  • 📈 The speaker has developed algorithms for real-world socially relevant problems and has published in application community journals.
  • 🔗 The speaker's work involves collaboration with domain scientists and has led to the development of new algorithms and approaches.

Q & A

  • What is the main topic of the talk given by the speaker?

    -The main topic of the talk is 'Big Data and Machine Learning Challenges and Opportunities', discussing the role of AI and machine learning in making objective decisions based on data and prior knowledge.

  • How does the speaker's background in engineering influence her approach to machine learning problems?

    -The speaker's background in engineering influenced her approach by focusing on solving real-world problems and applying machine learning to socially relevant issues, emphasizing the importance of understanding the problem domain and integrating knowledge from domain experts.

  • What are the four fundamental questions that the speaker addresses in the context of machine learning?

    -The four fundamental questions are: what data should be gathered for predictions, what class of models to build, what is the form of the model that enables predictions, and what algorithms should be used to learn these models from data.

  • How does the speaker's work on predicting esophageal cancer from narrowband imaging (NBI) images contribute to early detection?

    -The speaker's work aims to develop a predictive model that can identify cancerous lesions in images obtained from a new imaging technology called narrowband imaging with magnification, potentially aiding in early detection and treatment of esophageal cancer.

  • What is the significance of the speaker's research on understanding attitudes to gun control in America from Twitter data?

    -The significance of this research is to analyze public opinion on gun control by monitoring Twitter, accounting for the presence of bots that may skew the perception of public sentiment, and to inform policies and public understanding of the gun control debate.

  • How does the speaker address the challenge of limited data in clinical medicine?

    -The speaker addresses this challenge by using ensemble methods to build models from limited data, emphasizing the importance of gathering information over time rather than in one instance, and integrating multiple sources of information to make robust predictions.

  • What is the role of the speaker's research in the development of predictive models for clinical decision-making?

    -The speaker's research plays a crucial role in developing predictive models that can assist clinicians in making more informed decisions, such as predicting complications in type 1 diabetic patients and providing personalized risk assessments for diseases like esophageal cancer.

  • How does the speaker's approach to machine learning projects emphasize the importance of application?

    -The speaker emphasizes that application is not a secondary activity but a creative task that requires good modeling and problem-solving skills. She seeks out problems, works with domain scientists, and applies machine learning methods to develop solutions that are both theoretically sound and practically useful.

  • What are some of the challenges the speaker faced while working on the submarine game project with the navy?

    -Some challenges included understanding why the game was difficult, identifying the strategies that lead to success or failure, and developing models that could accurately predict human learning and performance on the task without relying on self-reported strategies from subjects.

  • How does the speaker's work on detecting Twitter bots contribute to the understanding of public opinion on social issues?

    -The speaker's work on detecting Twitter bots helps to identify and filter out non-organic traffic that can skew the perception of public opinion on social issues. By understanding the presence and influence of bots, more accurate insights can be gained about the true sentiment of the public on topics like gun control.



🎤 Introduction and Collaboration

The host begins by introducing a talk by their colleague, Devika Subramanian, whom they have known since their graduate student days in the 1980s. They highlight their collaborative work and Subramanian's numerous invited presentations and workshops. Subramanian's contributions to AI and machine learning, along with her role as a program co-chair and her teaching awards at various institutions, are emphasized. The host expresses excitement for the upcoming talk and potential future collaborations.


📊 Big Data and Machine Learning: Challenges and Opportunities

Devika Subramanian introduces her talk on the challenges and opportunities in big data and machine learning. She shares her journey from being an engineer to working on AI and machine learning, emphasizing her thesis on machine-designed representations. Subramanian discusses her academic career, her shift towards problem-solving, and her approach to selecting problems that are interesting and relevant to others. She outlines the framework of machine learning, involving complex systems, training data, learning algorithms, and predictive models to control these systems and achieve objectives.


🧠 Machine Learning: Theoretical and Real-World Applications

The speaker delves into the fundamental questions that arise in machine learning, such as data selection, model selection, model validation, and learning algorithms. They discuss the importance of prior knowledge and training data in building predictive models and the role of machine learning in making objective decisions. The speaker shares their experience in applying machine learning to real-world problems, emphasizing the creativity involved in applying methods to problems and the importance of validation.


🏥 Clinical Data Analytics and Predictive Models

The speaker shares their projects in medicine and clinical data analytics, highlighting their work with medical centers and research institutions. They discuss predicting esophageal cancer from narrowband imaging (NBI) images and understanding attitudes towards gun control in America using Twitter data. The speaker emphasizes the importance of integrating multiple sources of information and the challenges of working with limited data in clinical settings. They also mention their current projects in weather forecasting and predicting human behavior from neuroscience data.


🌪️ Predicting Esophageal Cancer: A Machine Learning Approach

The speaker presents a specific case study of detecting esophageal cancer at an early stage using machine learning. They discuss the challenges of interpreting images due to the lack of quantitative vocabulary for surface patterns and vasculature. The speaker describes their approach of using computer vision techniques to extract features from images and build a predictive model. They share the results of their model and the potential for integrating this model into clinical tools to assist doctors in diagnosing cancerous lesions.


🎮 Navy Submarine Commander Training Game Analysis

The speaker recounts their involvement in analyzing a game used by the Navy to train submarine commanders. They discuss the stressful nature of the game, which involves navigating a submarine through mine-infested waters. The speaker explains their research into understanding why some individuals fail or succeed in the game, using machine learning to analyze visual motor data and infer strategies. They highlight the importance of detecting inability to learn early enough to intervene and the potential of their research to shape human learning on the task.


🤖 Twitter Bot Detection and Analysis

The speaker discusses their research on detecting Twitter bots and analyzing their impact on political discussions, particularly regarding gun control. They explain their unsupervised approach to identifying bot accounts by analyzing groups of accounts tweeting similar texts and shortened URLs. The speaker shares their findings on the prevalence of bots and their role in amplifying polarized discussions. They also discuss their collaboration with Twitter, which led to changes in Twitter's policy to counter bot activity.
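
The grouping idea described above can be illustrated with a minimal sketch. It is not the speaker's actual method: it assumes that "similar texts" can be approximated by exact matches after stripping URLs and @mentions, and the function names, the `min_accounts` threshold, and the sample tweets are all hypothetical.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase, drop URLs and @mentions, and collapse whitespace."""
    text = re.sub(r"https?://\S+", "", text.lower())
    text = re.sub(r"@\w+", "", text)
    return " ".join(text.split())


def suspicious_groups(tweets, min_accounts=3):
    """Group accounts by normalized tweet text.

    tweets: iterable of (account, text) pairs.
    Returns {normalized_text: set_of_accounts} for texts posted by at
    least `min_accounts` distinct accounts (an assumed threshold).
    """
    by_text = defaultdict(set)
    for account, text in tweets:
        key = normalize(text)
        if key:  # skip tweets that were only a URL or a mention
            by_text[key].add(account)
    return {t: accts for t, accts in by_text.items() if len(accts) >= min_accounts}


# Hypothetical sample data: three accounts push the same message with
# different shortened URLs; one account posts unrelated content.
tweets = [
    ("a1", "Read this NOW http://t.co/x1"),
    ("a2", "read this now http://t.co/x2"),
    ("a3", "Read this now  @someone http://t.co/x3"),
    ("h1", "Had a nice lunch today"),
]
groups = suspicious_groups(tweets)
print(groups)
```

No labeled bot accounts are needed, which is what makes the approach unsupervised. A realistic system would likely use fuzzy text similarity and resolve shortened URLs rather than discard them.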


🌪️ Predicting Hurricane Risks for Houston

The speaker talks about their work on predicting hurricane risks for Houston, aiming to provide individualized risk assessments for residents. They discuss the integration of multiple data sources, including wind, flood, and power loss risks, to inform decision-making about evacuations. The speaker emphasizes the importance of this work due to the frequency of hurricanes in Houston and the potential for their models to assist in disaster preparedness.



💡Machine Learning

Machine learning is a core concept in the video, referring to the scientific study of algorithms and statistical models that allow computers to learn from and make decisions or predictions based on data. It is central to the speaker's work in interpreting complex systems and making objective decisions. The video discusses various applications of machine learning, such as predicting disease outcomes and analyzing social media data.

💡Big Data

Big Data refers to the large volume of data that is generated and collected in various fields, such as social media, healthcare, and finance. In the context of the video, big data is crucial for training machine learning models to make accurate predictions and decisions. The speaker emphasizes the importance of selecting the right data to train these models effectively.

💡Predictive Models

Predictive models are mathematical or computational representations used to forecast future outcomes based on historical data. In the video, the speaker focuses on developing predictive models using machine learning to understand complex systems like disease progression and social attitudes. These models help in making informed decisions and predictions in domains such as healthcare and social media analysis.

💡Data Selection

Data selection is the process of deciding which data to gather and use for building machine learning models. It is closely related to feature selection, which chooses the most relevant features from a larger dataset. It is essential for improving model performance and reducing complexity. The speaker highlights the challenge of data selection and its impact on the effectiveness of predictive models.

💡Model Validation

Model validation is the process of checking the accuracy and reliability of a predictive model by comparing its predictions to actual outcomes. It is critical for ensuring that the model is generalizable and can perform well on new, unseen data. The speaker emphasizes the importance of model validation in the machine learning process and the challenges associated with it.
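
One common way to check performance on unseen data is k-fold cross-validation, sketched below. The talk summary does not say which validation scheme the speaker used, so this is an illustration only. The helper names, the toy mean-predicting model, and the data are hypothetical, and the sketch assumes `k` is no larger than the number of samples.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(xs, ys, fit, predict, k=5):
    """Return the mean absolute error on each held-out fold."""
    errors = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        err = sum(abs(predict(model, xs[i]) - ys[i]) for i in fold) / len(fold)
        errors.append(err)
    return errors


# Toy model (hypothetical): always predicts the mean of the training targets.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    return model


xs = list(range(10))
ys = [2.0] * 10
fold_errors = cross_validate(xs, ys, fit_mean, predict_mean, k=5)
print(fold_errors)
```

The folds here are contiguous and unshuffled to keep the sketch deterministic. With small clinical datasets, shuffling or stratifying the folds is usually advisable.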

💡Learning Algorithms

Learning algorithms are the computational processes used to train machine learning models by adjusting their parameters to minimize prediction error. They are at the heart of the machine learning process, transforming raw data into actionable insights. The speaker's work involves designing and validating algorithms for real-world problems.
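
As a minimal illustration of adjusting parameters to minimize prediction error, the sketch below fits a straight line by gradient descent on mean squared error. It is a generic textbook example, not an algorithm from the talk. The function name, learning rate, step count, and data are assumptions chosen for the demonstration.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Noise-free toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_line(xs, ys)
print(round(w, 4), round(b, 4))
```

The learning rate must be small enough for the updates to converge. The value used here is stable for this particular dataset and would need tuning for others.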

💡Clinical Data Analytics

Clinical data analytics involves the analysis and interpretation of patient data to improve healthcare outcomes. The speaker's work in this area includes predicting diseases like esophageal cancer and diabetic ketoacidosis using machine learning techniques. This field is crucial for personalized medicine and evidence-based healthcare.

💡Social Media Analytics

Social media analytics is the process of examining social media data to extract insights about public opinion, trends, and behaviors. In the video, the speaker uses social media analytics to understand attitudes towards gun control in America by analyzing Twitter data. This field is important for market research, public policy, and sentiment analysis.

💡Twitter Bots

Twitter bots are automated accounts that post or retweet content without direct human action. These bots can influence public opinion and skew analyses of social media data. The speaker's research includes developing methods to detect and understand the impact of Twitter bots on political discussions.

💡Change Point Detection

Change point detection is a statistical method used to identify significant shifts or changes in a data series. In the context of the video, it is crucial for understanding when a learning strategy or policy changes, which is part of the speaker's work on analyzing human learning behavior through machine learning.
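
A simple form of change point detection looks for the single split that best separates a series into two segments with different means. The sketch below shows that idea only. The summary does not describe the speaker's actual technique, and the function name and performance scores are hypothetical.

```python
def best_change_point(series):
    """Return the index k (1 <= k < len(series)) that splits the series
    into series[:k] and series[k:] with the smallest total squared
    deviation from each segment's own mean."""

    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((v - mean) ** 2 for v in segment)

    best_k, best_cost = None, float("inf")
    for k in range(1, len(series)):
        cost = sse(series[:k]) + sse(series[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


# Hypothetical per-trial scores: performance jumps after the fourth trial,
# as it might when a learner switches to a better strategy.
scores = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
k = best_change_point(scores)
print(k)
```

This brute-force version recomputes each segment's error from scratch, so it runs in quadratic time. It also always reports a split, even when no real change exists, so a practical detector would add a significance test or a penalty term.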


The speaker, a long-time collaborator and friend of the host since the 1980s, is introduced with a rich background in AI and machine learning.

The speaker has an impressive record of invited presentations, teaching awards, and advisory roles in prestigious institutions and conferences.

The talk focuses on the challenges and opportunities in big data and machine learning, reflecting the speaker's struggle to find an interesting topic.

The speaker's early work on AI and machine learning revolved around the idea of machines designing their own representations.

A key principle in the speaker's work is selecting problems that are interesting and relevant to others, beyond just personal interest.

Machine learning and AI are positioned as tools to enable objective decisions based on data, rather than intuition.

The speaker emphasizes the importance of data selection, model selection, model validation, and algorithm selection in machine learning projects.

The speaker's work spans various fields including medicine, social sciences, and engineering, with a focus on real-world applications.

A notable project involves predicting esophageal cancer using narrowband imaging with magnification, aiming to improve early detection rates.

The speaker discusses the challenges of working with clinical data, particularly the scarcity of data and the need for medical knowledge in model building.

The speaker's research on Twitter data aims to understand public opinion on gun control in the United States.

An innovative method developed by the speaker's team successfully detects and differentiates between Twitter bots and human users.

The speaker's work on chaotic dynamical systems and weather forecasting involves predicting wind and flood damage risks.

A key takeaway from the talk is that application in machine learning is a creative task requiring good modeling, not just an afterthought.

The speaker's approach to machine learning involves working closely with domain scientists to understand problems and develop new algorithms.

The speaker's method for detecting Twitter bots has been recognized and utilized by Twitter to improve their platform policies.

The speaker's analysis of Twitter data post-Parkland shooting reveals the significant role of bots in amplifying certain political narratives.