SURF Mentoring
Potential projects/topics: Turning Scientific Figures into Data for Intelligent Concrete Materials Design
Have you ever wondered how the data behind scientific graphs and plots are created—and how they could be reused to design better materials? This summer research project introduces students to an emerging area at the intersection of materials science, data science, and artificial intelligence. Students will learn how to extract valuable data from figures published in scientific papers and turn it into structured datasets that can be used for machine learning–guided materials design.
The goal of this project is to train students to identify, extract, and organize data embedded in scientific figures and reuse it for large-scale data analysis and AI modeling to enable better materials design.
Concrete is the most widely used manufactured material shaping modern society, yet its design remains largely empirical and often suboptimal, leading to excessive CO2 emissions (~9% of global anthropogenic emissions) and unnecessary material costs. While data-driven approaches are becoming increasingly popular, their use in concrete materials design is limited by the lack of large, well-organized datasets. At the same time, a vast amount of valuable information already exists in the figures and plots of scientific papers but is rarely extracted or reused. Leveraging this underutilized data is an important step toward data-driven and intelligent concrete materials design.
With close mentorship and step-by-step guidance, the student will:
- Learn how to find, read, and understand scientific papers, with a focus on interpreting figures and graphs
- Gain a foundational understanding of concrete materials, including composition, processing, and performance
- Practice extracting numerical data from published figures and organizing it into clean, structured datasets
- Use beginner-friendly programming and visualization tools (e.g., Python) to explore basic trends and patterns in the data
- Be introduced to how modern AI tools, including large language models and simple machine learning methods, are used in materials research
- Assist in building components of a larger data-extraction and analysis workflow
The project is structured with clear milestones and frequent mentoring, allowing students with no prior research experience to build confidence, technical skills, and independence over the summer.
Expected Outcomes
- Contribution to an ongoing research effort developing a large concrete materials database
- Opportunity for co-authorship on journal papers or conference presentations
- Strong preparation for graduate school or careers in engineering, data science, or AI
- Potential continuation as a research assistant beyond the summer
Potential skills gained: By participating in this project, students will gain:
- Experience reading and understanding scientific papers, with an emphasis on interpreting figures and graphs
- Foundational knowledge of concrete materials and how composition and processing influence performance
- Hands-on practice extracting numerical data from plots and organizing it into structured datasets
- Introductory programming skills for data analysis and visualization (e.g., Python)
- Experience working with real research data, including data cleaning and basic exploratory analysis
- An introduction to how data-driven methods and artificial intelligence are used in materials and engineering research
- Understanding of the research process, including problem formulation, data collection, and iterative analysis
- Improved critical thinking, attention to detail, and scientific communication skills
- Confidence working in a research environment and collaborating with mentors and peers
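To give a flavor of the "extracting numerical data from plots" skill listed above, here is a minimal Python sketch of one common step: converting marker positions measured in image pixels into data values by calibrating each axis against two known tick marks. It assumes linear axes, and every name and number in it (the function names, the calibration pixels, the curing-age and strength values) is hypothetical and chosen only for illustration. It is not the project's actual workflow.

```python
def make_axis_mapper(pixel_a, value_a, pixel_b, value_b):
    """Return a function that maps a pixel coordinate to a data value.

    Assumes a linear axis calibrated from two reference ticks:
    pixel_a corresponds to value_a and pixel_b corresponds to value_b.
    """
    if pixel_a == pixel_b:
        raise ValueError("calibration pixels must differ")

    def to_value(pixel):
        # Linear interpolation (or extrapolation) between the two ticks.
        return value_a + (pixel - pixel_a) * (value_b - value_a) / (pixel_b - pixel_a)

    return to_value


def pixels_to_records(points_px, x_cal, y_cal, x_name, y_name):
    """Convert (x_pixel, y_pixel) pairs into a list of labeled records."""
    to_x = make_axis_mapper(*x_cal)
    to_y = make_axis_mapper(*y_cal)
    return [{x_name: to_x(px), y_name: to_y(py)} for px, py in points_px]


# Hypothetical calibration read off a figure by hand:
# x axis: pixel 100 is 0 days, pixel 500 is 28 days.
# y axis: pixel 400 is 0 MPa, pixel 50 is 70 MPa (image y grows downward).
X_CAL = (100, 0.0, 500, 28.0)
Y_CAL = (400, 0.0, 50, 70.0)

# Hypothetical pixel positions of three plotted markers.
markers_px = [(200, 300), (300, 200), (500, 150)]

records = pixels_to_records(markers_px, X_CAL, Y_CAL, "age_days", "strength_MPa")
for row in records:
    print(row)
```

Each record is a plain dictionary, so the extracted points can be written to a CSV file or loaded into a table for the exploratory analysis described above. Logarithmic axes would need a different mapping than the linear one assumed here.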
Required qualifications:
- Required skills: No prior research experience is required. An interest in learning programming and working with data is expected. Curiosity, motivation, and a willingness to learn are more important than prior technical background.
Direct mentor: Faculty/P.I., Post-doctorate
