Paulo Canelas
PhD Student at Carnegie Mellon University
PhD Student at University of Lisbon

I am a PhD student working under the supervision of Alcides Fonseca, Sara Silva, and Christopher S. Timperley. My research focuses on developing program analysis techniques to detect errors in software systems. Previously, I worked on evolutionary program synthesis using refinement types; I am currently researching the application of software engineering techniques to robotics (Software Engineering for Robotics).


Education
  • Carnegie Mellon University
    Sep. 2020 - Jul. 2026 (Expected)
    Dual-degree PhD in Software Engineering with the University of Lisbon
    Thesis: Specification-Driven Detection of Misconfigurations in ROS-based Robotic Systems
    Advisors: Alcides Fonseca, Sara Silva, and Christopher S. Timperley
  • Faculty of Sciences of University of Lisbon
    Sep. 2018 - Jul. 2020
    MSc. in Computer Science
    Thesis: Towards the Conceptualization of Refinement Typed Genetic Programming
    Advisor: Alcides Fonseca
  • Faculty of Sciences of University of Lisbon
    Sep. 2015 - Jul. 2018
    BSc. in Computer Science

Work Experience
  • Uber Technologies Inc.
    May 2024 - Aug. 2024
    PhD Software Engineer Research Intern

Teaching Experience
  • Carnegie Mellon University
    Teaching Assistant
    17-643 - Quality Management
    Mar. 2024 - May 2024
    17-623 - Quality Assurance
    Oct. 2023 - Dec. 2023
  • Faculty of Sciences of University of Lisbon
    Invited Teaching Assistant
    Programming
    Sep. 2021 - Feb. 2022
    Object Oriented Development
    Jan. 2021 - Jun. 2021

Selected Publications
Are Large Language Models Memorizing Bug Benchmarks?

Daniel Ramos, Claudia Mamede*, Kush Jain*, Paulo Canelas*, Catarina Gamboa*, Claire Le Goues (* equal contribution)

arXiv. Under review. 2024.

Large Language Models (LLMs) have become integral to various software engineering tasks, including code generation, bug detection, and repair. To evaluate model performance in these domains, numerous bug benchmarks containing real-world bugs from software projects have been developed. However, a growing concern within the software engineering community is that these benchmarks may not reliably reflect true LLM performance due to the risk of data leakage. Despite this concern, limited research has been conducted to quantify the impact of potential leakage. In this paper, we systematically evaluate popular LLMs to assess their susceptibility to data leakage from widely used bug benchmarks. To identify potential leakage, we use multiple metrics, including a study of benchmark membership within commonly used training datasets, as well as analyses of negative log-likelihood and n-gram accuracy. Our findings show that certain models, in particular CodeGen, exhibit significant evidence of memorization in widely used benchmarks like Defects4J, while newer models trained on larger datasets like Llama 3.1 exhibit limited signs of leakage. These results highlight the need for careful benchmark selection and the adoption of robust metrics to adequately assess model capabilities.
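
The negative log-likelihood and n-gram signals mentioned in the abstract can be approximated with a short script. The sketch below is illustrative only and is not the paper's pipeline: it assumes a Hugging Face causal LM (the CodeGen checkpoint name, the 5-gram window, the stride, and the Defects4J file path are all placeholders) and reports the mean per-token NLL and a greedy n-gram accuracy for a single benchmark snippet.

```python
# Minimal sketch (not the paper's implementation): estimate two leakage signals
# for one benchmark snippet -- mean negative log-likelihood and greedy n-gram
# accuracy (how often the model reproduces the next n ground-truth tokens).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Salesforce/codegen-350M-mono"  # example checkpoint; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

def negative_log_likelihood(text: str) -> float:
    """Mean per-token NLL; unusually low values can hint at memorization."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # cross-entropy averaged over tokens
    return loss.item()

def ngram_accuracy(text: str, n: int = 5, stride: int = 8) -> float:
    """Fraction of positions where greedy decoding reproduces the next n tokens."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    hits, total = 0, 0
    for start in range(1, len(ids) - n, stride):
        prefix = ids[:start].unsqueeze(0)
        with torch.no_grad():
            out = model.generate(prefix, max_new_tokens=n, do_sample=False)
        if torch.equal(out[0, start:start + n], ids[start:start + n]):
            hits += 1
        total += 1
    return hits / max(total, 1)

snippet = open("Defects4J_patch.java").read()  # placeholder benchmark sample
print(negative_log_likelihood(snippet), ngram_accuracy(snippet))
```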

Understanding Misconfigurations in ROS: An Empirical Study and Current Approaches

Paulo Canelas, Bradley Schmerl, Alcides Fonseca, Christopher S. Timperley

International Symposium on Software Testing and Analysis (ISSTA). 2024.  

The Robot Operating System (ROS) is a popular framework for building robot software from reusable components, but configuring and connecting these components correctly is challenging. Developers often face issues due to unstated assumptions, leading to misconfigurations that can result in unpredictable and dangerous behavior. To improve the reliability of ROS projects, it is critical to identify the broader set of misconfigurations. To that end, we perform a study on ROS Answers, a Q&A platform, to categorize these misconfigurations and evaluate how well existing detection techniques cover them. We identified 12 high-level categories and 50 sub-categories, with 27 not covered by current techniques.
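
To make the notion of a misconfiguration concrete, the sketch below (an illustrative ROS 2 example, not code from the paper; node and topic names are made up) shows two components that silently disagree on a topic name. ROS raises no error at launch, so the velocity commands are simply never delivered, which is exactly the kind of unstated-assumption failure the study categorizes.

```python
# Illustrative rclpy example of a silent misconfiguration: the publisher and the
# subscriber use different topic names, so no messages flow and nothing fails.
import rclpy
from rclpy.executors import SingleThreadedExecutor
from rclpy.node import Node
from geometry_msgs.msg import Twist

class Teleop(Node):
    def __init__(self):
        super().__init__("teleop")
        # Publishes velocity commands on "cmd_vel" ...
        self.pub = self.create_publisher(Twist, "cmd_vel", 10)
        self.create_timer(0.1, lambda: self.pub.publish(Twist()))

class BaseController(Node):
    def __init__(self):
        super().__init__("base_controller")
        # ... while the controller assumes the remapped name "robot/cmd_vel".
        # ROS reports no error; the robot simply never moves.
        self.create_subscription(Twist, "robot/cmd_vel", self.on_cmd, 10)

    def on_cmd(self, msg: Twist) -> None:
        self.get_logger().info("received velocity command")

def main():
    rclpy.init()
    executor = SingleThreadedExecutor()
    executor.add_node(Teleop())
    executor.add_node(BaseController())
    executor.spin()

if __name__ == "__main__":
    main()
```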

All publications

News
2024
Paper on ROS Misconfigurations accepted at the International Symposium on Software Testing and Analysis (ISSTA)!
Jul 03
Started my PhD Software Engineer Summer Internship at Uber Technologies Inc.
May 14
Paper on Physical Unit Mismatches accepted at the International Conference on Robotics and Automation (ICRA).
Jan 15
2023
2-minute Lightning Talk at ROSCon 2023 on Understanding, Detecting and Repairing Misconfigurations in ROS. ⚡ Watch
Oct 20
Paper on the Usability of Liquid Types in Java accepted at the International Conference on Software Engineering (ICSE).
Jan 12
2022
Paper on the Challenges in Learning ROS accepted at the International Workshop on Robotics Software Engineering (RoSE).
Feb 25
2020
Our project ecoServer placed in the Top 15 out of 1152 at the EDP University Challenge competition. Read more
Jul 10
Best Poster award at the 5th LASIGE Workshop! 🏆 Read more
Feb 14