Disputation: Chao Tan

Doctoral candidate Chao Tan at the Department of Informatics, Faculty of Mathematics and Natural Sciences, is defending the thesis “A Language Model-Based Approach to Generate Dynamic Synthetic Test Data” for the degree of Philosophiae Doctor.


The PhD defence will be held partially digitally: in person at Simula Research Laboratory, Hans Petter Langtangen Lecture Hall (Kristian Augusts Gate 23), and streamed live via Zoom. The host of the session will handle the technical arrangements, while the chair of the defence will moderate the disputation.

 

Ex auditorio questions: the chair of the defence will invite members of the audience present in the Hans Petter Langtangen Lecture Hall at Simula Research Laboratory to ask ex auditorio questions.

Trial lecture

“Summarize and present key contributions by 2023 ACM Turing Award winner Avi Wigderson, and the impact his research has had on computer science.”

Time and place: June 10, 2024, 11:15 AM, Simula Research Laboratory, Hans Petter Langtangen Lecture Hall (Kristian Augusts Gate 23) and Zoom

Main research findings

In recent years, regulatory frameworks such as the GDPR have restricted the use of real or anonymized production data for software testing. Consequently, the need for synthetic yet production-like test data has surged. My thesis addresses this need by introducing a novel approach that uses deep learning techniques to generate such data. Through a case study in the Norwegian public sector, our research identified test data requirements from complex industrial scenarios and proposed a language model-based solution built on those requirements. Framing the challenge as a language modelling problem, we experimented with multiple deep learning techniques to build language models. We also proposed an evaluation framework to quantitatively assess the quality of the generated data. The evaluation results showed that the trained language models can produce high-quality synthetic data, demonstrating the effectiveness of the solution. To improve the industrial applicability of the solution, we designed a more expressive domain-specific language that lets us make better use of language modelling techniques. This enabled us to scale the solution up to industrial-level complexity. The thesis shows that our approach generates high-quality, production-like test data suitable for testing complex systems. The practical evaluations underscore its effectiveness and applicability, advancing the use of deep learning for reliable software testing.

 

Adjudication committee

  • Professor Richard Torkar, Chalmers and Univ. of Gothenburg, Sweden
  • Senior Research Scientist Dusica Marijan, Simula Research Laboratory, Norway
  • Associate Professor Viktoria Stray, University of Oslo, Department of informatics Norway

Supervisors

  • CEO Erik Arisholm, Testify AS, Norway
  • Professor Magne Jørgensen, Simula Research Laboratory, Norway
  • Dr. Razieh Behjati, Simula Research Laboratory, Norway

Chair of defence

Professor Ole Hanseth

Contact at the department: Mozhdeh Sheibani Harat

Published 27 May 2024 14:16 - Last modified 7 June 2024 12:33