PhD position: NLP for Text to SQL Transformation
Job offer posted on 6 July 2022
The DesCartes program (Work Package 5) is looking for a PhD candidate in NLP for Text-to-SQL transformation. The program aims to develop disruptive hybrid AI to serve the smart city and to enable optimized decision-making in complex situations encountered in critical urban systems.
DESCARTES PROGRAM
The DesCartes program is developing a hybrid AI that combines learning, knowledge and reasoning. It is designed to have desirable properties (lower resource and data requirements, security, robustness, fairness, respect for privacy, ethics) and is demonstrated on industrial smart-city applications (digital energy, monitoring of structures, air traffic control).
The program brings together 80 permanent researchers (half from France, half from Singapore), with the support of large industrial groups (Thales SG, EDF SG, ESI Group, CETIM Matcor, ARIA, etc.).
The research will take place mainly in Singapore, at the premises of CNRS@CREATE, with a competitive salary and generous funding for missions.
DESCRIPTION
Large amounts of information are stored in structured and semi-structured knowledge bases such as databases. To retrieve and analyze this data effectively, users must interact with the database through query languages such as SQL. The difficulty of using SQL proficiently is a barrier for non-technical users and lowers the data utilization rate. There is an urgent need for technology and tools that break down the barriers between users and structured data; this is what the Text-to-SQL (T2S) task aims to solve, and it therefore has great potential in applications. Since 2017, many Business Intelligence tools have offered support for natural language queries, but they focus on simple queries. In this study, we aim at solutions for complex queries as well as quick adaptation to new applications.
The project will use natural language processing (NLP) models to analyse and extract users’ intentions, making the detection of users’ future actions and AI conversational systems more intelligent and practical. We target intentions for urban crisis management and study in particular: (1) few-shot learning and data augmentation, (2) weakly supervised learning, and (3) multi-turn dialogue Text-to-SQL.
EXPERIENCE & QUALIFICATIONS
Preferred:
– A Master's degree in computer science with a solid background in NLP and/or machine learning
– Good experience with deep learning approaches for NLP
– Good programming skills in Python
References:
– Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, et al. 2018. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3911–3921.
– DongHyun Choi, Myeong Cheol Shin, EungGyun Kim, and Dong Ryeol Shin. 2020. RYANSQL: Recursively applying sketch-based slot fillings for complex text-to-SQL in cross-domain databases. arXiv preprint arXiv:2004.03125.
– Jiaqi Guo, Zecheng Zhan, Yan Gao, Yan Xiao, Jian-Guang Lou, Ting Liu, and Dongmei Zhang. 2019. Towards complex text-to-SQL in cross-domain database with intermediate representation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4524–4535.
– Bailin Wang, Richard Shin, Xiaodong Liu, Oleksandr Polozov, and Matthew Richardson. 2020. RAT-SQL: Relation-aware schema encoding and linking for text-to-SQL parsers. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7567–7578, Online. Association for Computational Linguistics.
Supervision:
The thesis will be carried out within the France-Singapore collaboration, with advisors from both sides. The student will be registered at the University of Toulouse and will be part of the IRIT lab, but is expected to spend a good part of the thesis in Singapore at the A*STAR Institute for Infocomm Research (I2R), with funding provided by the DesCartes program.
The thesis will be supervised on the French side by Farah Benamara (IRIT) and co-advised by Bin Chen and Jian Su from A*STAR I2R. The French advisor will also spend time at A*STAR during the thesis.
Duration of the position: 36 months
FURTHER INFORMATION & CONTACT
Workplace Address: CREATE Tower (NUS Campus), 1 Create Way #08-01 Singapore 138602
Please send a short cover letter describing your suitability for the position, a detailed CV with academic ranking (if any) and publication list, a concise description of your research interests and future plans, and academic transcripts to:
– Farah Benamara
farah.benamara@irit.fr
– Bin Chen
We will begin reviewing applications for the position immediately.