FAIROmics – PhD fellowship in Artificial Intelligence-based tools for information retrieval from text documents – Automatic information extraction from scientific publications and other sources to build knowledge graphs

Italy
Posted 9 months ago
Organisation/Company
Alma Mater Studiorum – University of Bologna
Department
Department of Physics and Astronomy
Research Field
Computer science
Physics
Mathematics » Statistics
Engineering
Researcher Profile
First Stage Researcher (R1)
Country
Italy
Application Deadline
Type of Contract
Temporary
Job Status
Full-time
Offer Starting Date
Is the job funded through the EU Research Framework Programme?
HE / MSCA
Reference Number
DC10
Marie Curie Grant Agreement Number
101120449
Is the Job related to staff position within a Research Infrastructure?
No

Offer Description

“FAIRification of multiOmics data to link databases and create knowledge graphs for fermented foods” MSCA-DN-JD Doctoral Network.

The FAIROmics initiative, an interdisciplinary research programme, will gather universities, research centres and private companies to enable the FAIRification of omics data and database interoperability, and to develop knowledge graphs for data-driven decision-making. The goal is to rationally design microbial communities that impart desirable characteristics to plant-based fermented foods, in the context of open science and its regulations. The FAIROmics training programme aims to develop doctoral candidates’ skills at the interface between artificial intelligence, life sciences, humanities, and social sciences.

Plant-based dairy and meat alternatives have grown in popularity in recent years for various reasons, including sustainability and health benefits, as well as lifestyle trends and dietary restrictions. However, plant-based food products can be nutritionally unbalanced, and their flavour profiles may limit their acceptance by consumers. Microorganisms have been used in making food products for millennia, yet the diversity of microbial communities driving plant-based fermentations, as well as their key genetic and phenotypic traits and potential synergies among community members, remain poorly characterised. A great deal of data exists, but it is scattered across the literature (scientific and grey) or, in the best case, across different databases. These data are not always reusable, because they are difficult to find and access and because databases are not systematically interoperable.

Please note that this PhD position will lead to the award of a double diploma after the completion of a stay in each of these organisations: the University of Bologna (UNIBO), Italy and the University of Szeged (USZ), Hungary.

Objectives:

We are looking for one Doctoral Candidate (DC) with a master’s degree in a relevant discipline (Physics, Computer science, Mathematics, Statistics, Engineering or related fields) to join our project at multiple sites in the EU. The candidate should be interested in learning and developing AI-assisted tools for information extraction from domain-specific scientific texts (journals, books, etc.) related to novel food engineering, microbial catalogues, fermentation, and other relevant processes. The information acquired and the related semantic analysis will be used to build knowledge graphs and more general ontologies.

The candidate will learn AI algorithms mainly related to Natural Language Processing and information embedding (e.g. word2vec, Transformer Networks, G-retriever) and to the connection of topics and keywords (e.g. through Network Diffusion processes). Within the context of FAIROmics, the candidate will apply them to specific case studies:

  • Development of AI-assisted tools to enhance the yield of the research team in literature search and processing.
  • Automatic and semi-automatic information extraction from domain-specific scientific texts (journals, books, etc.) related to novel food engineering, fermentation, and other relevant processes.
  • Demonstration of the efficacy of the developed system, focusing on a carefully selected pool of previously collected articles and books. The information acquired and the semantic analysis will be used to build knowledge graphs and more general ontologies, which will in turn be used to enhance the method’s performance and expand the knowledge base of domain-specific databases such as microbe collection catalogues.

Expected results:

The expected result is a prototype AI tool that can process a pool of documents by automatically analysing their text, retrieving topic-specific information in terms of key concepts and named elements, and finding the connections among them. This supports the building of a knowledge graph and/or, more generally, an ontology for the domain defined by the documents. On the one hand, the extracted information can be used to find related documents or to identify the parts of documents relevant to a set of user-specified concepts. On the other hand, information extracted and processed into a knowledge graph can be used to incorporate new data into existing knowledge bases and data catalogues. The developed tools may also be applied in other related contexts, e.g. social networks.

Location and planned secondments:

The first secondment will occur at the Department of Software Engineering, University of Szeged HU (Prof. Laszlo Vidacs), Month 14 (18 months), to develop automatic and semi-automatic information extraction and processing to build knowledge graphs and ontologies.

The second secondment will occur at the Department of Computer Science, University Paderborn DE (Prof. Axel-Cyrille Ngonga Ngomo), Month 28 (2 months), to learn about state-of-the-art knowledge-driven data science methods.

Enrolment in Doctoral degree:

1st-degree awarding organisation: Alma Mater Studiorum – University of Bologna, Bologna IT, https://www.unibo.it/en/homepage
2nd-degree awarding organisation: University of Szeged, Szeged HU, https://u-szeged.hu/english

Supervisors team

The lead supervisor is D. Remondini, full professor at the Department of Physics and Astronomy at Alma Mater Studiorum – University of Bologna. He works on the application of mathematical models in Biology, such as Network Theory for the study of Complex Systems, and on the development of innovative algorithms for the analysis of high-dimensional biological, biomedical and virological data (multiple omics, NGSeq, Neuroimaging, text data) with Machine Learning and AI techniques. He currently leads a group with 4 PhD students (1 ITN PhD student), 3 PostDoc researchers, and 3 Research Assistants, and has supervised >50 Undergraduate and Master Thesis students in Physics. The co-supervisor is E. Giampieri (Assistant Professor), with expertise in scientific computing, data management, analysis and modelling, and supervisor of >20 Undergraduate and Master Thesis students in Physics.

The Hungarian team is composed of László Vidács and Balázs Nagy, professors in artificial intelligence, natural language processing, and software engineering.

Host institutions description:

The project will take place at the Laboratory of Applied Physics and Systems Biophysics, Department of Physics and Astronomy (DIFA) of the Alma Mater Studiorum – University of Bologna, Italy. DIFA is a department of the Science School and one of the most scientifically productive Physics Departments in Italy. DIFA has a large computing facility, available to the Biophysics group (14-core HPC, 2 GPU server with >1 TB RAM and 2 NVIDIA A100, mirrored storage server with >100 TB of storage) and to the whole Department (OPH HPC facility, >200 cores). Prof. Daniel Remondini is the director of the lab, with specific expertise in biomedical data analysis (Machine Learning, Deep Learning), complex network theory and its applications to BioMedicine. Dr Enrico Giampieri has specific expertise in scientific computing, including networks, stochastic processes and statistics in High-Performance Computing environments. All the lab members are involved in several national and EU projects (Precision Medicine, Epidemiology, Public Health, Food Production).

The University of Szeged (USZ) is recognised as a top research institution in Hungary, boasting a diverse student body of over 21,000, including more than 4,000 international students from 115 countries. Led by László Vidács, the Applied Artificial Intelligence Research Group is dedicated to advancing cutting-edge AI research. We specialise in diverse AI applications, from natural language understanding to image processing. Our tailored machine learning and deep learning solutions address real-world challenges in many domains, including medical imaging diagnostics, forensic text analysis, and program source code processing.

Requirements

Research Field
Computer science
Education Level
Master Degree or equivalent
Research Field
Physics
Education Level
Master Degree or equivalent
Research Field
Engineering
Education Level
Master Degree or equivalent
Research Field
Mathematics » Statistics
Education Level
Master Degree or equivalent
Skills/Qualifications
  • Master’s degree in Physics, Computer science, Mathematics, Statistics, Engineering or related fields, giving access to a PhD school. Applicants must NOT already hold any kind of PhD degree. Previous research experience (which must be no longer than four years) is appreciated but not mandatory.
  • Networking and good communication skills (writing and presentation skills).
  • Willingness to travel abroad for the purpose of research, training and dissemination.
  • Good programming skills in high-level languages such as Python, R, and MATLAB (not mandatory but highly recommended for network tools development and usage).
Specific Requirements
  • Any nationality
  • Doctoral Candidate (DC): The applicant must not have been awarded a doctoral degree.
  • Mobility rule: The DC must not have resided or carried out their main activity (work, studies, etc.) in the country of their host organisation for more than 12 months* in the three years immediately prior to the date of selection in the same appointing international organisation.

* EXCLUDED: short stays such as holidays, compulsory national services such as mandatory military service, and procedures for obtaining refugee status under the Geneva Convention.

  • Language: Applicants must demonstrate fluent reading, writing and speaking abilities in English (B2).
Languages
ENGLISH
Level
Good
Research Field
Engineering, Computer science, Physics, Mathematics » Statistics
Years of Research Experience
1 – 4

Additional Information

Benefits

We offer

  • A comprehensive, interactive and international training programme covering the broader aspects of, and the interface between, life science, data science, artificial intelligence, and humanities and social sciences, as well as transferable skills.
  • An enthusiastic team of professionals to co-operate with.
  • Personal Career Development Plan (PCDP) to prepare young researchers for their future careers.
    Each DC will undergo individual training at individual institutes according to the PCDP description.
  • An attractive compensation package in accordance with the MSCA-DN programme regulations for doctoral candidates. The exact salary will be confirmed and will be based on a living allowance of 3400€/month* (correction factor to be applied per country) + mobility allowance of 600€/month. Additionally, researchers may also qualify for a family allowance** of 660€/month, depending on the family situation. Taxation and social (including pension) contribution deductions based on national and company regulations will apply.

*monthly gross salary.

**family = be married/be in a relationship with equivalent status to a marriage recognised by the legislation of the country or region where it was formalised/have dependent children who are being maintained by the researcher.

Selection process

The selection process is merit-based, provides equal opportunity, and will comply with the European Code of Conduct for the Recruitment of Researchers.

  1. Candidates apply for a position using the online application form found on the FAIROmics website.
  2. The FAIROmics Project Manager performs a first screening of the written applications to check the eligibility of each candidate and forwards the eligible applications to the DC supervisors.
  3. The DC supervisors will select the best candidates based on CV, academic records, recommendation and motivation letters, and an adequate skill set. To better assess the candidates, those shortlisted may be asked to write an abstract of provided scientific documents relevant to the research subject.
  4. The selected applicants will be interviewed through an online meeting by the Selection Committee (two main supervisors and two representatives of a beneficiary or associated partner, with at least one person external to the DC’s project).
  5. The best candidates will be chosen by the main supervisors. The European Project Manager will announce the successful candidates to the Consortium and Partners.
Website for additional job details

Job Features

Job Category
Doctorate
