Database Composition and Knowledge Generation for Image-Based Diagnostic Support Systems Medical, Considering Big Data

Principal Investigator

Name
Agma Juci Machado Traina

Degrees
M.Sc., Ph.D.

Institution
Institute of Mathematical and Computer Sciences, University of Sao Paulo

Position Title
Full Professor

Email
agma@icmc.usp.br

About this CDAS Project

Study
NLST

Project ID
NLST-1483

Initial CDAS Request Approval
Dec 22, 2025

Title
Database Composition and Knowledge Generation for Image-Based Diagnostic Support Systems Medical, Considering Big Data

Summary
The increasing volume, variety, and complexity of data generated in modern human activities, including healthcare, create new challenges and opportunities for computational methods. In medical environments, Electronic Health Records (EHRs) are essential sources of structured and unstructured data, including demographic information, clinical reports, and medical images. However, these datasets are often fragmented across institutions, stored in heterogeneous formats, and collected for operational rather than research purposes.
This project aims to develop an integrated data platform that unifies public and institutional datasets from healthcare partners to support research in content-based image retrieval (CBIR) systems for medical decision support. The platform will be designed to organize, curate, and normalize heterogeneous data, enabling interoperability and effective use of large-scale medical data in research contexts.
By leveraging data integration and standardization techniques within the Big Data paradigm, the project seeks to enhance the development and validation of algorithms for medical image retrieval and diagnostic support. The resulting curated database will provide a robust foundation for experiments conducted by the Database and Image Group (GBdI) at the University of São Paulo, facilitating research reproducibility and promoting open scientific collaboration in medical informatics.

Aims

- Develop an integrated data platform capable of unifying and managing Electronic Health Record (EHR) data, including medical images, from multiple public and institutional sources.
- Design and implement data curation and normalization processes to ensure completeness, consistency, and interoperability of medical datasets for research purposes.
- Enable content-based image retrieval (CBIR) research by providing standardized and annotated datasets to support the development and validation of image similarity algorithms for diagnostic assistance.
- Facilitate reproducible research and data sharing, establishing a structured and well-documented database that can be accessed by the scientific community under proper data governance and privacy standards.
- Contribute to the advancement of decision support systems, fostering the integration of clinical and imaging data to improve the accuracy and interpretability of computer-aided diagnosis tools.

Collaborators

Agma Juci Machado Traina, Institute of Mathematical and Computer Sciences, University of Sao Paulo
Lucas Piovani Ferreira, Institute of Mathematical and Computer Sciences, University of Sao Paulo
Willian Dener de Oliveira, Institute of Mathematical and Computer Sciences, University of Sao Paulo