Data-driven healthcare


Advanced data science to enhance decision-making in healthcare, based on complex, heterogeneous and scarce data.

Healthcare challenges

Healthcare is facing several challenges:

  • Many medical indications are still subject to incorrect or inefficient diagnoses and choices of therapeutic strategy.
  • The industry struggles to develop new, effective therapeutics.
  • Medical technologies have huge societal costs.
  • Health payers are putting ever-increasing pressure on budgets.
  • Patients are increasingly empowered and expect the management of their health to be tailored to them.
  • Every field of healthcare, from fundamental research to market surveys, is facing a data deluge.

In that context, DNAlytics offers a tailored, on-demand consultancy service in data science. Using Artificial Intelligence, Machine Learning, Data Mining and Cloud computing techniques, we build new decision-support tools from public and private datasets.


Our data science platform is compatible with many data-generating technologies and applies to a wide range of medical areas and needs. Although this is not always explicit at the start, most of the projects we take on pursue three objectives:

  1. Build and validate predictive solutions
  2. Identify a small set of relevant (bio)markers from a vast set of possible markers
  3. Prototype software applications making those solutions available to practitioners

In most of our projects, we use software that combines publicly available code with our own code library. Part of our website is dedicated to the relevant software developed and/or maintained at DNAlytics. We also invite you to read about some of the projects we have carried out in the past.


Our core expertise is in data science. We build predictive models and identify which markers (in a broad sense), or combinations of markers, should contribute to these models. These models make it possible to generate predictions in clinical research, biomanufacturing and public health.

When large companies such as Google, Amazon, Facebook, Apple, … talk about big data, they usually refer to their own situation: a very large number of observations (millions of users), each described by a relatively limited set of features. In this setting, statistical learning, i.e. Machine Learning, is comparatively easy.

In healthcare, the situation is usually reversed: there is a huge number of parameters (potentially billions), but only a very limited number of observations (patients, batches). That is why, at DNAlytics, we prefer to talk about Fat Data. It is very hard to "learn" anything reliable from such data with data science methods: with so many candidate features and so few observations, some features will correlate with the outcome purely by chance.
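To illustrate why Fat Data is hard, the following Python sketch generates a random outcome and random candidate markers, all pure noise, and reports the strongest absolute correlation found among them. It is a hypothetical illustration, not DNAlytics software: the function names, sample sizes and seed are assumptions chosen for the example. With many observations and few features, the best chance correlation stays close to zero; with 10 patients and 5,000 candidate markers, some marker correlates strongly with the outcome by chance alone.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def best_spurious_correlation(n_obs, n_features, seed=0):
    """Largest |r| between a pure-noise outcome and pure-noise features.

    Hypothetical helper for illustration: no feature is truly related
    to the outcome, so any high correlation is a chance finding.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_obs)]
        best = max(best, abs(pearson(feature, outcome)))
    return best


# "Big data" shape: many observations (e.g. users), few features.
tall = best_spurious_correlation(n_obs=1000, n_features=5)
# "Fat Data" shape: few patients, many candidate markers.
fat = best_spurious_correlation(n_obs=10, n_features=5000)

print(f"many observations, few features: max |r| = {tall:.2f}")
print(f"few observations, many features: max |r| = {fat:.2f}")
```

This is why, in such a setting, candidate markers and predictive models have to be validated on data that was not used to select them, for instance through cross-validation or an independent cohort.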

That is why we developed our own technology: DNAlytics develops and uses dedicated algorithms designed for this specific healthcare context. We make heavy use of the R programming language, a reference in data science. On top of this open-source layer, we build our own software, some of it open source and some closed source. Based on our own software libraries, we then build applications for our customers. Once data science results are obtained (new predictive models, biomarkers, decision rules, risk scores, …), their value increases if they are made available to, and actionable by, healthcare professionals, i.e. non-data scientists. That is why we also provide software development capabilities to implement these results.

This technology is effective and recognized, as shown by nearly 40 publications by DNAlytics collaborators. Some of our open-source libraries are also downloaded more than 2,000 times per month.

To obtain the computing power we need, we make heavy use of cloud computing solutions such as Amazon Web Services (AWS). We are a certified AWS Partner.

We are able to handle very large datasets of many different kinds: epigenetic (e.g. methylation), genetic (DNA), transcriptomic (mRNA, lncRNA, miRNA, …), proteomic (e.g. mass spectrometry), ELISA, metabolomic, clinical, epidemiological, psychological and demographic data. In addition, we take advantage of publicly available data and combine heterogeneous data sources to complement the data provided by our customers.

We also master more classical statistics (hypothesis testing; design, writing and execution of statistical analysis plans). When clinical studies are conducted, we can also deploy electronic data capture (EDC) tools such as OpenClinica.



See a non-exhaustive list of publications (scientific communications, patents, software libraries) to which DNAlytics collaborators have contributed.


For more information about our software libraries, go to the dedicated page.