The Data Warehouse in Healthcare


Health informatics is currently one of the most critical focus areas for researchers because accurate and timely data analysis facilitates organizational decision making. Still, most healthcare organizations cannot effectively manage huge volumes of data, which means that medical practitioners cannot leverage data analysis for quality improvement. According to current research, data warehousing can potentially solve the problems associated with huge volumes of data. Most data warehousing techniques are based on a standard architectural design, making them flexible enough to be implemented in multiple industries. Implementing a data warehouse in healthcare can be time-consuming and complicated, but it is crucial for quality healthcare delivery. The purpose of this paper is to discuss the potential of data warehousing in healthcare and the challenges associated with it. This paper also proposes a data-warehousing model suitable for the healthcare industry.


Multiple organizational functions, including finance, marketing, sales and management, need a corporate-wide view of information for optimal organizational performance. However, the many local, siloed systems that hold that information do not yield the integrated corporate view required. This is where data warehouses come in. A data warehouse is a system that integrates information from disparate sources in one central location (Arif & Mujtaba, 2015). In healthcare, a data warehouse can store petabytes of health data, which supports quality of care down to the individual patient level.

Health data refers to any data contained in a patient’s health record. This data may be obtained from clinical notes, clinical admissions, diagnoses, staff records, medication records, or physician visits. The data comes in different forms: numeric, alphanumeric, or images. Examples of numeric data include laboratory results, vital sign measurements, and other quantitative values, while alphanumeric data covers patient information, medical history, and other textual information (Elmasri & Navathe, 2011). Image data includes histological, ultrasound, and radiological images.

Health Data Warehouse

Typically, healthcare data warehouses are created by collecting and integrating data from different sources. A data warehouse is essential in healthcare because it supports decision making and analytical reporting. Typical data warehouses store patients’ information, but they can also be used for academic research. Data warehousing also improves the consistency and relevance of healthcare data; overall, it enables faster decision making (Permana et al., 2017). In other words, data warehousing helps medical practitioners make decisions that are based on facts and evidence. A rich body of evidence also suggests that data warehousing facilitates cross-border healthcare delivery. However, this requires that data flows comply with national and international interoperability, legal and security requirements (Gavrilov et al., 2020). In Europe, for example, data flows must satisfy technical, legal, organizational and semantic interoperability requirements.

A data warehouse enables healthcare providers to access medical data collected during the care provision process. The core data warehousing process consists of the following functions: cleaning, integrating, and consolidating data (Gavrilov et al., 2020; Permana et al., 2017). An excellent example of a data warehouse is an integrated Electronic Health Record (EHR).
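The three functions named above can be illustrated with a minimal Python sketch. The two source systems, the field names, and the records are hypothetical and are not taken from the cited studies; the sketch only shows the general idea of cleaning, integrating, and consolidating records under a shared patient key.

```python
# Hypothetical records from two source systems (all names and values are illustrative).
admissions = [
    {"patient_id": " P001 ", "name": "ann lee", "ward": "Cardiology"},
    {"patient_id": "P002", "name": "Raj Patel", "ward": "Oncology"},
]
laboratory = [
    {"patient_id": "p001", "test": "glucose", "value": "5.4"},
    {"patient_id": "P001", "test": "glucose", "value": "5.4"},  # duplicate entry
    {"patient_id": "P002", "test": "glucose", "value": "6.1"},
]


def clean(record):
    """Cleaning: trim whitespace and normalise the patient identifier."""
    cleaned = {key: value.strip() for key, value in record.items()}
    cleaned["patient_id"] = cleaned["patient_id"].upper()
    return cleaned


def integrate(admission_rows, lab_rows):
    """Integrating: group rows from both sources under one patient key."""
    patients = {}
    for row in map(clean, admission_rows):
        patients[row["patient_id"]] = {
            "name": row["name"].title(),
            "ward": row["ward"],
            "labs": [],
        }
    for row in map(clean, lab_rows):
        patient = patients.get(row["patient_id"])
        if patient is not None:
            patient["labs"].append((row["test"], float(row["value"])))
    return patients


def consolidate(patients):
    """Consolidating: drop duplicate lab results so each appears once."""
    for patient in patients.values():
        patient["labs"] = sorted(set(patient["labs"]))
    return patients


consolidated = consolidate(integrate(admissions, laboratory))
```

In this sketch the differently formatted identifiers " P001 ", "p001", and "P001" resolve to the same patient, and the repeated glucose result is kept only once.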

Extract, Transform and Load processes

The Extract, Transform and Load (ETL) processes are three of the most critical data warehousing processes because they facilitate responsible data extraction and integration from multiple sources (see figure 1). In fact, without the three processes, it is impossible to use data for solving operational needs (Arif & Mujtaba, 2015; Roldan, 2017). The ETL processes are used to prepare data for warehousing. When used appropriately, the three processes that make up ETL can help in multiple ways, including:

  • Offering comprehensive and detailed historical information.
  • Enhancing productivity.
  • Facilitating informed decision making and overall business intelligence.
  • Enhancing data calculations, verification and aggregation.

It is important to note that many modern ETL systems are built on cloud technology because it offers more flexibility in data processing and storage.

Figure 1: ETL Processes
  • The data layer shown in the upper part of figure 1 contains the original data sources, which include files and different databases.
  • Data extraction occurs in the middle part of the data layer. The extracted data is then moved to the Data Staging Area (DSA) (see the left part of figure 1); the data is then cleaned and loaded into the data warehouse (Gavrilov et al., 2020).
  • Data is stored in the warehouse’s data repositories, but only after passing through the Load part, which sits between the transform part and the repositories.
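The flow described above (extract, move to the staging area, clean, load) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the architecture from the cited work: the source rows, table names, and cleaning rules are hypothetical, and a single in-memory SQLite database stands in for both the DSA and the warehouse.

```python
import sqlite3

# Hypothetical source data; in practice this would come from files or databases.
SOURCE_ROWS = [
    ("P001", "2020-01-05", "glucose", "5.4"),
    ("P002", "2020-01-06", "glucose", ""),      # missing value, dropped during cleaning
    ("P003", "2020-01-07", "GLUCOSE ", "6.1"),
]


def extract():
    """Extract: read raw rows from the source layer."""
    return list(SOURCE_ROWS)


def stage(conn, rows):
    """Move extracted rows, unchanged, into the data staging area (DSA)."""
    conn.execute(
        "CREATE TABLE staging (patient_id TEXT, taken_on TEXT, test TEXT, value TEXT)"
    )
    conn.executemany("INSERT INTO staging VALUES (?, ?, ?, ?)", rows)


def transform(conn):
    """Transform: clean staged rows (normalise test names, drop empty values)."""
    staged = conn.execute(
        "SELECT patient_id, taken_on, test, value FROM staging"
    ).fetchall()
    return [
        (patient_id, taken_on, test.strip().lower(), float(value))
        for patient_id, taken_on, test, value in staged
        if value.strip()
    ]


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE lab_results (patient_id TEXT, taken_on TEXT, test TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO lab_results VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run the three steps in order and return the database connection."""
    conn = sqlite3.connect(":memory:")  # in-memory database stands in for the warehouse
    stage(conn, extract())
    load(conn, transform(conn))
    return conn


connection = run_etl()
loaded = connection.execute(
    "SELECT patient_id, test, value FROM lab_results ORDER BY patient_id"
).fetchall()
```

Keeping the staging table separate from the warehouse table means the raw extract is preserved as received, while only cleaned rows reach the repository.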

Security and Privacy Issues Related to Health Data

Health data needs protection from malicious intruders and software that might tamper with the authenticity of the information. Data security protects the integrity of medical data and ensures that the data remains confidential (Puppala et al., 2016; Khan & Hoque, 2016). Safeguarding healthcare data means creating a set of processes that prevent unauthorized users from accessing information. Some of the standard methods used to secure data include antivirus software, regular and continuous backups, encryption, and limiting external connections.
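As one illustration of a process that limits access by unauthorized users, the sketch below applies a simple role-based check before any data category is returned. Role-based access control is used here only as an example; the paper does not prescribe it, and the roles, data categories, and record contents are hypothetical.

```python
# Hypothetical roles and the warehouse data categories each may read.
ROLE_PERMISSIONS = {
    "physician": {"demographics", "diagnoses", "lab_results"},
    "billing_clerk": {"demographics", "billing"},
    "researcher": {"lab_results"},
}


class AccessDenied(Exception):
    """Raised when a user requests data outside their role's permissions."""


def read_category(role, category, record):
    """Return the requested category only if the role is authorised to read it."""
    allowed = ROLE_PERMISSIONS.get(role, set())  # unknown roles get no access
    if category not in allowed:
        raise AccessDenied(f"role {role!r} may not read {category!r}")
    return record[category]


# Illustrative patient record split into data categories.
record = {
    "demographics": {"name": "Ann Lee"},
    "diagnoses": ["hypertension"],
    "lab_results": [("glucose", 5.4)],
    "billing": {"balance": 120.0},
}
```

With this design, a call such as `read_category("physician", "diagnoses", record)` succeeds, while a researcher asking for billing data, or any role missing from the table, raises `AccessDenied`.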

Data Security and Privacy

Healthcare data privacy is essential because most medical data is sensitive and personal (Khan & Hoque, 2016). However, with the rapid technological advancements in recent years, safeguarding clinical data has become a significant legal and ethical challenge that undermines the overall quality of care. There is also the challenge of finding ways of sharing data without compromising the patient’s privacy. According to research, many countries have set standards that govern the physician-patient relationship in a manner that protects data confidentiality (Puppala et al., 2016). Essentially, such measures are governed publicly, and they are meant to protect the patient’s dignity by ensuring that any information provided to the healthcare provider remains confidential (Puppala et al., 2016). If a patient’s information is leaked, the organization responsible for handling the information takes full responsibility.

Many researchers have proposed security measures to protect clinical data stored in data warehouses (Gosain & Arora, 2015). According to Gosain and Arora (2015), security issues are an inescapable part of data warehousing; however, there are various effective methodologies for securing data. Lastly, it is essential to note that data warehousing is beneficial overall, but it has its fair share of limitations. At the most basic level, data warehousing is associated with high initial costs. Also, integrating data in a central place means that the original data owners lose control over the data, which raises privacy and confidentiality issues. Recent technological advancements, such as the development of biometric technology, have made it easier to control data access (Poenaru et al., 2017). Control measures such as biometric systems come with high overhead costs; however, they are crucial in safeguarding the integrity of data warehouses.

Electronic Medical Record

A health data warehouse (HDW) extends the capacity of electronic health records and allows medical practitioners to oversee the quality of care down to the individual patient level. As already established, a health data warehouse is a system that integrates different sources of information in one central location, and it can store petabytes of health data retrieved from Electronic Medical Records (EMRs). Health organizations must realize that electronic health records are only one source of the information that matters for a particular workload. Furthermore, a data warehouse is essential for making well-informed business and clinical decisions. An EMR is, for the most part, integrated within the local area network and can be connected to the outside environment through health information exchange. By incorporating data from the EMR and other systems into an HDW, organizations create an infrastructure that allows accountable care organizations (ACOs), healthcare systems, physician groups, and others to predict and manage patient care and to improve cost and quality (Khan & Hoque, 2015). In fact, there is no viable alternative to a data warehouse for organizations that want to use analytics successfully to improve the cost and quality of care.

By consolidating electronic medical records and other systems’ information into an HDW, medical practitioners can create an infrastructure that keeps health systems accountable. Most HDWs depend on multiple source systems, which means that the warehouse is populated by at least two transactional data sources. Most data warehouses also incorporate registration, scheduling, and billing systems. In fact, in large enterprises a data warehouse may contain information from as many as 50 distinct internal and external data sources. Data warehouses also facilitate cross-organizational analysis, which ultimately supports the dissemination of information across the entire clinical process. In other words, a data warehouse can be used to connect information from several sources that contribute to the complete continuum of care for a patient, from birth to death. For example, a data warehouse might support the analysis of information from an EHR coded in one terminology together with information from a billing system coded in the International Classification of Diseases (ICD) by aggregating the key elements required for the analysis from each system, regardless of the terminology used.
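The cross-system analysis described above can be sketched as follows. The local EHR codes, the mapping table, the shared concept labels, and the patient records are all hypothetical; only E11 (type 2 diabetes mellitus) and I10 (essential hypertension) are real ICD-10 codes. The sketch assumes that each system's codes can be mapped to a shared concept before the data is combined.

```python
from collections import Counter

# Hypothetical mapping from each system's own codes to a shared concept label.
CONCEPT_MAP = {
    ("ehr", "DX-DM2"): "type_2_diabetes",    # made-up local EHR code
    ("ehr", "DX-HTN"): "hypertension",       # made-up local EHR code
    ("billing", "E11"): "type_2_diabetes",   # ICD-10 code
    ("billing", "I10"): "hypertension",      # ICD-10 code
}

# Illustrative (patient, code) pairs from each source system.
ehr_rows = [("P001", "DX-DM2"), ("P002", "DX-HTN")]
billing_rows = [("P003", "E11"), ("P001", "E11"), ("P004", "ZZZ")]  # "ZZZ" is an unmapped placeholder


def to_concepts(system, rows):
    """Translate (patient, code) pairs into (patient, concept); skip unmapped codes."""
    return {
        (patient, CONCEPT_MAP[(system, code)])
        for patient, code in rows
        if (system, code) in CONCEPT_MAP
    }


# Union of both systems, so a patient recorded in both is counted once per concept.
combined = to_concepts("ehr", ehr_rows) | to_concepts("billing", billing_rows)
patients_per_concept = Counter(concept for _, concept in combined)
```

Here patient P001 appears in both systems under different codes but is counted once for type 2 diabetes, which is the kind of terminology-independent aggregation the paragraph describes.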


Arif, M., & Mujtaba, G. (2015). Taxonomy of Different Data Warehouse Architectures & Need for Optimized Data Warehouse. International Journal of u- and e-Service, Science and Technology, 8(5), 299-308.

Elmasri, R., & Navathe, S. (2011). Fundamentals of database systems (6th ed.). Addison Wesley.

Gavrilov, G., Vlahu-Gjorgievska, E., & Trajkovik, V. (2020). Healthcare data warehouse system supporting cross-border interoperability. Health Informatics Journal, 26(2), 1321-1332.

Gosain, A., & Arora, A. (2015). Security Issues in Data Warehouse: A Systematic Review. Procedia Computer Science, 48, 149-157.

Khan, S. I., & Hoque, A. S. (2015). Development of national health data warehouse for data mining. Database Systems Journal, 6(1), 3-13.

Khan, S. I., & Hoque, A. S. (2016). Privacy and security problems of national health data warehouse: a convenient solution for developing countries. In 2016 International Conference on Networking Systems and Security (NSysS) (pp. 1-6). IEEE.

Permana, K. A. B., Subiksa, G. B., & Sudarma, M. (2017). Design data warehouse for centralized medical record. International Journal of Engineering and Emerging Technology, 2(2), 47-51.

Poenaru, C. E., Merezeanu, D., Dobrescu, R., & Posdarascu, E. (2017). Advanced solutions for medical information storing: Clinical data warehouse. In 2017 E-Health and Bioengineering Conference (EHB) (pp. 37-40). IEEE.

Puppala, M., He, T., Yu, X., Chen, S., Ogunti, R., & Wong, S. T. (2016). Data security and privacy management in healthcare applications and clinical data warehouse environment. In 2016 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI) (pp. 5-8). IEEE.

Roldan, M. (2017). Learning Pentaho Data Integration 8 CE – Third Edition. Packt Publishing.
