Developing an HRIS Data Coding Scheme

From IHRIS Wiki

Data quality, or “fitness for use,” is of paramount importance in a human resources information system (HRIS). Quality can be compromised if the data include duplicates, unnecessary variations, misspellings or omissions. If the data in the system are inaccurate, incomplete or out-of-date, the reports generated by the system will be of no value. Furthermore, if reports based on poor quality data are made public, they may generate a lack of trust in the entire HRIS. (For more information about data quality, please see “Data Quality Considerations in Human Resource Information System (HRIS) Strengthening,” (PDF) included in the HRIS Strengthening Implementation Toolkit.)

To prevent problems during data entry, the data should be categorized in a consistent, standard way. Data coding is the process of classifying data in preparation for later analysis. Before any data collection takes place, stakeholders and HRIS managers should spend time developing a coding structure to organize the data in the system. This data-coding structure should reflect the way the health system is organized in real life, such as how districts are organized within a country and how jobs are organized within a facility.

If existing workforce data are available for immediate entry, taking the time to develop a data-coding scheme may be perceived as an obstacle to forward momentum. However, an initial investment in organizing a system of data entry requires fewer resources and less time than does retrospectively correcting data-entry errors. In addition, data entered according to a standardized coding scheme are more likely to be immediately useful for creating reports and drawing comparisons at the facility and district level.

Using Drop-Down Menus to Prevent Data Entry Errors

Information management programs (like the iHRIS Suite) and spreadsheet programs (like Microsoft Excel) support the creation of drop-down menus for data entry purposes. A drop-down menu is a list of possible values for a field that appears when the user clicks on a cell in a spreadsheet or data entry screen. Once the user selects a value, it appears in the cell and the rest of the list is hidden.

Using drop-down menus has two noteworthy benefits. First, they reduce the time required to enter data from a paper personnel record or data collection form since they allow data entry clerks to select items from a list rather than type each response into a cell. Second, drop-down menus restrict the information that can be entered into each cell to only the choices provided, reducing the likelihood that errors will be introduced. The drop-down menus should only include the values, or codes, that conform to the pre-determined data-coding scheme.
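The restriction a drop-down menu imposes can be sketched in a few lines of Python. This is an illustration only, not how the iHRIS Suite or Excel implements menus; the list of job titles and the function name `select_value` are assumptions made for the example.

```python
# Hypothetical list of allowed codes for the Job Title field.
JOB_TITLES = ("Nursing Officer", "Medical Officer", "Pharmacist")


def select_value(choice, allowed=JOB_TITLES):
    """Return the choice if it is on the list; otherwise reject it.

    This mimics a drop-down menu, which cannot hold a value
    that is not one of the choices provided.
    """
    if choice not in allowed:
        raise ValueError(f"{choice!r} is not an allowed value")
    return choice


# A value from the coding scheme is accepted unchanged.
accepted = select_value("Nursing Officer")

# A free-typed variation is refused instead of entering the dataset.
try:
    select_value("Nursing Off.")
    rejected = False
except ValueError:
    rejected = True
```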

Why Is a Standard Data-Coding Scheme Necessary?

To understand why creating an HRIS data-coding scheme is such a high priority, picture what the human resources for health (HRH) database would look like if all of the information for two categories, such as Job Title and Department, were typed directly into a spreadsheet. Imagine that three people who hold the same job in a single healthcare facility (Nursing Officer in the Obstetrics and Gynecology Department) are asked to record their job titles and department names on a data collection form. It is possible that they would write down this information in three different ways, as in the example below.

Unique ID #     Employee Name     Job Title                    Department Name
101             Nurse A           Nursing Off.– OBGYN          OBGYN
102             Nurse B           Nursing Officer              Obs. & Gyn.
103             Nurse C           Registered Nurse Officer     Obstetrics and Gynecology

Now suppose that all the health workers at the facility, even those who hold the same job, enter their information using different versions of the same job titles and department names. Perhaps some of the data entries are misspelled as well, while others are entered using unclear abbreviations. Technically, the data in the spreadsheet would be accurate, since each of the health workers wrote down a job title or department name that represented his or her job at the facility. However, because there was no standard way to code the data, the information in the dataset is difficult to analyze. For example, a healthcare decision maker could not run a report to quickly find the number of Nursing Officers at the facility.

Creating a standard way to organize health workforce data allows users of the system to easily aggregate data about a specific variable. For example, after the implementation of an HRIS data-coding scheme, the information about the Nursing Officers at the facility would be entered into the spreadsheet as follows:

Unique ID #     Employee Name     Job Title           Department Name
101             Nurse A           Nursing Officer     Obstetrics and Gynecology
102             Nurse B           Nursing Officer     Obstetrics and Gynecology
103             Nurse C           Nursing Officer     Obstetrics and Gynecology

Use of a standardized data-coding scheme enables the system to automatically find how many Nursing Officers work in the Obstetrics and Gynecology Department at this facility and display that information in a report.
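The difference can be illustrated with a short Python sketch that counts staff by job title and department, using the rows from the two example tables above. The function name and the tuple layout are assumptions made for the example; a real HRIS would run an equivalent query against its own database.

```python
from collections import Counter

# Rows as (unique_id, job_title, department_name), before coding.
uncoded = [
    (101, "Nursing Off.– OBGYN", "OBGYN"),
    (102, "Nursing Officer", "Obs. & Gyn."),
    (103, "Registered Nurse Officer", "Obstetrics and Gynecology"),
]

# The same three people after the data-coding scheme is applied.
coded = [
    (101, "Nursing Officer", "Obstetrics and Gynecology"),
    (102, "Nursing Officer", "Obstetrics and Gynecology"),
    (103, "Nursing Officer", "Obstetrics and Gynecology"),
]


def count_by_job_and_department(rows):
    """Count staff for each (job title, department) combination."""
    return Counter((title, dept) for _, title, dept in rows)


target = ("Nursing Officer", "Obstetrics and Gynecology")

# Uncoded data: three different combinations, none matching the target.
uncoded_counts = count_by_job_and_department(uncoded)

# Coded data: one combination that covers all three staff members.
coded_counts = count_by_job_and_department(coded)
```

With the uncoded rows, `uncoded_counts[target]` is 0 even though all three people hold that job; with the coded rows, `coded_counts[target]` is 3.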

In addition, a standard classification system enables HRIS users to more easily compare information between facilities and districts. Classification systems that conform to international standards even allow for comparisons between different countries.

Creating a Data Dictionary

A data dictionary, also referred to as a codebook, is a written record of all of the codes used in the database and how they correspond to the data. Establish the data dictionary when the data-coding scheme is created. The data dictionary should be as clear and explicit as possible so that someone who has no knowledge of the codes can easily look up each value and find out what it means.

For each value, log in the data dictionary all of the different data points that correspond to it, including additional names, alternative spellings and abbreviations. Update the data dictionary every time a new value is added or an alternate name for a value is discovered.

For instance, the data dictionary entry for the Nursing Officer example may read as follows:

Category/Field     Value               Alternatives
Job Title          Nursing Officer     Nursing Off. – OBGYN, Registered Nurse Officer

The data dictionary may also include information about the different levels of a job. For example, is a Nurse Assistant I a more senior job than a Nurse Assistant II, or are these jobs at the same level but with different responsibilities? Any clarifications that may be important for data entry or useful during data analysis should be included in the data dictionary.
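A data dictionary kept in machine-readable form can also be used to translate alternatives back to the standard value. The following Python sketch assumes a simple nested-dictionary layout and the hypothetical helper names `build_lookup` and `to_standard`; it contains only the Nursing Officer entry shown above.

```python
# Data dictionary: for each field, each standard value maps to the
# alternatives that have been logged for it.
DATA_DICTIONARY = {
    "Job Title": {
        "Nursing Officer": ["Nursing Off. – OBGYN", "Registered Nurse Officer"],
    },
}


def build_lookup(field):
    """Map every known spelling (standard or alternative) to its standard value."""
    lookup = {}
    for value, alternatives in DATA_DICTIONARY[field].items():
        for spelling in [value, *alternatives]:
            # casefold() makes the match ignore capitalization.
            lookup[spelling.casefold()] = value
    return lookup


def to_standard(field, raw):
    """Return the standard value for a raw entry, or None if it is not logged."""
    return build_lookup(field).get(raw.strip().casefold())


known = to_standard("Job Title", "registered nurse officer")
unknown = to_standard("Job Title", "Midwife")
```

An entry that returns `None` is a signal that the data dictionary needs a new value or a new alternative, not that the entry should be discarded.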

Eliminating Common Problems in Drop-Down Menus

This section describes common problems that are found in the lists of values used for drop-down menus in data entry tools. For reasons of simplicity, each problem is illustrated using examples from the Job Title field, but these problems can be found in any field in the dataset.

Unnecessary Variations

Sometimes, healthcare facilities or health workers use different titles to describe the same job. For example, the job called “Principal Medical Officer” in one health facility may be called “Chief Medical Officer” in another facility. Developing a standard list of job titles enables decision makers to compare data from different facilities and districts. Standardized titles are also helpful in distinguishing between the levels in a job. For example, defining the general roles and functions of a Nursing Officer II will help ensure that job titles reflect the actual job being performed, without relying on additional knowledge about the department or facility where the Nursing Officer II works.

Pre-existing lists of job titles often contain both the name of the job and the name of the department where the job is located. For example, a list of titles may include values such as “Physician (HIV/AIDS)” or “Manager – Finance.” In these cases, a decision must be made about how to categorize the job. If the skills and functions of a job in one department vary significantly from those of the same job in another department, retaining the name of the department in the Job Title list may be important for later data analysis. For instance, a manager in the Finance department may have a very different role from managers in other areas of a healthcare facility. However, the underlying skills and functions are frequently the same for jobs across departments. One can imagine that a physician in the HIV/AIDS department would have a role similar to that of a physician in another department, such as seeing patients, monitoring their care and prescribing treatments. In these cases, eliminating the department information from the Job Title field decreases the redundancies in the database, since this information is tracked in the Department Name field.


Duplicates

Duplicate entries can appear in a database for a number of reasons, including additional spaces, misspellings and abbreviations. Eliminating duplicates from drop-down menus is essential for maintaining data quality. If a field contains two values that represent the same information, such as Chief Nursing Officer and CNO, choose one of the values and eliminate the other, noting the eliminated value in the data dictionary. Ensuring that there is only one job title to describe each job reduces confusion during data entry and eliminates the need to re-code jobs after data are collected.
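Duplicates caused by extra spaces or inconsistent capitalization can be found mechanically. The Python sketch below is one possible approach, with a hypothetical menu list; it does not catch misspellings or abbreviations such as CNO, which still need to be reviewed by a person and recorded in the data dictionary.

```python
from collections import defaultdict


def find_spacing_and_case_duplicates(values):
    """Group menu values that differ only in spacing or capitalization."""
    groups = defaultdict(list)
    for value in values:
        # Collapse runs of whitespace, trim the ends, and ignore case.
        key = " ".join(value.split()).casefold()
        groups[key].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical drop-down list containing hidden duplicates.
menu = [
    "Chief Nursing Officer",
    "Chief  Nursing Officer ",
    "chief nursing officer",
    "CNO",
    "Pharmacist",
]

duplicates = find_spacing_and_case_duplicates(menu)
```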


Abbreviations

Some abbreviations are well known and can be used to conserve space in the data entry form. For example, most people would probably guess that the abbreviation “Admin. Officer” stands for “Administration Officer.” However, abbreviations should only be used when their meaning is likely to be clear to someone who is seeing the abbreviation for the first time. An abbreviation like “Med. Res.” could stand for more than one logical value in the list, such as Medical Resident or Medical Researcher, which are two very different jobs. Any abbreviation that could cause confusion during data entry should not be used. In addition, abbreviations should be tracked in the data dictionary, where both the abbreviation and the complete spelling of the word should be listed.
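One rough way to screen a proposed abbreviation is to check how many values in the list it could plausibly stand for. The Python sketch below uses a simple word-by-word prefix rule, which is an assumption chosen for illustration; the function names and the short list of job titles are hypothetical.

```python
def could_stand_for(abbreviation, value):
    """True if each abbreviated word is a prefix of the matching word in value."""
    short_words = [word.rstrip(".").casefold() for word in abbreviation.split()]
    full_words = [word.casefold() for word in value.split()]
    return len(short_words) == len(full_words) and all(
        full.startswith(short) for short, full in zip(short_words, full_words)
    )


def possible_meanings(abbreviation, values):
    """List every value the abbreviation could represent."""
    return [value for value in values if could_stand_for(abbreviation, value)]


job_titles = ["Medical Resident", "Medical Researcher", "Administration Officer"]

# Two candidates: the abbreviation is ambiguous and should not be used.
med_res = possible_meanings("Med. Res.", job_titles)

# One candidate: the abbreviation is unambiguous within this list.
admin = possible_meanings("Admin. Officer", job_titles)
```

An abbreviation that matches more than one value fails the screen. Passing the screen does not prove the abbreviation is clear to a first-time reader, so a human review is still needed.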


Omissions

It is also important to include enough values in the job list to categorize all health jobs. To make sure jobs are not missed, it may be wise to pilot the tool in a few different types of healthcare facilities. Are health administrators accounted for? Are all part-time jobs listed? What about necessary jobs that are not directly related to healthcare, such as drivers, security guards and cleaning staff? The final data entry tool should include enough values to ensure that jobs are represented with sufficient detail to be useful for data analysis, but should not contain so many values that aggregating data during analysis becomes difficult.
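After a pilot, the job titles actually encountered can be compared against the draft list to reveal omissions. The Python sketch below uses invented pilot data and an invented draft menu purely for illustration.

```python
# Hypothetical draft list of values for the Job Title drop-down menu.
job_title_menu = {"Nursing Officer", "Medical Officer", "Pharmacist"}

# Hypothetical job titles recorded during a pilot at several facilities.
pilot_titles = ["Nursing Officer", "Driver", "Security Guard", "Pharmacist", "Driver"]

# Titles seen in the pilot that the draft menu cannot represent.
missing = sorted(set(pilot_titles) - job_title_menu)
```

Here `missing` contains "Driver" and "Security Guard". Each missing title should be either added to the list or mapped to an existing value in the data dictionary.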

How Do We Create a Data-Coding Scheme Without Prior HRH Information?

Most healthcare systems collect some type of HR information for workforce tracking and payroll management. While it seems unlikely that any country would need to create an HR coding scheme completely from scratch, in some cases (such as destruction of paper personnel files or loss of an HR database that was not backed up) it may be necessary to create an HRH database based on a very limited amount of initial information.

To create an HRIS data-coding scheme, begin by brainstorming the types of information that a healthcare stakeholder would need to have in order to make good decisions about the health workforce. For example, for payroll purposes, the stakeholder would need to know the names and addresses of employees. To make good staffing choices, the stakeholder would need to know about types of training, certification and licensure. To manage the workforce, the stakeholder would need to know what jobs are needed, as well as information about departments and facilities.

Once each of these categories, or fields (e.g., Employee Surname, Employee Address, Facility Name, Job Title, Department), has been identified, think about which values would belong in each category. These values will be listed in the drop-down menus of the data entry forms.

You will be able to list all of the values for a few categories. For example, the category Marital Status will only have a few values in its drop-down menu: Single, Married, Domestic Partner, Divorced, Widowed, and possibly Nun/Clergy. The category Facility Name will also have a finite number of values, consisting of the names of all of the facilities in the district or country.

For a few of the categories, such as Employee Surname or Employee Address, so many possible values exist that it does not make sense to use a drop-down menu. These fields should be left blank on the data entry form. Data entry clerks will have to type a new value into each cell, rather than select a value from a drop-down menu.

The third and largest group of categories, such as Job Title and Department Name, will require a list of values for the drop-down menu in order to maintain data quality and consistency. However, creating a complete list of these values for a drop-down menu requires a strong knowledge of the healthcare system and input from HRIS stakeholders. To generate lists of values for these fields, it may be useful to refer to the resources listed at the end of this brief. While these resources may be valuable in the beginning stages of creating a coding system, determining country-specific values will require some research. A survey of all of the jobs in chosen local health facilities should provide a clearer picture of the types of job titles, department names, etc. that need to be included in the data-coding scheme. Input from key HRIS stakeholders is essential during this stage of coding scheme development.
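The three groups of categories described above can be recorded in a simple field specification that notes, for each field, whether it uses a drop-down menu and which values the menu contains. The Python sketch below is one possible layout; the class name `FieldSpec` and the short Job Title list are assumptions, and the Marital Status values are taken from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One field on the data entry form."""

    name: str
    values: tuple = ()  # empty means free text, with no drop-down menu

    @property
    def uses_drop_down(self):
        return bool(self.values)

    def accepts(self, entry):
        """Drop-down fields accept only listed values; free-text fields accept any non-blank entry."""
        if self.uses_drop_down:
            return entry in self.values
        return bool(entry.strip())


# Group 1: a short list of values that can be written out in full.
marital_status = FieldSpec(
    "Marital Status",
    ("Single", "Married", "Domestic Partner", "Divorced", "Widowed"),
)

# Group 2: too many possible values for a menu, so the field is typed in.
surname = FieldSpec("Employee Surname")

# Group 3: a list that must be researched locally (hypothetical values here).
job_title = FieldSpec("Job Title", ("Nursing Officer", "Medical Officer"))
```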


The following resources may be useful when creating an HRIS coding scheme: