SDMX-HD Data Export -- Kenya
Background Information
iHRIS and DHIS are established systems in Kenya. The Ministry maintains a facility list, the Master Facility List (MFL), which contains approximately 7000 facilities. In addition, iHRIS is considered the canonical source of the job list, which contains approximately 300 jobs. For Kenya, it was decided that exporting data at the job level was too granular, so a set of approximately 20 "Job Groups" was created to categorize the jobs. This list was defined by the Ministry but will be maintained in iHRIS. At this time, the Ministry has not requested any gender disaggregation.
Expected Data Use Scenarios
Export: number of health workers (#hws) by job group and facility
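The export scenario amounts to counting health workers per (job group, facility) pair after mapping each of the roughly 300 jobs to one of the roughly 20 job groups. The sketch below shows that aggregation with awk. It is not the iHRIS export code: the function name count_by_jobgroup_facility, the two CSV layouts it reads, and the UNMAPPED bucket for jobs without a group are all assumptions made for illustration.

<source lang='bash'>
#!/usr/bin/env bash
# Hypothetical sketch: count health workers per (job group, facility).
# Assumed inputs (NOT the real iHRIS export format):
#   $1    : job-to-job-group mapping CSV with header "job_id,job_group"
#   stdin : one row per health worker, CSV with header "facility_id,job_id"
# Output  : "job_group,facility_id,count" rows, sorted bytewise.
count_by_jobgroup_facility() {
  local mapping="$1"
  awk -F',' '
    # First file (the mapping): remember each job'"'"'s group, skipping the header.
    NR == FNR { if (FNR > 1) group[$1] = $2; next }
    # Second input (workers on stdin): skip the header, tally per group and facility.
    FNR > 1 {
      g = ($2 in group) ? group[$2] : "UNMAPPED"
      count[g "," $1]++
    }
    END { for (k in count) print k "," count[k] }
  ' "$mapping" - | LC_ALL=C sort
}
</source>

Counting jobs that have no job group under their own UNMAPPED bucket, rather than silently dropping them, keeps gaps in the job-to-job-group list visible while the list is being maintained.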
Where's the code
There are two code branches in use:
- The general interoperability tools, which help produce the DSD and various other XML files from different input sources (spreadsheets), are at lp:his-transform-tools
- The data lists and transforms needed for Kenya are at lp:kenya-sdmx-hd
Access is controlled by the HIS Interoperability team.
Code Structure
There are four main directories in lp:kenya-sdmx-hd. The PHP script "runme.php" in lp:his-transform-tools processes them as follows:
- inputs: contains the data lists (see Linking the Data). As an intermediary step, runme.php converts the .csv or Excel spreadsheets into a single, simple XML file, lists.xml, for further XSLT processing.
- transforms: the files here generate the DSD, the XSDs, and any other XML-based files needed by the various systems.
- transforms_dsd: contains the XSL that operates directly on the DSD.
- outputs: this is where the results are written.
Note that runme.php is set up to work without any assumptions about which systems are involved. It can handle OpenMRS, iHRIS and DHIS all reporting on similar, but not necessarily identical, data.
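Judging from the file names (for example transforms/DSD/DSD.xml.xsl, and transforms/iHRIS producing results in outputs/iHRIS), each transform appears to yield a file of the same name, minus the .xsl suffix, under outputs/. The sketch below assumes that convention and prints, without running, one xsltproc command per transform applied to lists.xml. It is a hypothetical illustration, not what runme.php actually does: output_path_for and plan_transforms are made-up names, and transforms_dsd (which operates on the DSD rather than on lists.xml) is left out.

<source lang='bash'>
#!/usr/bin/env bash
# Hypothetical: map transforms/<path>/<name>.xsl to outputs/<path>/<name>.
# Expects paths relative to the repository root (starting with "transforms/").
output_path_for() {
  local rel="${1#transforms/}"      # drop the leading "transforms/"
  printf 'outputs/%s\n' "${rel%.xsl}" # drop the trailing ".xsl"
}

# Print (do not execute) one xsltproc command per .xsl file under transforms/.
# $1: the intermediary XML input (defaults to lists.xml).
plan_transforms() {
  local lists="${1:-lists.xml}"
  find transforms -type f -name '*.xsl' | LC_ALL=C sort |
    while IFS= read -r xsl; do
      printf 'xsltproc -o %s %s %s\n' "$(output_path_for "$xsl")" "$xsl" "$lists"
    done
}
</source>

Printing the commands instead of running them makes it easy to review what would be regenerated before touching the outputs directory.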
Linking the Data
Data lists are linked between the various systems by the .csv files in the inputs directory. For example, inputs/facility.csv has the columns:
- dhisid: the id used by DHIS for the facility
- dhisname: the name used by DHIS for the facility
- ihrisname: the name used by iHRIS for the facility
- ihrisid: the id used by iHRIS for the facility
- sdmxhdid: the id used in SDMX-HD. For now it is simply the DHIS id, as DHIS cannot (I think) handle aliasing of ids yet.
- comments: a place to keep track of the data linking process, for example to indicate where you are not sure whether a linkage is correct. We also note here the facilities that are in iHRIS but not in DHIS. This may be OK: for example, the MOH Headquarters would not have any service data.
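Because facilities that exist in iHRIS but not in DHIS have to be tracked by hand, a quick scripted check of inputs/facility.csv can help during linking. The sketch below (hypothetical name check_facility_links) looks columns up by their header names rather than by position, reports rows that have an iHRIS id but no DHIS id, and reports rows whose sdmxhdid differs from the dhisid, since for now the two are meant to be identical. It assumes a plain CSV whose id and name columns contain no quoted fields or embedded commas.

<source lang='bash'>
#!/usr/bin/env bash
# Hypothetical linkage check for inputs/facility.csv.
# Prints "MISSING_DHIS,<ihrisid>,<ihrisname>" for facilities known only to iHRIS,
# and "SDMX_NOT_DHIS,<dhisid>,<sdmxhdid>" where the SDMX-HD id is not the DHIS id.
# Assumes plain comma-separated fields; commas inside the comments column are
# harmless because that column is never read.
check_facility_links() {
  awk -F',' '
    NR == 1 {
      # Map header names to column positions.
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("dhisid" in col) || !("ihrisid" in col) || !("ihrisname" in col) || !("sdmxhdid" in col)) {
        print "facility.csv: missing an expected column" > "/dev/stderr"
        exit 2
      }
      next
    }
    {
      dhis = $(col["dhisid"]); ihris = $(col["ihrisid"]); sdmx = $(col["sdmxhdid"])
      if (dhis == "" && ihris != "")
        print "MISSING_DHIS," ihris "," $(col["ihrisname"])
      else if (dhis != "" && sdmx != dhis)
        print "SDMX_NOT_DHIS," dhis "," sdmx
    }
  ' "${1:-inputs/facility.csv}"
}
</source>

A MISSING_DHIS line is not necessarily an error (the MOH Headquarters case above), so the output is best read alongside the comments column.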
lists.xml
As an intermediary step, runme.php converts the .csv files into one large XML file for processing. Its structure is as defined in SDMX-HD Data Export -- Zanzibar#lists.xml.
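The sketch below shows the general idea of this step: flattening one .csv data list into XML that an XSLT stylesheet can consume. The element names (list, item, and one child element per column header) are placeholders chosen for illustration; the actual lists.xml structure is the one documented on the Zanzibar page, and runme.php also reads Excel spreadsheets, which this sketch does not. csv_to_xml is a hypothetical name.

<source lang='bash'>
#!/usr/bin/env bash
# Hypothetical: convert one plain CSV list (header row = column names) into
# <list name="..."><item><col>value</col>...</item></list>.
# Element names are illustrative, not the real lists.xml schema.
# Assumes no quoted fields or embedded commas.
csv_to_xml() {
  local name="$1" file="$2"
  awk -F',' -v name="$name" '
    # Escape the XML special characters in a field value (& first).
    function esc(s) {
      gsub(/&/, "\\&amp;", s); gsub(/</, "\\&lt;", s); gsub(/>/, "\\&gt;", s)
      return s
    }
    NR == 1 {
      for (i = 1; i <= NF; i++) hdr[i] = $i
      n = NF
      printf "<list name=\"%s\">\n", name
      next
    }
    {
      printf "  <item>"
      for (i = 1; i <= n; i++) printf "<%s>%s</%s>", hdr[i], esc($i), hdr[i]
      printf "</item>\n"
    }
    END { if (NR > 0) print "</list>" }
  ' "$file"
}
</source>

Several lists could then be wrapped in a single root element, for example `{ echo '<lists>'; csv_to_xml facility inputs/facility.csv; echo '</lists>'; } > lists.xml`.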
The DSD
This is generated from lists.xml via the file:
- transforms/DSD/DSD.xml.xsl
Schema
The DSD defines two KeyFamilies. The schemas that validate exports made as CrossSectionalDataSets are produced by:
- transforms/schemas/KF_HW_JOBGROUP_FAC.xsd.xsl
- transforms/schemas/KF_HW_JOB_FAC.xsd.xsl
iHRIS
All the transforms and setup files are maintained in transforms/iHRIS. The results are in outputs/iHRIS.
iHRIS Installation
You can follow these instructions to get the HIS Interoperability tools for Kenya:
<source lang='bash'>
# Create a working directory for the interoperability files, owned by you
cd /var/lib/iHRIS/
sudo mkdir -p interop
sudo chown "$(whoami)":"$(whoami)" interop
# Branching creates /var/lib/iHRIS/interop/kenya-sdmx-hd
cd /var/lib/iHRIS/interop
bzr branch lp:kenya-sdmx-hd
# Link the generated iHRIS files into the iHRIS library directory
sudo ln -s /var/lib/iHRIS/interop/kenya-sdmx-hd/outputs/iHRIS /var/lib/iHRIS/lib/4.0.16/kenya-interop
</source>
Note: adjust the /var/lib/iHRIS/lib/4.0.16 path in the last line to match your installation.
DHIS2
Issues to Address
- Unlike the proof-of-concept for Sierra Leone, where we provided the DSD, here the DSD was generated from data coming from both iHRIS and DHIS. This presented some challenges on our side, all of which can be worked around and improved upon. All data lists are in the inputs sub-directory.
- A DXF import file is created by transforming the DSD. It imports the "jobs" data elements (doctor, nurse, etc.) into DHIS, making sure they are all part of an iHRIS-Staff data element group. It makes sense for iHRIS to hold the authoritative list of these.
- Rationalizing the orgunits is very important, and potentially quite difficult when there are many of them. We cannot risk overwriting or corrupting our DHIS orgunit hierarchy, so the orgunits must be agreed upon first. There are a few possibilities here:
  - Ideally, the facility codelist should be maintained by a third system, or one of the systems should be deemed authoritative.
  - In future implementations, DHIS could act as the authoritative reference, i.e. start the process by exporting the DHIS orgunits and comparing them with what is in iHRIS, then fix iHRIS and/or DHIS so that they match.
  - iHRIS ignores its own orgunit hierarchy and reports only at the most granular level it has in common with DHIS.
  - iHRIS can already maintain distinct hierarchies over the same data. For example, we do this with the geographical data for the Christian Social Services Commission, which needs to organize data both by administrative grouping and by diocese. If needed, we can readily import the DHIS hierarchies into iHRIS.
- In Zanzibar we agreed to report the job data elements disaggregated by gender (Male, Female or Unknown). For the moment the SDMX DSD uses the DHIS codelist values for these. That should not strictly be necessary, but it was easier for me this way for now. It is not critical, but it should be improved, as it currently involves some manual fiddling.
- For historical reasons, there is currently a constraint on the naming of the KeyFamily used in the DSD: we name it something like KF_345, where 345 is the id of a categorycombo in DHIS. That is fairly ugly and will be improved, but it is not a showstopper. Ideally the name would be more descriptive, such as KF_HW_BY_FAC_JOB_GEN or KF_HW_BY_FAC_CADRE_GEN.
- For historical reasons, we currently import only monthly periods. This should be generalized to support other SDMX period types, such as quarterly.
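Supporting quarterly periods would require, at minimum, mapping each monthly period to the quarter that contains it. The sketch below assumes SDMX-style period strings of the form YYYY-MM for months and YYYY-Qn for quarters; the exact formats should be checked against the SDMX-HD and DHIS versions in use. month_to_quarter is a hypothetical helper written in plain POSIX shell.

<source lang='bash'>
#!/usr/bin/env bash
# Hypothetical helper: map a monthly period "YYYY-MM" to its quarter "YYYY-Qn".
# Returns non-zero (with a message on stderr) for anything that is not YYYY-MM.
month_to_quarter() {
  local year month
  case "$1" in
    [0-9][0-9][0-9][0-9]-0[1-9]|[0-9][0-9][0-9][0-9]-1[0-2]) ;;
    *) echo "not a YYYY-MM period: $1" >&2; return 1 ;;
  esac
  year=${1%-*}
  month=${1#*-}
  month=${month#0}   # strip a leading zero so 08 and 09 are not read as octal
  printf '%s-Q%d\n' "$year" $(( (month - 1) / 3 + 1 ))
}
</source>

This only maps period labels. How values are combined across the three months of a quarter is a separate decision: for a headcount such as #hws, summing three monthly figures would overcount.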
The question of authority for codelists is the most important. In the absence of an authoritative third party, the DSD (structural metadata definition) must be created through peer-to-peer collaboration between two or more systems. In our case, iHRIS provided the jobs data elements, and we provided the orgunits and the gender disaggregation codes. We want to reduce this kind of ad hoc collaboration as far as possible, both to make scenarios easier to replicate and to gain the full advantages of having a standard. But we are on the way.
Provided we do not run into major problems with incompatible facility lists (Kenya, for example, has many more facilities than Zanzibar: 7877), we will get the data exchange working, and hopefully improve the process a bit along the way.
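One way to start the orgunit comparison mentioned above (exporting the DHIS orgunits and comparing them with what is in iHRIS) is a plain text comparison of facility names. The sketch assumes two hypothetical exports with one facility name per line; normalise_names and compare_facilities are illustrative names. Normalizing case and whitespace keeps trivial differences out of the report, but genuinely different spellings will still need manual matching, recorded in inputs/facility.csv.

<source lang='bash'>
#!/usr/bin/env bash
# Hypothetical: normalise a facility-name list (one name per line):
# lower-case, trim, squeeze internal whitespace, drop blank lines, sort uniquely.
normalise_names() {
  tr '[:upper:]' '[:lower:]' < "$1" |
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e 's/[[:space:]][[:space:]]*/ /g' |
    grep -v '^$' | LC_ALL=C sort -u
}

# Report names found only in the DHIS list ($1) and only in the iHRIS list ($2).
compare_facilities() {
  local dhis ihris
  dhis=$(mktemp); ihris=$(mktemp)
  normalise_names "$1" > "$dhis"
  normalise_names "$2" > "$ihris"
  echo "## only in DHIS"
  LC_ALL=C comm -23 "$dhis" "$ihris"   # lines unique to the first file
  echo "## only in iHRIS"
  LC_ALL=C comm -13 "$dhis" "$ihris"   # lines unique to the second file
  rm -f "$dhis" "$ihris"
}
</source>

Both lists are sorted with the same collation (LC_ALL=C) that comm uses, which comm requires to produce correct results.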