[ieee 2011 frontiers of information technology (fit) - islamabad, tbd, pakistan...

Trusted Framework for Health Information Exchange

Muhammad Afzal, Maqbool Hussain, Mahmood Ahmad Semantic Systems Research Lab

School of Electrical Engineering and Computer Science {muhammad.afzal,maqbool.hussain,

mahmood.ahmad}@seecs.edu.pk

Zahid Anwar Department of Computing

School of Electrical Engineering and Computer Science [email protected]

Abstract: A Health Information Exchange (HIE) is a set of healthcare entities that link regions and nations around the world to share healthcare information about patients. An HIE provides a unified interface through which various stakeholders access patient data. For example, individuals want to access their personal data, practitioners want to import their patients' data from Personal Health Records (PHR), and researchers want to access anonymized patient data for research purposes while individual privacy is preserved. Given the requirements of these stakeholders, it is challenging to design an HIE system that ensures the security and privacy of patient information while keeping health records interoperable and the system scalable. This paper proposes an HIE framework that provides secure access to patient data and adheres to interoperability rules. An implementation of the l-diversity algorithm protects individual privacy, especially when the data is made publicly available for statistical analysis.

Keywords: HIE; Secure HIE; Interoperability; Anonymity; HL7

I. INTRODUCTION

Health information exchange (HIE) is defined as the mobilization of healthcare information electronically among disparate healthcare information systems while maintaining the meaning of the information being exchanged [1]. Healthcare is a vast domain with extreme heterogeneity across its sub-domains, which creates the need for a flexible interoperability solution. Meaningful use and secure HIE represent the ultimate goal of interoperability: healthcare systems that can connect with one another. Connectivity to various systems over the public Internet mandates security for HIE systems, which means protecting sensitive data from misuse by applying security mechanisms ranging from authentication and authorization to encrypted storage. Trusted HIE goes one step beyond secure HIE: it creates an environment in which patients access their own medical records, physicians access the records of their patients, and researchers access anonymized healthcare data. Transforming the original microdata into an anonymized format discourages linking attacks and the identification of individuals by expressing information in a more generalized form, while maintaining its utility for research and analysis. Additionally, trusted HIE requires proper access control, auditing and logging. For interoperability, the communicating parties need to adhere to common formats or standards acceptable to all of them, because information exchange requires at least two entities sharing data with each other. A number of design choices arise when developing a trusted HIE system:

• How many different entities are present in the system, e.g. PHR (Personal Health Record) [2], EMR (Electronic Medical Record) [3, 4], EHR (Electronic Health Record) [3, 4] or HIS (Health Information System) [5, 6]? How are the various tasks split up among them?

• Who handles authorization across organizational boundaries?

• What mechanism is used to handle identity across organizational boundaries, e.g. OAuth [7, 8] or OpenID [8]?

• What kind of security/privacy considerations (encryption/decryption, hashes, tokens, public keys, and secrets) apply at the various interfaces?

• What kind of mechanism is used for messaging, e.g. courier service, fax, email, HL7 [9], etc.?

• How effectively and conveniently can microdata be transformed into an anonymized format?

In our approach, PHR and HISP are the two major communicating entities, where the HISP can be an EMR, EHR or HMIS. Patient health records are stored securely in both the PHR and the HISP using public key infrastructure and attribute-level encryption with METRO [10]. The OAuth API is used for secure authentication and authorization. HL7 CDA [11] is employed for information exchange. To provide anonymity, the technique of l-diversity [23] has been implemented. The designed architecture takes care of secure publication of data, where "secure" means hiding the individual in the final published data.

II. LITERATURE SURVEY

Healthcare is a vast domain with extreme heterogeneity in its various sub-domains. Most existing health systems are built in silos, which has led to inconsistent application designs and underlying data repositories. At the same time, a multitude of terms is used for health systems, such as PHR, EMR, EHR, HIS, CIS, HMIS, HIMS, HMS, DOM etc. Brief definitions of these concepts are provided in Section III. This variety of systems and the differences in their design pose a great challenge for implementers of HIE systems. Additionally, healthcare is one of the most challenging domains in which to incorporate patient data privacy and security considerations because of the tradeoff between conflicting scenarios, e.g. (1) complete access to patient records required

2011 Frontiers of Information Technology

978-0-7695-4625-4/11 $26.00 © 2011 IEEE

DOI 10.1109/FIT.2011.64


by physicians in emergency scenarios versus (2) the medical history of prominent personalities being published to the press. Existing systems greatly lack security and interoperability features. According to [12], 31% of surveyed hospitals report data security breaches. Similarly, care teams cannot collaborate with patients and families because they cannot all view the same EHR. An estimated 98,000 deaths per year in the U.S. alone are due to preventable medical errors [13]. Data also remains missing, inaccurate, and non-standardized, and the unnecessary duplication of tests, medications, and treatments adds to rising healthcare costs [12]. Due to the lack of standardization, systems cannot integrate data from various sources, even those within the same organization. A national push to achieve interoperability among various health systems has therefore led to the introduction of HIE.

HIE is a concept rather than a physical organization. In the USA, RHIOs (Regional Health Information Organizations) are formed to enable HIE. A RHIO is a multi-stakeholder organization expected to be responsible for motivating and causing integration and information exchange among stakeholders in that region's revamped healthcare system [14]. RHIOs are key to the US NHIN (Nationwide Health Information Network). NHIN [15] is a set of standards, services and policies that enable secure health information exchange over the Internet, and it provides a foundation for the exchange of health information across diverse entities, within communities and across the country.

Based on initial recommendations from the NHIN Work Group, the Direct Project [16] was launched to specify a simple, secure, scalable, standards-based way for participants to send authenticated, encrypted health information directly to known, trusted recipients over the Internet. The Direct Project itself does not run health information exchange services; rather, it expands the NHIN standards and service descriptions available to address the key requirements for Meaningful Use, and provides an easy "on-ramp" for a wide set of providers and organizations looking to adopt them. The project claims that there will be one nationwide exchange, consisting of the organizations that have come together in a common policy framework to implement the standards and services.

The Connect Project [17, 18] is the realization of the NHIN standards. It is an open source software solution, developed by federal agencies, that enables HIE. It enables professionals to access their patients' records by promoting the adoption of interoperable health IT systems among various health providers.

Oracle Health Information Exchange [12] is an integrated suite of products for HIE. It is an extensible, open, standards-based suite built upon a reliable technology infrastructure for the secure exchange of electronic health information. These products are intended to help global healthcare entities reduce costs, enhance revenues and, most importantly, improve patient care.

To improve security and enable large-scale use with other health record databases, Oracle HIE has incorporated support for Connect and the IHE (Integrating the Healthcare Enterprise) specifications. However, these improvements, and the integration of other systems into Oracle HIE, address the overall expected requirements while overlooking post-integration complexities.

HIE is becoming popular in other nations as well. Canada Health Infoway [19] and the UK National Health Service [20] are country-level initiatives that support health information sharing directly or indirectly.

III. SOME IMPORTANT DEFINITIONS

A. PHR: manages and administers the health records of an individual patient. The records can be used by various EMR or EHR systems as required.

B. EMR: tends to be part of a local, stand-alone health information system with no paper processing.

C. EHR: a record that is capable of being shared across different healthcare settings.

D. HIS: deals with the administrative, financial and clinical aspects of a hospital (paper processing is also part of HIS). Note that CIS (Clinical Information System) is an older name for HIS, and HMIS, HIMS and HMS (Hospital Management System) are terms used as alternatives to HIS.

E. DOM (Doctor Office Management): a system for doctors at their offices. DOM can be connected with an EMR, EHR or HIS system for sharing of data.

F. Microdata: the information related to patients and their complete records. Along with other attributes, microdata also contains key attributes such as name and SSN.

G. Generalization: a process adopted during the transformation of microdata into anonymized information. Different strategies are adopted depending on the type of information (integer or categorical).

H. Suppression: after the generalization process, the resulting information is still not guaranteed to protect user privacy for some record values. In this situation the entire row is suppressed.

IV. PROPOSED SYSTEM ARCHITECTURE AND METHODOLOGY

The proposed system comprises various modules that collectively fulfill the requirements of HIE. To keep the HIE secure, two high-level requirements must be met:

• Properly manage authentication for accessing or publishing resources on the HIE.

• Provide uniform encryption for data published on the HIE so that only valid users can understand the contents.

Moreover, in parallel with securing the HIE, it is equally important to provide access to multiple stakeholders with heterogeneous systems. This requirement imposes a high level of interoperability on the HIE.

To cater to these requirements the proposed system is built on various modules. Following are high level descriptions of the system components depicted in Fig 1.

A. HLH PHR

HLH PHR (Health Life Horizon: Personal Health Record) is a PHR system that provides infrastructure for individual users to manage shareable data of common interest. It follows the requirements of the HL7 PHR specification in order to comply with international standards. Moreover, the underlying architecture is based on HL7 CDA (HL7 Clinical Document Architecture), which leads towards an interoperable solution. A CDA document persists patient data such as demographics, history, discharge summaries, lab reports etc. in a specific XML format. The document format is ultimately derived from the HL7 RIM (Reference Information Model).
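To make the idea of a CDA document concrete, the following minimal sketch builds a heavily simplified CDA-like XML skeleton with Python's standard library. It is an illustration only and not the HLH PHR implementation: the function name and the sample values are assumptions, and a conformant CDA header requires further elements (for example typeId, id, code, effectiveTime, confidentialityCode, author and custodian) that are omitted here.

```python
import xml.etree.ElementTree as ET

HL7_NS = "urn:hl7-org:v3"  # XML namespace used by HL7 V3 / CDA documents
ET.register_namespace("", HL7_NS)  # serialize it as the default namespace


def q(tag):
    """Return the namespace-qualified form of a tag name."""
    return f"{{{HL7_NS}}}{tag}"


def build_cda_skeleton(patient_id, given, family, title):
    """Build a simplified, NOT schema-valid CDA-like document (hypothetical helper)."""
    doc = ET.Element(q("ClinicalDocument"))
    ET.SubElement(doc, q("title")).text = title
    record_target = ET.SubElement(doc, q("recordTarget"))
    role = ET.SubElement(record_target, q("patientRole"))
    ET.SubElement(role, q("id"), {"extension": patient_id})
    patient = ET.SubElement(role, q("patient"))
    name = ET.SubElement(patient, q("name"))
    ET.SubElement(name, q("given")).text = given
    ET.SubElement(name, q("family")).text = family
    return ET.tostring(doc, encoding="unicode")


if __name__ == "__main__":
    print(build_cda_skeleton("12345", "Jane", "Doe", "Discharge Summary"))
```

Because every element lives in the HL7 V3 namespace, a receiving system can parse the document with any namespace-aware XML parser, which is the property the architecture relies on for interoperability.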

B. Keys Manager (KM)

The Keys Manager is connected to a key store that manages the keys used for encryption of PHR records. It also manages authentication URLs in order to keep track of whether a released token permits one or multiple transactions.
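The token bookkeeping described above can be sketched as follows. This is a hypothetical illustration, not the KM's actual design: the class name, method names and the in-memory dictionary are assumptions, and a real key store would persist this state securely.

```python
import secrets


class TokenRegistry:
    """Hypothetical sketch: track how many transactions each released token allows."""

    def __init__(self):
        self._remaining = {}  # token -> number of transactions still permitted

    def release(self, max_uses=1):
        """Release a new random token valid for max_uses transactions."""
        token = secrets.token_urlsafe(16)
        self._remaining[token] = max_uses
        return token

    def consume(self, token):
        """Record one transaction; return False if the token is unknown or used up."""
        left = self._remaining.get(token, 0)
        if left <= 0:
            return False
        self._remaining[token] = left - 1
        return True


if __name__ == "__main__":
    registry = TokenRegistry()
    one_time = registry.release()
    print(registry.consume(one_time), registry.consume(one_time))
```

A one-time token is simply the special case max_uses=1, so the same structure covers both the single and the multiple transaction policies.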

C. Authentication Manager (AM)

The Authentication Manager authenticates various categories of users with different sets of interests. For example, it may authenticate practitioners to allow them access to a particular patient record. Researchers may be authenticated to give them anonymized access to a group of patients' records for further analysis. AM is based on the OAuth [7, 8] protocol, which allows various stakeholders to be authenticated in order to publish and/or access resources on HLH PHR. AM uses KM to keep track of authentication tokens. The OAuth process is as follows:

• Register your app (HMIS/EMR/CDSS) with the service that you are developing for (HLH PHR). You will receive a consumer key and secret.

• You, the developer of the app, then initiate the OAuth process by passing the consumer key and the consumer secret.

• The service returns a request token to you.

• The user then needs to grant approval for the app to run requests.

• Once the user has granted permission, you exchange the request token for an access token.

• Now that you have received an access token, you use it to sign all HTTP requests with your credentials and access token.
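The final step, signing a request, can be sketched with the HMAC-SHA1 signature method of OAuth 1.0 using only the standard library. This is a minimal sketch under stated assumptions: the function names, the example URL and the credential values are hypothetical, and the caller is assumed to pass the base URL without a query string and to supply all oauth_* and request parameters in one dictionary.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(value):
    """Percent-encode everything except the RFC 3986 unreserved characters."""
    return quote(str(value), safe="-._~")


def sign_request(method, url, params, consumer_secret, token_secret=""):
    """Compute an OAuth 1.0 HMAC-SHA1 signature for one request (illustrative)."""
    # 1. Encode, sort and join the parameters.
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    # 2. Signature base string: METHOD & encoded URL & encoded parameters.
    base_string = "&".join([method.upper(), pct(url), pct(normalized)])
    # 3. The key combines the consumer secret and the token secret.
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


if __name__ == "__main__":
    demo_params = {
        "oauth_consumer_key": "demo-consumer-key",  # hypothetical values
        "oauth_token": "demo-access-token",
        "oauth_signature_method": "HMAC-SHA1",
        "oauth_timestamp": "1300000000",
        "oauth_nonce": "abc123",
    }
    print(sign_request("GET", "https://phr.example.org/records", demo_params,
                       "demo-consumer-secret", "demo-token-secret"))
```

Because the token secret is part of the signing key, a request signed with a request token cannot be replayed as if it carried an access token.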

D. Transaction Manager (TM)

The Transaction Manager is responsible for releasing and accepting resource contents to be published on HLH PHR for authenticated and authorized users. It works in collaboration with the AM to ensure authorized access to HLH PHR and manages transactions according to the released token. TM also ensures secrecy of HLH PHR contents by using KM to encrypt and decrypt the required data. To ensure interoperability and ease integration with heterogeneous healthcare systems, TM is based on HL7 V3 messaging with HL7 CDA as its content. HL7 V3 is the later version of the HL7 standard, and its messages are derived from the HL7 RIM.

E. PHR Repository (PHR-R)

The PHR repository stores the contents published on HLH PHR. PHR-R is influenced by the HL7 RIM (HL7 Reference Information Model) and follows the structure of HL7 CDA to minimize integration effort. The repository is encrypted in order to keep the data secure. Encryption techniques will be applied at the attribute, section and document level as required by the business.

Figure 1: Proposed System Architecture

F. External Interfaces

External interfaces are part of the architecture. These interfaces are exposed to the external world to provide patient data in a secure manner based on proper authentication and authorization. External systems such as HMIS, EMR or CDSS are allowed to access the patient data published through HLH PHR only after passing the OAuth authentication and authorization process.

V. PUBLISHING MICRODATA

Security and privacy of microdata have attracted considerable attention in recent years with the growing use of publicly available data for statistical purposes by researchers and analysts. Analysis of such data can yield useful results that depend on its nature. If the data relates to commerce, the analysis can reveal its impact on the overall economy in future years; if it relates to health, better health services can be formulated. Collecting health data from different sources is not trivial, especially when it comes to publishing it. Making the data publicly available in its original structure (microdata) might facilitate research to its full capacity; however, the privacy of individuals is then totally exposed. Removing key attributes (name, SSN) gives an impression of anonymity at first glance; however, this technique neither qualifies as anonymization nor protects against linking attacks. The probability of a linking attack increases when other published data sources (voter lists, geographical surveys etc.) share a few attributes with the released data. Common attributes such as date of birth, gender, zip code or marital status are found in these types of information systems. Identification of individuals through linking attacks is explained in [29]: given two sources of information, one anonymized and the other not, 69% of individuals are still not safe against a linking attack that uses just the combination of date of birth and zip code.

Numerous techniques have been proposed to discourage linking attacks and preserve user privacy. In k-anonymity, each record is indistinguishable from at least k-1 other records with respect to their quasi-identifiers [21, 22]. k-anonymity was found in [23] to be insecure against an adversary with background knowledge. As a successor to k-anonymity, the improved technique of l-diversity requires that each group of indistinguishable rows contain diverse sensitive values [23]. l-diversity is preferred over k-anonymity, where leakage of information is possible due to the grouping of information and a lack of diversity in the sensitive attributes. Besides these methodologies, (α, k)-anonymity [25], (c, k)-safety [27] and t-closeness [24] are other proposed data privacy models. In all of these models privacy is not the only goal; the utility of the data is equally required for analysis. However, privacy and utility lie at opposite ends: if high privacy is demanded, the utility of the data is limited, and vice versa. Similarly, the possibility of inference and linking attacks is high when the degree of anonymity is low, and vice versa, as shown in Fig. 2.

Figure 2: Degree of Anonymity
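The difference between the two models can be checked mechanically. The sketch below tests a table for k-anonymity and for the simplest (distinct) form of l-diversity, in which every group of rows sharing the same quasi-identifier values must contain at least l different sensitive values. The function names and the sample column names are assumptions made for illustration.

```python
from collections import defaultdict


def group_by_quasi(rows, quasi):
    """Group rows (dicts) by the tuple of their quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[col] for col in quasi)].append(row)
    return groups


def is_k_anonymous(rows, quasi, k):
    """True if every quasi-identifier group contains at least k rows."""
    return all(len(g) >= k for g in group_by_quasi(rows, quasi).values())


def is_l_diverse(rows, quasi, sensitive, l):
    """True if every group holds at least l distinct sensitive values."""
    return all(
        len({row[sensitive] for row in g}) >= l
        for g in group_by_quasi(rows, quasi).values()
    )


if __name__ == "__main__":
    table = [
        {"age": "20-29", "zip": "440**", "disease": "flu"},
        {"age": "20-29", "zip": "440**", "disease": "flu"},
        {"age": "30-39", "zip": "441**", "disease": "asthma"},
        {"age": "30-39", "zip": "441**", "disease": "diabetes"},
    ]
    print(is_k_anonymous(table, ["age", "zip"], 2))
    print(is_l_diverse(table, ["age", "zip"], "disease", 2))
```

In the sample table the first group is 2-anonymous but every member has the same disease, so an adversary who can place an individual in that group learns the sensitive value anyway; this is the kind of leakage that l-diversity is designed to rule out.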

To address data privacy in the published data set, we used the approach adopted in l-diversity [23]. We performed our experiment with a collection of three hundred records as microdata. Such data can be prepared with tools like benerator [28], available under the GPL, which guarantees valid data generation. After observing a real sample of health data and its attributes, we created a tool that works the same way and can generate valid microdata for given attributes. For experimental purposes 300 entries were generated; however, data generation (the number of distinct rows) is not limited to this count.

Step 1: In the initial step we seeded the tool with an appropriate set of desired attributes (A), with (N) entries in each attribute. The number of required rows (R) is specified, and the tool generates R rows such that $R_i = \{A_1, A_2, A_3, \ldots, A_n\}$, where $1 \le i \le n$. The table data is classified into three categories:

• Key columns (Name, SSN)

• Non-sensitive information (quasi-identifiers)

• Sensitive information

At this stage, the initial data (microdata) has been generated.

Example: We created a range of columns (attributes): name, age, gender, SSN, disease, ethnicity and zip code. Each attribute is filled with a valid range of data; for example, the age attribute is given the values 22, 27, 29, 30, 34, 37, 41, 46, 51 and 61. The generated age values are randomly distributed between the minimum and maximum age values ($\min \le \text{age} \le \max$).

Similarly, all columns were filled with appropriate data values. After the initial dataset was filled, the tool generated three hundred distinct entries from the given values of each attribute. Adding more columns further enhances the random distribution of column values in each row; however, for test purposes we opted for a limited but sufficient range of attributes. The columns are grouped into three categories:

• Key columns {name, SSN}

• Quasi-identifiers {age, gender, ethnicity, zip code}

• Sensitive information {disease}
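A generator in the spirit of Step 1 can be sketched as follows. This is not the authors' tool: the function name, the rejection-sampling strategy and the sample attribute domains are assumptions, and the sketch draws only from the listed values of each attribute rather than from a continuous range.

```python
import random


def generate_microdata(attributes, n_rows, seed=0):
    """Draw n_rows distinct rows by sampling one value per attribute (sketch).

    attributes maps a column name to the list of values it may take.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    names = list(attributes)
    capacity = 1
    for name in names:
        capacity *= len(set(attributes[name]))
    if n_rows > capacity:
        raise ValueError("not enough distinct value combinations")
    seen, rows = set(), []
    while len(rows) < n_rows:
        combo = tuple(rng.choice(attributes[name]) for name in names)
        if combo not in seen:  # keep only rows not generated before
            seen.add(combo)
            rows.append(dict(zip(names, combo)))
    return rows


if __name__ == "__main__":
    domains = {  # hypothetical attribute domains
        "age": [22, 27, 29, 30, 34, 37, 41, 46, 51, 61],
        "gender": ["M", "F"],
        "zip": ["44000", "44010", "46000"],
        "disease": ["flu", "asthma", "diabetes", "hepatitis"],
    }
    for row in generate_microdata(domains, 5):
        print(row)
```

Adding a column multiplies the number of possible combinations, which matches the observation above that more columns spread the generated rows more widely.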

In the next step, the key columns are expunged, the non-sensitive information is generalized, and the sensitive information is left as it is.

Step 2: The microdata created in Step 1 passes through several iterations of suppression and grouping steps to achieve the required level of anonymity using l-diversity. A 4-anonymity table with distinct values for the sensitive information is created. Within each group (quasi block), the values of the quasi-identifiers are identical while the values of the sensitive information are distinct, in order to withstand linking attacks. After anonymization of the microdata D, and after suppressing the distinct rows, each group satisfies the following axiom:

$$\forall i \; (1 \le i \le n/4) : G_i \subseteq D$$

For any two groups $G_1$ and $G_2$, the values of their quasi-identifiers are distinct: $G_1 \neq G_2$
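The generalization part of Step 2 can be illustrated with two common strategies: replacing a numeric value by a fixed-width range and masking the trailing digits of a categorical code. The function names, the default width of ten years, the three-digit zip prefix and the column names are assumptions chosen for illustration; the paper does not state which generalization levels the tool uses.

```python
def generalize_age(age, width=10):
    """Replace an exact age by the fixed-width range that contains it."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code, keep=3):
    """Keep the first `keep` characters of a zip code and mask the rest."""
    text = str(zip_code)
    return text[:keep] + "*" * (len(text) - keep)


def anonymize_row(row, key_columns=("name", "ssn")):
    """Drop key columns, generalize quasi-identifiers, keep sensitive values."""
    out = {col: val for col, val in row.items() if col not in key_columns}
    out["age"] = generalize_age(row["age"])
    out["zip"] = generalize_zip(row["zip"])
    return out


if __name__ == "__main__":
    record = {"name": "Jane Doe", "ssn": "000-00-0000", "age": 27,
              "gender": "F", "zip": "44000", "disease": "flu"}
    print(anonymize_row(record))
```

Rows whose generalized quasi-identifiers coincide fall into the same group, so widening the age range or shortening the zip prefix produces larger groups at the cost of less precise data.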

Example: For the purposes of explanation, we picked a few blocks of information (quasi blocks). Besides the rows of the quasi blocks, there are also a few entries that stand distinct and do not fall in any block. In Fig. 3, the first twelve rows have been grouped as a 4-anonymity table; however, rows 13 and 14 stand distinct and do not fall in any block. There are two options for handling this situation.

Figure 3: Anonymized Data (Before Suppression)

Option I: We can further generalize the values of the quasi-identifiers. This hides these tuples in the rest of the data; however, it reduces overall utility because it affects the majority of rows.

Option II: Suppression expunges from the anonymized data the rows that stand distinct. This technique is fast and affects a limited number of rows. The effect of suppression can be seen in Fig. 4.

Figure 4: Anonymized Data (After Suppression)
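Option II can be sketched as a filter that drops every row whose quasi-identifier group has fewer than k members. The function name and column names are assumptions; the sample data only mirrors the shape of the example above (three blocks of four rows plus two distinct rows) and is not the data shown in the figures.

```python
from collections import Counter


def suppress(rows, quasi, k=4):
    """Remove rows whose quasi-identifier group has fewer than k members.

    Returns the kept rows and the number of suppressed rows.
    """
    def key(row):
        return tuple(row[col] for col in quasi)

    counts = Counter(key(row) for row in rows)
    kept = [row for row in rows if counts[key(row)] >= k]
    return kept, len(rows) - len(kept)


if __name__ == "__main__":
    diseases = ["flu", "asthma", "diabetes", "hepatitis"]
    table = []
    for age, zip_code in [("20-29", "440**"), ("30-39", "441**"), ("40-49", "460**")]:
        for disease in diseases:  # three blocks of four rows each
            table.append({"age": age, "zip": zip_code, "disease": disease})
    # Two rows that do not fall in any block (like rows 13 and 14).
    table.append({"age": "50-59", "zip": "470**", "disease": "flu"})
    table.append({"age": "60-69", "zip": "480**", "disease": "asthma"})
    kept, removed = suppress(table, ["age", "zip"], k=4)
    print(len(kept), removed)
```

Only the two outlying rows are lost, while the remaining twelve keep their level of generalization, which is why suppression affects fewer rows than generalizing the whole table further.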

Besides generating valid data, the tool also supports generalization at different levels for each attribute (numeric and categorical).

VI. CONCLUSION AND FUTURE WORK

Research in interoperability and security of HIE systems is extremely important, as these systems provide the capability to electronically move clinical information among disparate healthcare information systems while maintaining the meaning of the information being exchanged. The goal of HIE is to facilitate access to and retrieval of clinical data to provide safer, more timely, efficient, effective, equitable, patient-centered care. HIE is also useful to public health authorities to assist in analyses of the health of the population.

We discussed different health systems and the different terms used for them, and described a number of initiatives to achieve HIE at the national level. We presented our approach, in which HL7 CDA is used for interoperability and OAuth for authentication and authorization. We also implemented a well-known anonymization technique, l-diversity, in our solution.

In the future we plan to use attribute and element level encryption using HL7 Web Service Security Specification and OASIS specification. Similarly, different enhanced techniques can be devised for the researchers to retrieve health data in an anonymized way.


REFERENCES

[1] State Health Information Exchange Cooperative Agreement Program. http://healthit.hhs.gov/portal/server.pt/community/healthit_hhs_gov__home/1204

[2] Personal Health Record, http://encyclopedia.thefreedictionary.com/Personal+Health+RecordI, June 15, 2011.

[3] Dave Garets and Mike Davis, “Electronic Medical Records vs. Electronic Health Records: Yes, There Is a Difference,” A HIMSS Analytics White Paper, January 26, 2006.

[4] Dave Garets and Mike Davis, “EMRs and EHRs: Concepts as different as apples and oranges at least deserve separate names,” Healthcare Informatics, October, 2005.

[5] Hospital Information System, http://en.wikipedia.org/wiki/Hospital_information_system, June 15, 2011.

[6] Hospital Information System, http://www.biohealthmatics.com/technologies/intsys.aspx, June 15, 2011.

[7] Eran Hammer-Lahav, “OAuth, ” http://oauth.net/about/, September 5, 2007.

[8] OpenID and OAuth Hybrid Extension, http://wiki.openid.net/w/page/12995194/OpenID-and-OAuth-Hybrid-Extension, June 15, 2011.

[9] Health Level Seven, http://www.hl7.org, June 15, 2011.

[10] Metro, http://metro.java.net/, June 15, 2011.

[11] HL7 CDA, http://www.hl7.org/implement/standards/cda.cfm, June 15, 2011.

[12] Oracle, “Interoperability: A Key to Meaningful Use,” November 2010.

[13] “U.S. group sees little progress on medical errors,” http://www.reuters.com/article/2009/05/19/us-medical-errors-idUSTRE54I6U320090519, June 15, 2011.

[14] Regional Health Information Organization, http://en.wikipedia.org/wiki/Regional_Health_Information_Organization, June 15, 2011.

[15] National Health Information Network: Overview, http://healthit.hhs.gov/portal/server.pt?open=512&mode=2&cached=true&objID=1142, June 15, 2011.

[16] The Direct Project, http://wiki.directproject.org/, June 15, 2011.

[17] The Connect Project, http://healthit.hhs.gov/portal/server.pt?open=512&mode=2&objID=3340, June 15, 2011.

[18] Connect Community Portal, http://www.connectopensource.org/, June 15, 2011.

[19] Canada Health Infoway, https://www.infoway-inforoute.ca/lang-en/, June 15, 2011.

[20] UK National Health Service, http://www.nhs.uk/Pages/HomePage.aspx, June 15, 2011.

[21] P. Samarati, L. Sweeney, Generalizing data to provide anonymity when disclosing information, Proceedings of ACM Symposium on Principles of Database Systems (PODS), 1998, p. 188.

[22] L. Sweeney, K-anonymity: a model for protecting privacy, International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 2002, pp. 557–570.

[23] A. Machanavajjhala, J. Gehrke, D. Kifer, M. Venkitasubramaniam, l-Diversity: privacy beyond k-anonymity, Proceedings of the International Conference on Data Engineering (ICDE), 2006, p. 24.

[24] N. Li, T. Li, t-Closeness: privacy beyond k-anonymity and l-diversity, Proceedings of the International Conference on Data Engineering (ICDE), 2007, pp. 106–115.

[25] R.C.-W. Wong, J. Li, A.W.-C. Fu, K. Wang, (α, k)-anonymity: an enhanced k-anonymity model for privacy-preserving data publishing, Proceedings of ACM International Conference on Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD), 2006, pp. 754–759.

[27] D.J. Martin, D. Kifer, A. Machanavajjhala, J. Gehrke, J.Y. Halpern, Worst-case background knowledge for privacy-preserving data publishing, Proceedings of the International Conference on Data Engineering (ICDE), 2007, pp. 126–135.

[28] Performance test data generation tool, http://databen.org/databen-benerator/installation.html

[29] Pierangela Samarati, Latanya Sweeney, Protecting Privacy when disclosing information: K-Anonymity and its enforcement through generalization and suppression, IEEE Symposium on Research in Security and Privacy, 1998.
