In 2015 and 2016 I analyzed the health care data for a system containing 35 billion rows of immediately useful tables and data. These data are linked to 11 million patients and tens of millions of visits.
The following are the reclassification or recoding algorithms that I developed to mine the data pulled from the databases. They are being used to monitor most of the health care processes more fully and effectively, using spatial analysis techniques, SQL, SAS, and a geographic information system.
[The year in which I essentially completed each task, within the 7/15-12/16 period, is given in brackets. Items with an asterisk underwent some level of data manipulation, format or content cleaning, reclassification, and/or algorithm development.]
- Demographics, patient population [7m patients; name; DOB; personal ID; insurance program ID, insurance name, type, company etc.; race, ethnicity (unfortunately often race and ethnicity are entered as one column)] 
- Demographics, regional [census based, census block or block-group preferred; last census, most recent year for estimated counts if available] [2015 – ]
- *Standard Age-Gender [population pyramid level data assemblage for 1, 2, 5 and 10 year increments, for producing population pyramids of demographic features.] 
- *Standard Race focused interpretation of patient data, including Age-Gender pyramids produced by correlation with Age-Gender dataset. 
- *Standard Ethnicity-focused interpretation of patient data, including Age-Gender pyramids. [Note: Race and Ethnicity often don’t correlate well and produce aberrant groups when used together.]
- *Religion [150 to 10-12] identification and reclassification routines (I have my classification system posted elsewhere in several places; it focuses on major religious groups, followed by philosophically defined subgroup types of the minor groups). The following method is most often used: Catholic, Christian (Trinity, Methodist, Episcopal, Baptist), Christian Derived (Seventh Day, Mormon, some Universalists), Judaic (Torah), Islamic (Koran, Unani, Rastafarian), Cultural (mostly the several Oriental and Indian), Natural Philosophy (Universalists, Quaker, Shaker, or similar), Modern (Practical, Agnostic, Atheistic), Post-Modern (Christian Science, Unity, Transcendentalists, Pagans), Other (Interdenominational), No Data or Unknown. The first five are classical groups; the remainder may be grouped as Other and Unknown. [2015 – 2016]
- *Insurer – focuses on Commercial (COM; all the common competitors), Medicare (MCR), Medicaid (MCD), combined MCR-MCD, CHP (optional), Metropolitan (MET; large areas may provide a cross-population plan where place of residence alone defines eligibility), Government (GOV; Fed., State, County, Town insurance plan), Military (MIL), Veteran’s (optional or included with Military), Homeless (HOM), Self-insured (SELF), Union or Contracted Worker Coverage (UNI), Driver’s License/accident-related Coverage (DL), Inherited or Annuity-linked coverage (INH), NIOSH-WTC/OSHA/Worker’s Comp coverage (WC), Special Programs (SPE; coal miners, RR or shipping industry, etc.).
- *Cost (esp. Virtual and True) – Virtual cost is the value of a procedure, event, visit, lab, etc. done on a patient. Cost lists or “theoretical prices” can be exceptionally lengthy, although not as long as Procedures in general. They may also be constructed as group costs and later as more detailed costs, by applying averages of true costs across the system, or of true costs for subgroups of procedures and other priced items. True cost is what is charged to payers (patient or insurer). Virtual cost is an artificial value or estimate assigned to a procedure, as a guess, guesstimate, or otherwise calculated value that applies to real-life care situations. Institutional cost is a cost assigned to a process, step or action using a systems approach to the analysis; it defines a cost based upon true costs and the stresses placed upon the infrastructure of a healthcare system. For example, patients are normally not charged for specific events during a visit, like the educational material provided, or the written documentation taken by a nurse using a special reporting sheet, yet each of these counts as part of a typical office visit process; in theory the price of the office visit may be divided by the number of these non-charged events that occur at a typical visit. There are more than a million of these “procedures” coded in a detailed system. The standard is to not charge patients for these individual procedures because they are considered part of the service. [2016 – ]
- *Cost Burden – temporal reviews of people demonstrate changing forms of coverage over time. At some periods in life, an individual may have just one form of health care coverage, such as COM provided by the employer, followed by MIL and then SELF and then HOM, then COM, and then GOV, MET, UNI, and finally MCR-MCD. Independent, non-duplicated scores or indices are attached to each type of coverage, which can be logarithmically related to cost/risk (using base 10 or less) when compared with a cost amount developed as another Price Sheet (virtual or true). [2016 – ]
- *Visits [inf.] – A visit is when the patient shows up for some reason. A visit may have a direct disease or health state linked to it, such as an assessment for possible pregnancy, where the physician decides to also engage in annual visit events, such as checking vitals and giving seasonal shots, and then reassesses whether any immunizations are due, renewals of unrelated drugs are required, a referral to another specialist is required, or annual exams are to be rescheduled. Each of these procedures is linked to that one event, and will usually relate to conditions and diagnoses (ICDs) linked to that visit. Thus a visit for one condition may have several ICDs attached to it, and numerous procedure identifiers. Several recoding and reclassification processes are currently being tested. *An integrative (multicolumn) NLP method is being tested as part of this process.
- Visit Types  – Visits may also be differentiated by type of visit, such as routine visit, emergent/urgent care visit, outpatient walk-in visit, inpatient hospitalization visit, referral visit, office (admin) visit, pharmacy visit, etc. 
- *Initial Activities/notes [0.5M] – activities are events that happen at each visit, such as check in, personal med hx sheet, initial interview for why the patient is visiting, the patient’s two mental health opening questions, main medical concern, history, etc. These activities are often entered as discrete activity events that occur with a patient care visit. For inpatients, there may be standard floor nurse pulse taking events, for example, followed by a visit from a special services provider, who also registers the vital signs separately. Activities are events that occur at discrete moments of time in patient health, and may be used to evaluate care process and sequences, relative to time passage. Several recoding and reclassification processes are currently being tested. *NLP methods will probably become a core part of this process, and are being tested. [2016 – ]
- *Provider activities/notes and actions [100k] – in terms of content and nature, these are similar to the initial intake activities just described; however, documenting the care-related decisions made and the actions taken are the two primary purposes of this documentation activity. Another way to visualize these actions is to view the initial intake activities as documentation provided before seeing the NP or PA. These “notes and actions” are produced by the provider and are activities that usually occur in the examination room. Naturally, clinicians’ decisions are based upon a different set of findings and events than those related to intake processes. There are, however, overlaps, such as a physician providing the patient with an educational packet due to a question raised. *The classes used to evaluate these data are still in the early phase of development. Ultimately NLP analyses will need to be developed as well. [2016 – ]
- *Observations – a physician documents various observations, findings and conclusions as part of the healthcare process. Observations are documented for such things as vital signs, responses to orally administered survey questions (mental health evaluation), and findings on a new form the patient is asked to fill out (genetic screening questions, domestic violence, chronic disease questionnaire). These data may be evaluated for reasons related to completion, form quality and follow up to care. *NLP methods will also become a core part of this process. [2015 – ]
- *Procedures [95k] – definition varies across systems, but these are the processes a patient goes through for specific reasons, like diagnoses, labs, xrays, assessment, scanning, recording, etc. The renewal of a prescription may be covered as a procedure, although Rx data is typically reported as its own unique dataset and series of events. These may be differentiated by visit types. *Most of this has been successfully reclassified and recoded for multiple applications. [2015 – ]
- *Outcomes/Results [inf.] – procedure related outcomes. Whereas the Procedures dataset bears the name of the process, this dataset is where the results are kept. For example, a 20 measure blood test has Blood test under Procedures, using one or more identifiers, with each identifier describing a slightly different type; this entry of the blood test (where, when, etc.) has its outcomes entered separately in the Outcomes/Results dataset, so a 20 metric procedure has 20 rows of data. As another example, an X-ray can have multiple opinions documented (MD, assistant, technician, NP, residents, Manager), with one opinion per row, and, in well planned systems, the final accepted diagnosis identified and so marked. These have multiple atomic, descriptive and observational/outcome datasets (1 row per patient per event). *Due to the nature of the notes provided, NLP is required. [2015 – ]
- *Visit Activity Groups – Various activities can be clustered into particular forms of knowledge acquisition or physical and mental needs that apply to each and every activity. Some knowledge acquisition may be most relevant to long term quality of life issues, or behaviors, or allied health histories. An example of an activity group is nutrition, in which all the basics of nutrition are made reviewable using this additional identifier. This dataset also applies to special services provided by individuals whose services focus on this topic, such as nutritional counseling. There may be separate specialty groups linked to this identifier, such as unique clinical service groups of data, referred to by this column, whose visits provide even more highly detailed non-structured data about a patient’s medical state and history. *This grouper provides additional services for queries because it simplifies the filtering and mining processes. [2016 – ]
- *Ratios – Three ratios are defined for this group as essential metrics: Visits:Patients Ratio (VPR), Procedures:Patients Ratio (PPR), Procedures:Visits Ratio (PVR).
- *Risk Indices or Scores. Population health frequencies, incidence, prevalence. Demographic counts for each. Elixhauser, Charlson, and Federal Chronic Disease Indices. [2015-2016]
- *ICD Groups. ICD groups should be managed as ICD9 and ICD10. If the latter is the only method used, some grouping is required for reliable evaluation processes to be developed. For ICD9, n is about 14,500; for ICD10, n is about 65,000. Historically, using ICD9, I was able to develop the following sets of groupings. These can be correlated to ICD10. [2015 – 2016]
- n=about 750
- n=1000 (integer ICD)
- n=1275 (ICD, E and V codes)
- *Special ICD groups. Specific ICD code groups were defined for the projects. These groups include ICDs or ICD ranges/content, and group labels for the following common special studies topics. These groups can be roughly defined using the population pyramid modeling processes developed. As shown in another presentation, the tendency of diseases to demonstrate distinct pyramid forms in relation to age and gender may be used to better test and cluster these disease patterns. Diabetes, for example, shows a progressive increase in counts in the population curve for both genders; schizophrenia demonstrates an asymmetry for Male versus Female, with an earlier adult age of onset; atrial fibrillation demonstrates onset beginning at about 45 years of age and increases in number and frequency as the population ages, whereas diabetes cases decline due to directly and indirectly linked causes of mortality. For each of these Special groups, the ICDs, V codes and some E codes may be included in the algorithm developed. [2016 – ]
- Infectious Diseases (lower ICD)
- STDs and HIV
- HIV, related complications and comorbidities
- Cancers, by metastases patterns and organ system
- Old Age ICDs
- Prenatal-Postpartum ICDs
- Foreign Born in-migrating diseases (incl. vectored, zoonotic)
- Country or Region specific in-migrating behavioral, physical and sociocultural ICDs (culturally-bound vs. culturally-linked)
- Genomic diseases, all, or by system
- Neurological Genomic or Developmental
- The Mental Health Early Onset case ICDs
- Early Onset High Fatality Diseases and ICDs
- Broad Range Genetic, Genomic and Congenital/Developmental Diseases
- Chronic or Long term QOL Diseases
- Old Age Onset diseases
- *Spatial Modeling Lat-Long Datasets, point data. i) Patients: developed for Zip Code (3- and 5-digit), Census Block or Block group (tracts are not used), and based upon need, per patient address (top security). Used to develop spatial modeling analyses and presentations. ii) Facilities; iii) regions, towns, boroughs, etc.; iv) allied health or other healthcare/managed care associates locations; v) complementary alternative health care provider data (if possible)
- *Spatial Modeling Areal Analysis datasets, areal definitions and data points. Regional boundaries and areal centroids (borough, county, township, town, village). Functional, healthcare service related areas (Thiessen polygons); equal area polygons.
- *Spatial Modeling, Transformative Areal Models and Grids (political and economic grid mapping). Traditional square grids and hexagonal grids(?). Several modeling protocols for each.
- *Spatial Modeling Auxiliary data. Additional data gathered and developed for transforming the spatial model into a useful modeling or predictive tool, spatial analysis tool, or interventions-driven programming tool.
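The three ratios named in the Ratios item above (VPR, PPR, PVR) reduce to simple counts of patients, visits and procedures. The following is only a minimal Python sketch of that arithmetic, not the actual routines used on the system; the function name `visit_ratios` and the two input layouts (patient/visit pairs and visit/procedure pairs) are assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple


def visit_ratios(
    visits: Iterable[Tuple[str, str]],
    procedures: Iterable[Tuple[str, str]],
) -> Dict[str, float]:
    """Compute VPR, PPR and PVR from two hypothetical extracts.

    visits:     (patient_id, visit_id) pairs, one per visit
    procedures: (visit_id, procedure_code) pairs, one per procedure performed
    """
    visit_rows = set(visits)                      # drop duplicated visit rows
    n_patients = len({p for p, _ in visit_rows})  # distinct patients
    n_visits = len({v for _, v in visit_rows})    # distinct visits
    n_procedures = len(list(procedures))          # every row counts as one procedure

    if n_patients == 0 or n_visits == 0:
        raise ValueError("at least one patient and one visit are required")

    return {
        "VPR": n_visits / n_patients,      # Visits:Patients Ratio
        "PPR": n_procedures / n_patients,  # Procedures:Patients Ratio
        "PVR": n_procedures / n_visits,    # Procedures:Visits Ratio
    }


# Made-up example: two patients, three visits, six procedures.
example_visits = [("p1", "v1"), ("p1", "v2"), ("p2", "v3")]
example_procedures = [
    ("v1", "A"), ("v1", "B"), ("v2", "C"),
    ("v3", "D"), ("v3", "E"), ("v3", "F"),
]
print(visit_ratios(example_visits, example_procedures))
# {'VPR': 1.5, 'PPR': 3.0, 'PVR': 2.0}
```

In practice the same counts could be taken per insurer class, visit type or ICD group by filtering the two inputs before calling the function.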
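The Standard Age-Gender item above assembles pyramid-level counts in 1, 2, 5 and 10 year increments. A minimal Python sketch of that binning step follows; it assumes age is already available in whole years and that gender is a simple label, and the name `pyramid_counts` is hypothetical rather than taken from the actual routines.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def pyramid_counts(
    patients: Iterable[Tuple[int, str]], width: int
) -> Dict[Tuple[int, str], int]:
    """Count patients per (age-bin start, gender) cell.

    patients: (age_in_years, gender) pairs
    width:    bin width in years, e.g. 1, 2, 5 or 10
    """
    if width <= 0:
        raise ValueError("width must be a positive number of years")
    counts: Counter = Counter()
    for age, gender in patients:
        bin_start = (age // width) * width  # e.g. age 67, width 5 -> bin 65
        counts[(bin_start, gender)] += 1
    return dict(counts)


# Made-up example: the same six patients at 5- and 10-year increments.
sample = [(0, "F"), (4, "F"), (5, "M"), (9, "M"), (10, "F"), (67, "M")]
print(pyramid_counts(sample, 5))
print(pyramid_counts(sample, 10))
```

Each returned cell corresponds to one bar of a population pyramid; running the same function over the patients in a single ICD group gives the disease-specific pyramid shapes mentioned under Special ICD groups.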
I developed algorithms for a number of these datasets in this new system a year or more ago, about a third of them perhaps.
Another third have been perfected a little bit more this past year, each tested several dozen times and integrated with all of my normal data pulls for internal review of validity and reliability. To some of these I even added new columns, after producing a more helpful reclass tool. The reason for these reclassification systems was to take full advantage of the data's research potential and its applicability to grant writing, prototype development, and the investigation of newly discovered relationships.
The final third of database development projects are ongoing, and have been active and producing over the past year or two, but are obviously in need of more development so they may be integrated with the other tools I developed.
The details of the projects developed for the upcoming months are briefly defined on another of my blog sites. These versions of two or three of the previously defined projects are a little more defined, and more solid in their methodology and constructs, than similar ones posted one year ago.
There are also a few new topics added due to my review of the 1800 grants currently available, posted about two weeks ago. I determined what the hot topics are for this upcoming year, and from that chose the grants that were not due in the near future and which emphasized research approaches and topics related to the demographics I am dealing with.
Over the past year, my work has focused on producing as many new algorithms as possible, along with reclassification and analytic routines, and on testing the algorithms, the math and their outcomes. Now that these formulas have been pretty much validated (with the exception of the cost-burden analysis method, which is still in its final stages of multiple use QA review), they will be tested in several grant-funded projects (assuming grants are awarded), as well as in several other very fitting applications. These other items are also detailed on the other page.