
Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources
Yu, Sheng; Liao, Katherine P; Shaw, Stanley Y; Gainer, Vivian S; Churchill, Susanne E; Szolovits, Peter; Murphy, Shawn N; Kohane, Isaac S; Cai, Tianxi

Analysis of narrative (text) data from electronic health records (EHRs) can improve population-scale phenotyping for clinical and genetic research. Currently, selecting text features for phenotyping algorithms is slow and laborious, requiring extensive, iterative involvement by domain experts. This paper introduces a method for developing phenotyping algorithms in an unbiased manner by automatically extracting and selecting informative features, which can be comparable in classification accuracy to expert-curated ones. Comprehensive medical concepts were collected from publicly available knowledge sources in an automated, unbiased fashion. Natural language processing (NLP) revealed the occurrence patterns of these concepts in EHR narrative notes, which enabled the selection of informative features for phenotype classification.

Automated extraction of knowledge for model-based diagnostics
Towhidnejad, Massood; Mckenzie, Frederic D

The concept of accessing computer-aided design (CAD) databases and automatically extracting a process model is investigated as a possible source for generating knowledge bases for model-based reasoning systems. The resulting system, referred to as automated knowledge generation (AKG), uses an object-oriented programming structure, constraint techniques, and an internal database of component descriptions to generate a frame-based structure that describes the model. The procedure was designed to be general enough to couple easily to CAD systems whose database can provide label and connectivity data for the drawn system. The AKG system can define knowledge bases in the formats required by various model-based reasoning tools.
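The phenotyping abstract (Yu et al.) states that NLP-derived occurrence patterns enabled feature selection, but it does not say how candidate features were scored. The Python below is a minimal sketch under several assumptions:

- Concepts have already been recognized in each note by some NLP tool.
- A candidate concept is kept if it appears for at least a minimum fraction of patients.
- Its per-patient mention counts must also correlate positively with those of an anchor concept for the target phenotype.

The thresholds, the correlation rule, the anchor concept, the toy data, and the function names (`concept_counts`, `pearson`, `select_features`) are all hypothetical. They are not the paper's published procedure.

```python
from __future__ import annotations

import math
from collections import Counter


def concept_counts(notes_by_patient: dict[str, list[list[str]]]) -> dict[str, Counter]:
    """Total mentions of each concept per patient, summed over that patient's notes.

    Assumes each note is already a list of recognized concept names (NLP step not shown).
    """
    return {
        pid: Counter(concept for note in notes for concept in note)
        for pid, notes in notes_by_patient.items()
    }


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_features(notes_by_patient, candidates, anchor, min_prevalence=0.05, min_corr=0.1):
    """Hypothetical filter: keep concepts that are common enough and co-vary with the anchor."""
    counts = concept_counts(notes_by_patient)
    patients = sorted(counts)
    # Counter returns 0 for concepts a patient never mentions.
    anchor_vec = [counts[p][anchor] for p in patients]
    selected = []
    for concept in candidates:
        vec = [counts[p][concept] for p in patients]
        prevalence = sum(v > 0 for v in vec) / len(patients)
        if prevalence >= min_prevalence and pearson(vec, anchor_vec) >= min_corr:
            selected.append(concept)
    return selected


# Toy, made-up notes for four patients.
notes = {
    "p1": [["rheumatoid arthritis", "methotrexate"], ["rheumatoid arthritis"]],
    "p2": [["methotrexate"], ["rheumatoid arthritis", "methotrexate"]],
    "p3": [["fracture"]],
    "p4": [["fracture", "rheumatoid arthritis"]],
}
print(select_features(notes, ["methotrexate", "fracture", "asthma"], "rheumatoid arthritis"))
# → ['methotrexate']
```

In this toy run, each candidate meets a different fate:

- Methotrexate is kept because its counts rise with the anchor's.
- Fracture is dropped for a negative correlation.
- Asthma never appears, so it fails the prevalence filter.

A rank correlation could replace the Pearson filter without changing the overall shape of the sketch.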
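The model-based diagnostics abstract (Towhidnejad and Mckenzie) describes turning CAD label and connectivity data into a frame-based model, using an internal database of component descriptions plus constraint techniques. The Python below sketches that general idea under loudly stated assumptions. The following are all invented for illustration and do not reflect AKG's actual interface or output format:

- the part types and slot names;
- the port-direction constraint;
- the s-expression output syntax;
- the names `COMPONENT_LIBRARY`, `Frame`, `build_frames`, and `to_kb`.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field

# Hypothetical internal database of component descriptions, keyed by CAD part type.
COMPONENT_LIBRARY: dict[str, dict] = {
    "tank": {"function": "store fluid", "ports": ["out"]},
    "pump": {"function": "move fluid", "ports": ["in", "out"]},
    "valve": {"function": "regulate flow", "ports": ["in", "out"]},
}


@dataclass
class Frame:
    """One frame per drawn component: its type, descriptive slots, and outgoing links."""
    name: str
    kind: str
    slots: dict = field(default_factory=dict)
    connected_to: list[str] = field(default_factory=list)


def build_frames(labels: dict[str, str], connections: list[tuple[str, str]],
                 library: dict[str, dict] = COMPONENT_LIBRARY) -> dict[str, Frame]:
    """Turn label data (instance name -> part type) and connectivity data into frames."""
    frames = {}
    for name, kind in labels.items():
        if kind not in library:
            raise KeyError(f"no component description for part type {kind!r}")
        # Deep copy so frames never share mutable slot values with the library.
        frames[name] = Frame(name, kind, copy.deepcopy(library[kind]))
    for src, dst in connections:
        if src not in frames or dst not in frames:
            raise ValueError(f"connection {src}->{dst} references an unlabeled component")
        # Hypothetical constraint: flow leaves through an 'out' port and enters through an 'in' port.
        if "out" not in frames[src].slots["ports"] or "in" not in frames[dst].slots["ports"]:
            raise ValueError(f"connection {src}->{dst} violates port constraints")
        frames[src].connected_to.append(dst)
    return frames


def to_kb(frames: dict[str, Frame]) -> list[str]:
    """Emit frames in a made-up s-expression syntax; a real reasoning tool defines its own."""
    lines = []
    for f in frames.values():
        links = "".join(f" {dst}" for dst in f.connected_to)
        lines.append(
            f'(frame {f.name} (isa {f.kind}) (function "{f.slots["function"]}") (connected-to{links}))'
        )
    return lines


frames = build_frames({"T1": "tank", "P1": "pump", "V1": "valve"}, [("T1", "P1"), ("P1", "V1")])
for line in to_kb(frames):
    print(line)
```

Keeping the export in a separate `to_kb` function mirrors the abstract's point that one extracted model can be written out in several tool-specific formats. A second exporter could target a different syntax without touching `build_frames`.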
