Speech recognition

From Clinfowiki

Speech recognition refers to software that takes human speech and translates it to text for data processing.


The notion of speech recognition for use in medical documentation has been around for years, but in the past five to ten years it has found its place in the healthcare community. It is no longer experimental; it has become a widespread tool used in many inpatient and outpatient settings. One area that has found particular success with the adoption of speech recognition technology is radiology.

Types of Speech Recognition

Front-end Speech Recognition

This technology allows the dictating provider to see his dictation appear on the computer screen as he speaks. The idea is that the provider views the report as he dictates and edits as he goes. When complete, he will have reviewed the report for accuracy and completeness, correcting any errors before signing it and sending it to the electronic health record (EHR).

Back-end Speech Recognition

This technology requires the intervention of a second party for editing, generally a medical transcriptionist. In this model, the provider dictates as he always has and sends the dictation off to the medical transcription department. For the dictator, the process is no different than traditional transcription; the difference lies with the transcriptionist. Instead of listening to spoken words and typing them as she goes, she receives a typed draft on her computer screen, which she edits while listening to the dictation.

Advantages to Implementation

There are two primary advantages to speech recognition technology: cost savings and turn-around time for the finished product.

Cost savings: There are substantial savings to be gained through the implementation of speech recognition technology. The front-end model provides the greatest savings, since it completely eliminates medical transcription services. The provider dictates and edits his own documentation, then immediately sends it to the EHR. Back-end speech recognition also provides savings over traditional dictation and transcription. However, because it still relies on editing by the medical transcription department, the savings are not as substantial as those from front-end technology. The cost savings realized through back-end technology come from the increased productivity of the transcriptionist, which can be as great as 100% or more.

Turn-around time savings: Getting documentation into the EHR as quickly as possible is important for patient care, since having essential information available can affect treatment decisions. Both forms of speech recognition allow for faster turn-around time, with front-end technology being the best at shortening the time between dictation and availability in the patient record.
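As a rough sketch of the savings arithmetic behind that productivity figure (all numbers below are hypothetical assumptions for illustration, not published rates), doubling a transcriptionist's throughput roughly halves the per-line labor cost:

```python
# Hypothetical figures for illustration only.
lines_per_year = 1_000_000         # annual dictation volume, in transcribed lines
cost_per_line_typed = 0.15         # dollars per line when typing from audio
productivity_gain = 1.00           # editing a draft is ~100% faster than typing

# Doubling throughput roughly halves the per-line labor cost.
cost_per_line_edited = cost_per_line_typed / (1 + productivity_gain)

traditional_cost = lines_per_year * cost_per_line_typed
backend_cost = lines_per_year * cost_per_line_edited
savings = traditional_cost - backend_cost

print(f"traditional transcription: ${traditional_cost:,.0f}")  # $150,000
print(f"back-end speech rec:       ${backend_cost:,.0f}")      # $75,000
print(f"annual savings:            ${savings:,.0f}")           # $75,000
```

Front-end recognition would, under the same assumptions, eliminate the editing cost entirely, which is why its savings are larger.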

Barriers to Implementation

As studies have shown, this technology offers a high degree of accuracy and a proven return on investment, and it is widely available from a variety of vendors. None of these factors is a barrier to implementing speech recognition in any setting. The primary barrier is behavioral: physicians still believe that it will take longer to complete a report through speech recognition than through traditional dictation. Showing providers the many tools that are available will help overcome this hesitancy. For example, available technologies include trigger-word normal-text blocks and voice-driven forms and templates, which can save a great deal of time for the dictator and provide a high level of accuracy. Another time-saver is a sister technology to speech recognition, natural language processing. Used in conjunction with speech recognition, it allows physicians to query the software's database for specific information about the patient, such as allergies or related problems. This is much faster than the traditional method of looking through a paper chart.
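As a toy illustration of such a voice-driven query (the patient record, trigger phrases, and `answer` function below are entirely hypothetical, not any vendor's API), a recognized utterance can be mapped to a lookup in structured patient data:

```python
# A hypothetical, minimal patient record for illustration only.
patient = {
    "allergies": ["penicillin", "latex"],
    "problems": ["type 2 diabetes", "hypertension"],
}

# Trigger phrases that the recognized utterance is scanned for.
QUERIES = {
    "allergies": "allergies",
    "problem list": "problems",
}

def answer(utterance):
    """Return the patient-record entries matching a recognized spoken query."""
    text = utterance.lower()
    for phrase, field in QUERIES.items():
        if phrase in text:
            return patient[field]
    return []  # no recognized query in the utterance

print(answer("Show me this patient's allergies"))  # ['penicillin', 'latex']
```

Real natural language processing goes far beyond keyword matching, but the workflow is the same: spoken question in, structured chart data out, with no paper chart to flip through.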

Keys to Effective Implementation

One of the major factors for successful implementation of speech recognition in an organization is having strong support from organizational leaders and having physician champions who are committed to the value of this technology.

Another important element of effective implementation is helping providers set up a library of report templates and normal-text inserts to increase their efficiency and speed of adoption. Finally, and perhaps most importantly, it is imperative to have an effective and comprehensive training program for providers to learn how to use the system, along with documentation for future reference. Once training has been accomplished, it is also important to have support staff available to answer questions and provide encouragement throughout the initial learning phase.

Speech recognition is already playing a large part in the process of healthcare documentation. With time, the technology will continue to improve and produce even greater benefits in terms of cost and time savings. It will make it possible for documentation to be available for patient care much faster than before, and at much less cost to the organization.



Submitted by Jeanette Judd


Speech recognition is achieved by capturing your voice through a computer-compatible microphone, which converts it to an analog signal. The analog signal is sent to the computer's sound card, where an analog-to-digital converter turns it into a stream of digital data. At that point, the software performs the translation to text. [1]
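To make the "stream of digital data" concrete, the sketch below (Python standard library only; the tone frequency and sample rate are arbitrary choices for illustration) synthesizes one second of audio, writes it in WAV format, and reads it back as the kind of integer sample stream a recognition engine would analyze:

```python
import io
import math
import struct
import wave

RATE = 16000  # samples per second; a common rate for speech audio

# Synthesize one second of a 440 Hz tone as 16-bit integer samples,
# standing in for the digital stream produced by the A/D converter.
samples = [int(20000 * math.sin(2 * math.pi * 440 * t / RATE))
           for t in range(RATE)]

# Write the samples as a mono WAV file (here to an in-memory buffer).
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)    # mono
    out.setsampwidth(2)    # 16-bit samples
    out.setframerate(RATE)
    out.writeframes(struct.pack("<%dh" % len(samples), *samples))

# Read the digital stream back, as a recognition engine would before decoding.
buf.seek(0)
with wave.open(buf, "rb") as src:
    raw = src.readframes(src.getnframes())
digital_stream = struct.unpack("<%dh" % (len(raw) // 2), raw)

print(len(digital_stream))  # 16000 samples for one second of audio
```

The translation step itself, turning that sample stream into words, is the hard part, and is what the commercial engines discussed below provide.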

In the context of Health Information Technology, speech recognition is used as an alternative to traditional transcription services. In the traditional transcription workflow, a Provider uses the telephone or another dictation device to dictate a report, a transcriptionist types it, and the typed report is routed back to the Provider in a computerized system for additional editing and electronic signature.

As this description makes clear, the workflow has an inherent delay, and in a clinical setting a delay in critical information can have an adverse impact on patient care. In addition, there are obvious cost savings when the Provider directly edits his or her own reports.


There are two possible speech recognition solutions, both of which improve turnaround time and savings.

Front-end speech recognition

In front-end speech recognition, a computer program performs the conversion immediately and presents the text output to the Provider for immediate correction and signing. Front-end recognition has the advantage of minimal turnaround time; however, it also requires the Provider to make significant changes to his or her clinical workflow.

One area where this is problematic is support. The software must be installed on every local computer where it may be used, and it cannot run as an emulated or thin-client application. This makes front-end recognition a highly distributed model that is resource-intensive for HIT analysts to support.

Back-end speech recognition

Back-end speech recognition converts the voice data into text on a server. The resulting draft is still sent to a transcriptionist, and then routed back to the Provider for final signature.

However, back-end speech recognition still captures savings and efficiencies, because the document routed to the transcriptionist arrives already transcribed and needs only editing. In addition, because the processing is server-based, it is a centralized solution with significantly less HIT support overhead.

There are a variety of speech recognition vendors on the market. The intent of this article is to give an overview of the technology, not to analyze, rate, or recommend specific vendors. An excellent report doing so was recently published by KLAS, called Speech Recognition 2010: Vocalizing Benefits. It is available for purchase on their website: http://www.klasresearch.com


  1. Maistkowski, S. (2000). How it Works: Speech Recognition. PCWorld
  2. Guerra, A. (2011). Speech Recognition Aids Physicians, Cuts Costs. Information Week.

Submitted by Brad Sinner

Speech Recognition Technology and the EHR: The Future of Data Entry?


The dismal adoption rate of the EHR in the United States (about 20% overall) is multifactorial. One of these factors may be the difficulty of the human-computer interface and the ability (or lack thereof) to easily input data and navigate the system. Speech Recognition Technology (SRT) may be an increasingly viable way to help improve EHR adoption rates by improving the efficiency and quality of clinical documentation.


A basic assumption regarding the implementation of the EHR is that the technology will improve patient care without getting in the way of time-tested clinician work-flows or increasing the time burdens of an overloaded medical staff. Some clinicians have argued that narratives are essential to a patient's episode of illness, and that poor communication is more often detrimental to patients than lack of knowledge. Additionally, computers should enable clinicians to capture narratives easily, and the structure of the patient's record should strongly enhance the ease of clinical documentation and information retrieval.

Usability is clearly a critical factor in clinician adoption of the EHR during its early implementation stages. Given its affordability and relatively minimal training requirements, SRT has the potential to improve efficiency and work-flow processes, increase the quality of patient care documentation, and reduce transcription costs.

Why use SRT?

As EHR implementation becomes more widespread, handwriting for clinical data and order entry will become obsolete. For most clinicians (who are usually not expert “touch-typists”), marginal typing skills can lead to self-editing while trying to document a narrative, ultimately compromising the completeness and accuracy of patient-provided information.

Dictation with transcription is more likely to produce a legible and comprehensive document, but it still requires a qualified transcriptionist to perform the job with high fidelity. Even so, turnaround can vary greatly, and transcription errors can propagate inaccuracies (or blank fields) into the medical record, delaying the posting of the final corrected and authenticated note in the EHR. Entry of coded data is unnatural; it can be cumbersome, awkward, and time-consuming, and it may not convey an accurate meaning of the patient's problem or condition, reducing the ability to communicate complex, nuanced clinical information precisely among clinicians or to patients.


As a result of dramatic improvements in SRT accuracy and ease of use, demand has been growing steadily among physicians in search of tools to improve both work-flow processes and quality of documentation. Other benefits include reduced transcription costs, faster per-dictation turnaround time, increased accuracy and error reduction, and fast electronic capture of free-form text on complex cases beyond the scope of traditional EHR templates. Comprehensive medical vocabularies, acceptable accuracy rates, and voice-macro creation capabilities (which ease and standardize complex documentation entry through easily editable templates or sequential, mouse/keyboard-driven order-entry navigation) are lauded features of SRT applications that have enhanced their adoptability.
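One way to picture a voice macro is as a spoken trigger phrase that expands into a normal-text block, optionally with fill-in fields. A minimal sketch follows; the macro names and boilerplate text are invented for illustration and do not come from any particular SRT product:

```python
# Hypothetical macro library: spoken trigger phrase -> normal-text block.
MACROS = {
    "normal chest exam": ("Lungs clear to auscultation bilaterally. "
                          "No wheezes, rales, or rhonchi."),
    "followup": "Return to clinic in {weeks} weeks for follow-up.",
}

def expand(trigger, **fields):
    """Expand a dictated trigger phrase into its normal-text block."""
    template = MACROS.get(trigger.lower())
    if template is None:
        return trigger  # unrecognized speech passes through as dictated
    return template.format(**fields)

print(expand("normal chest exam"))
print(expand("followup", weeks="6"))  # Return to clinic in 6 weeks for follow-up.
```

A few spoken words thus produce a complete, standardized paragraph, which is where much of the documented time savings comes from.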

Risks and drawbacks

“Stylistic preference” may be a challenge for initial adoption. In general, clinicians may be initially unwilling to change their data input style, the main issues being how comfortable one is with the concept of speech recognition as a relatively new technology, and how tolerant of errors and how willing to correct them one may initially be.

Accuracy may be seen as a drawback by some; even with 98 percent accuracy, one of every 50 words is misrecognized; therefore, edits are required to ensure accuracy of the final clinical document. Some physicians find microphone and program set-up time-consuming in a busy clinical situation. Lack of software support on consumer SRT products, inadequate computer and audio hardware to optimally utilize and operate SRT, licensing agreement issues, and institutional ROI, are some of the other potential barriers.

Future Challenges

As SRT and other human-computer interaction technologies continue to evolve, the entry of complex narrative clinical data and the multi-screen navigation common in most of today's EHR products will hopefully become simpler and more user-friendly with the aid of improved user-interface modalities. Simplicity, usability, and adaptability to a clinician's workflow will likely determine future user satisfaction at the EHR point of data entry, which in turn will influence long-term overall EHR adoption rates.


As with all technology, SRT and the hardware that supports it have improved and will continue to improve, progressively making it a viable option for a more computer-literate generation of healthcare providers and HIM professionals. Technical, personal, and institutional barriers still exist, and more current research is needed to identify specific challenges so that improvements can be targeted, but the trend is one of technological improvement, ease of use, and increasing clinician adoption of SRT as a useful adjunct to EHR interaction. From a clinician's standpoint, tools that save time, make documentation and communication more robust and immediate, diminish the awareness of computer interaction, complement work-flow, and ultimately improve quality of patient care, time at the bedside, and clinical outcomes will eventually be embraced, as we continue to strive for better and more seamless interaction with information systems.





Submitted by Jose Gude