Explanation of the Enclosed Document
In alignment with the principles established in ACB Resolution 2021-22, which affirms the organization's support for the use of human voices in audio description, the following document, entitled "Guidelines and Best Practices for the Use of Text-to-Speech (TTS) in Audio Description," is proposed for adoption by the membership of the American Council of the Blind and will be introduced as a motion on Wednesday, June 24. This document provides updated, actionable guidance to ensure that when TTS is used, it meets high standards of quality, transparency, and inclusion, and that it is never employed in ways that diminish accessibility or audience experience. The standards herein are intended to supplement the 2021 resolution by addressing current technological trends while upholding ACB’s core values.
American Council of the Blind Audio Description Project
Guidelines and Best Practices for the Use of Text-to-Speech (TTS) in Audio Description
June 2025
Submitted by: ACB Audio Description Project Steering Committee and endorsed by the ACB Advocacy Steering Committee
Introduction
Audio Description (AD) provides blind and low vision individuals with vital access to visual media. Traditionally, human-voiced narration has set the gold standard for AD, offering expressiveness, emotional nuance, and clarity. Thus, the American Council of the Blind supports the use of human-voiced audio description as the preferred mode of its provision for consumers (Resolution 2021-22). As Text-to-Speech (TTS) and AI-generated voices become more prevalent, it is critical to ensure that their use maintains and never diminishes the quality, dignity, and accessibility of Audio Description. This document outlines the American Council of the Blind’s (ACB) Guidelines and Best Practices for the responsible use of TTS in Audio Description.
Core Principles
• Equity of Access: Blind and low vision individuals must receive a media experience that is as equivalent as possible in richness, clarity, and engagement to that of sighted audiences.
• Human-Centric Quality: Human-voiced Audio Description remains the gold standard.
• Responsible Technology Use: Where TTS is used, it must meet or approximate the quality standards of human narration.
• Transparency: Audiences must be informed when TTS is used.
• Inclusion: Blind and low vision individuals must take part in the evaluation and quality control of AD projects, including AD scripts.
Guidelines for TTS in Audio Description
• Voice Quality: Natural, human-like intonation. Avoid robotic or mechanical delivery. Emotional tone should match the context of the scene. Ensure clear articulation and consistent pronunciation, especially for names, technical terms, and cultural references.
• Timing and Pacing: Proper synchronization with on-screen action. Use natural pausing and breathing patterns to support listener comprehension.
• Pronunciation and Clarity: Accurate pronunciation of names, places, idioms, and foreign terms. Respect culturally specific pronunciation and dialects.
• Emotional Engagement: Convey the appropriate emotional tone (urgency, tension, tenderness, excitement) through vocal modulation and pitch.
• Audio and Sound Quality: High-fidelity audio output. A minimum of 48 kHz/24-bit is preferred for narration tracks.
• Mixing and Balance: Ensure clear, audible narration. Use techniques such as audio ducking when appropriate to prevent program audio from overpowering the description.
• Consistent Volume Levels: Comply with loudness standards (e.g., EBU R128, ATSC A/85).
• No Audio Artifacts: Audio must be free from distortion, glitches, hiss, or dropouts.
• Stereo/Surround Compatibility: Narration must sound correct on stereo and surround sound systems.
• Ambient Sound Respect: Retain environmental sounds essential to storytelling when possible.
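The 48 kHz/24-bit preference for narration tracks can be checked automatically before delivery. The following is a minimal sketch, assuming the narration track is an uncompressed PCM WAV file that Python's standard `wave` module can read. The function name `check_narration_format` and the two threshold constants are illustrative and not part of these guidelines.

```python
import wave

# Thresholds taken from the guideline above (48 kHz / 24-bit preferred).
MIN_SAMPLE_RATE_HZ = 48_000
MIN_SAMPLE_WIDTH_BYTES = 3  # 24 bits = 3 bytes per sample


def check_narration_format(path):
    """Return a list of problems found; an empty list means the file passes.

    Hypothetical helper: only inspects the WAV header, not the audio itself.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
    if rate < MIN_SAMPLE_RATE_HZ:
        problems.append(
            f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE_HZ} Hz"
        )
    if width < MIN_SAMPLE_WIDTH_BYTES:
        problems.append(
            f"bit depth {width * 8}-bit is below {MIN_SAMPLE_WIDTH_BYTES * 8}-bit"
        )
    return problems
```

A check like this only confirms the container format. It says nothing about loudness compliance, mixing, or how the narration actually sounds, which still require the human review described in this document.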
Best Practices for Implementation
• Prioritize Human Voice: TTS should only be used when human-voiced narration is not feasible due to logistical or production constraints, and never purely as a cost-saving measure.
• Invest in High-Quality TTS: Choose TTS engines designed for expressiveness, emotional nuance, and accessibility. Avoid generic or monotone systems.
• Rigorous Quality Assurance: All TTS-generated AD must be reviewed by human experts, including blind and low vision professionals. Evaluation must cover both content and technical quality.
• Sound Quality Testing: Conduct listening tests on both professional equipment and consumer devices. Ensure consistent audio quality and proper sound mixing in various listening environments.
• Audience Notification: Clearly inform audiences when TTS is used (e.g., through accessibility settings, credits, or metadata). When feasible, offer a human-narrated version, though we recognize this may not always be practical.
Ethical Considerations
• Respect for the Audience: Accessibility should never be an afterthought or a budget-based compromise.
• Quality Over Cost: The decision to use TTS must prioritize the quality of the audience experience.
• Community Involvement: Blind and low vision individuals must be integral to the development, testing, and approval of TTS-based Audio Description.
Common Audio Quality Failures to Avoid
• Voice and Narration Failures:
o Robotic or mechanical-sounding voices;
o Monotone or emotionally flat delivery;
o Mispronunciations;
o Rushed, lagging, or unnatural pacing
• Audio Recording and Mixing Failures:
o Low-fidelity, muffled, or over-compressed narration;
o Audio clipping, hiss, or distortion;
o Overpowering or too-soft narration relative to program audio;
o Poor integration with dialogue, music, or sound effects;
o Digital artifacts or audio dropouts
• Accessibility Failures:
o Incorrect audio channel mapping;
o Inconsistent quality across segments or episodes;
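Some of the mixing failures listed above, such as audio clipping, can be flagged by a simple automated pass before human review. The sketch below assumes the narration has already been decoded into a list of signed 16-bit integer samples. The function name `find_clipped_runs`, the full-scale threshold, and the minimum run length of three samples are illustrative assumptions, not values set by these guidelines.

```python
def find_clipped_runs(samples, full_scale=32767, min_run=3):
    """Return (start_index, length) for each run of clipped samples.

    A run is min_run or more consecutive samples whose absolute value
    reaches full_scale. Both parameters are assumptions for illustration.
    """
    runs = []
    run_start = None
    for i, sample in enumerate(samples):
        if abs(sample) >= full_scale:
            if run_start is None:
                run_start = i  # a possible clipped run begins here
        else:
            if run_start is not None and i - run_start >= min_run:
                runs.append((run_start, i - run_start))
            run_start = None
    # Handle a run that continues to the end of the audio.
    if run_start is not None and len(samples) - run_start >= min_run:
        runs.append((run_start, len(samples) - run_start))
    return runs
```

A detector like this can point reviewers to suspect passages, but it cannot judge hiss, muffled narration, or poor integration with program audio. Those still call for listening tests.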
Implementation Tools
Studio / Provider Checklist:
• Prioritize human narration where possible.
• Choose expressive, high-quality TTS voices.
• Ensure human review of content and technical quality.
• Clearly notify audiences of TTS use.
• Provide feedback channels for blind and low vision viewers.
• Commit to continuous improvement based on audience input.
Conclusion
Synthetic narration technologies are rapidly evolving. Yet the purpose of Audio Description remains unchanged: to provide blind and low vision individuals with media experiences that are equal in emotional depth, quality, and engagement to those of sighted audiences.
The American Council of the Blind urges all media creators, streaming platforms, and content producers to adopt these guidelines, uphold excellence, and advance equity and inclusion in every accessible media offering.
Find out more at https://acb-business.pinecast.co