FAQ | SayToWords - AI Voice Transcription Platform

A speech-to-text service automatically converts spoken audio into text. It helps you quickly turn voice recordings, meeting recordings, and other audio content into editable text.

Simply register an account, choose a plan that suits your needs, and you can start using our service. We provide a user-friendly interface that allows you to easily upload audio files and get conversion results.

We support various common audio formats, including MP3, WAV, M4A, AAC, and more. If you have special format requirements, please contact our customer service team.

Our speech recognition technology uses advanced AI algorithms and can achieve over 98% accuracy in standard Mandarin environments. For audio with accents or background noise, the accuracy may be lower.

We take user data security very seriously. While your audio is being processed, your voice data is stored on secure platforms with industry-leading encryption, and audio files are automatically deleted from our servers after conversion. You can also delete audio files manually at any time.

Conversion time depends on the length of the audio file. As a rough guide, expect about 10 seconds of processing per minute of audio, so a 1-hour file may take around 10 minutes to convert.
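The arithmetic behind that estimate can be sketched in a few lines of Python. This is only an illustration of the approximate rate quoted above, not part of the SayToWords service; the function and constant names are hypothetical, and actual times may vary.

```python
# Approximate rate from the FAQ: about 10 seconds of processing per minute of audio.
SECONDS_PER_AUDIO_MINUTE = 10


def estimate_conversion_seconds(audio_minutes: float) -> float:
    """Return a rough conversion-time estimate in seconds (hypothetical helper)."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    return audio_minutes * SECONDS_PER_AUDIO_MINUTE


# A 1-hour file: 60 minutes * 10 seconds = 600 seconds, i.e. about 10 minutes.
one_hour_estimate_minutes = estimate_conversion_seconds(60) / 60
print(one_hour_estimate_minutes)
```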

We offer three transcription modes—Fastest, Balanced, and Accurate. For high-quality audio, the Fastest or Balanced mode is recommended because both deliver quick results with reliable accuracy. For general recordings, Balanced is the best all-around option. If your audio contains background noise, multiple speakers, or requires the highest precision, choose the Accurate mode.

Transcription files moved to the recycle bin will be kept for up to 30 days. They will be automatically and permanently deleted after the 30-day retention period. You can also choose to permanently delete them manually from the recycle bin at any time.
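As a rough illustration of the retention window, the Python sketch below computes the latest date a file would remain in the recycle bin. It assumes the 30 days are counted from the moment the file is moved to the bin; the function names are hypothetical and do not describe the platform's internal implementation.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30  # retention period stated in the FAQ


def permanent_deletion_time(moved_to_bin_at: datetime) -> datetime:
    """Latest time a file stays in the recycle bin (assumed to start at the move)."""
    return moved_to_bin_at + timedelta(days=RETENTION_DAYS)


def days_remaining(moved_to_bin_at: datetime, now: datetime) -> int:
    """Whole days left before automatic deletion, never below zero."""
    remaining = permanent_deletion_time(moved_to_bin_at) - now
    return max(remaining.days, 0)


moved = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(permanent_deletion_time(moved).date())  # 2024-01-31
print(days_remaining(moved, datetime(2024, 1, 11, tzinfo=timezone.utc)))  # 20
```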

The speaker recognition feature identifies the individual speakers in an audio file. To turn it on, click the 'Enable Speaker Recognition' button.

The “Recognize Speaker” option enables AI to identify and separate different speakers in your audio. If you specify the number of speakers, the AI can use this information to improve speaker separation and labeling accuracy. If you don’t select a number, the system will automatically detect and classify speakers for you. Please note that the final result may not strictly follow the number you choose, as the AI will still optimize speaker detection based on actual audio characteristics.

The “Scenario” option lets the system adjust technical parameters based on the specific environment of your audio. Different scenarios use different AI settings—such as noise reduction level, speech enhancement, and background filtering—to achieve better transcription accuracy without requiring you to manually configure complex options. In most cases, the “General” scenario offers the most balanced performance and is suitable for typical recordings.

“Segment Length” sets the duration of each text segment generated during AI transcription. Shorter segments create more frequent breaks and finer timestamps, while longer segments produce larger blocks of text. This setting affects how the final transcript is structured, but it does not change the accuracy of the transcription.
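To make the effect of segment length concrete, here is a minimal Python sketch that groups timestamped words into segments no longer than a chosen duration. It is an assumption about how such grouping could work, not the platform's actual algorithm; the word timings and function names are made up for illustration.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]      # (text, start_seconds, end_seconds)
Segment = Tuple[float, float, str]   # (start_seconds, end_seconds, text)


def _close(words: List[Word]) -> Segment:
    """Merge consecutive words into one timestamped segment."""
    return (words[0][1], words[-1][2], " ".join(w[0] for w in words))


def split_into_segments(words: List[Word], max_seconds: float) -> List[Segment]:
    """Group words so that each segment spans at most max_seconds where possible."""
    segments: List[Segment] = []
    current: List[Word] = []
    for word in words:
        # Start a new segment once adding this word would exceed the limit.
        if current and word[2] - current[0][1] > max_seconds:
            segments.append(_close(current))
            current = []
        current.append(word)
    if current:
        segments.append(_close(current))
    return segments


words = [
    ("hello", 0.0, 0.5), ("and", 0.5, 0.75), ("welcome", 0.75, 1.5),
    ("to", 1.5, 1.75), ("the", 1.75, 2.0), ("meeting", 2.0, 2.75),
]
short = split_into_segments(words, max_seconds=1.0)  # three short segments
long = split_into_segments(words, max_seconds=3.0)   # one long segment
print(len(short), len(long))
```

In both cases the words themselves are identical; only the number of breaks and timestamps differs, which mirrors the point that segment length changes structure rather than accuracy.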
