Abstract
Speech-to-text technologies, driven by advances in artificial intelligence, are increasingly useful in sectors such as research and education. These systems can transcribe large volumes of audio data, making information easier to process, analyze, and archive. However, concerns over data privacy and reliance on cloud-based services have created a need for self-hosted solutions. Project-W addresses these issues by providing a private, AI-driven transcription platform based on OpenAI's Whisper general-purpose speech recognition model.
The main goal of Project-W is to offer an easy-to-use, open-source, scalable transcription solution that ensures data privacy by running entirely on local infrastructure. It is designed specifically for environments such as universities and research institutions that handle sensitive information. By eliminating the need for cloud services, Project-W safeguards data while leveraging powerful AI models for accurate transcription. It aims to simplify transcription workflows, enabling users to manage their audio processing needs efficiently and securely.
Project-W is built with a Flask-based backend, a Svelte-powered frontend, and Python runners. The backend handles transcription tasks, while the frontend provides an intuitive interface for users to submit, track, and retrieve jobs. Python runners manage the interaction with OpenAI's Whisper AI model, and all components communicate via an HTTP REST API. The platform supports deployment on high-performance hardware, optimizing the processing of large and complex models. Key features include local data storage, user-friendly job management, and scalable infrastructure to handle varying workloads, making it adaptable to diverse environments.
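The job flow described above (users submit jobs, runners pick them up, users track and retrieve results) can be illustrated with a minimal in-memory sketch. This is not Project-W's actual code or API: the class and function names (`Backend`, `run_once`, `fake_transcribe`) are hypothetical, the HTTP REST layer is replaced by direct method calls, and the Whisper call is replaced by a stand-in function so that the example runs without a model or a network.

```python
import itertools
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional


class JobStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


@dataclass
class Job:
    job_id: int
    audio_path: str
    status: JobStatus = JobStatus.PENDING
    transcript: Optional[str] = None
    error: Optional[str] = None


class Backend:
    """Hypothetical in-memory stand-in for the backend's job bookkeeping."""

    def __init__(self) -> None:
        self._jobs: Dict[int, Job] = {}
        self._ids = itertools.count(1)

    # Operations a frontend would use: submit, track, retrieve.
    def submit(self, audio_path: str) -> int:
        job = Job(job_id=next(self._ids), audio_path=audio_path)
        self._jobs[job.job_id] = job
        return job.job_id

    def status(self, job_id: int) -> JobStatus:
        return self._jobs[job_id].status

    def result(self, job_id: int) -> Optional[str]:
        return self._jobs[job_id].transcript

    # Operations a runner would use: claim a job, then report the outcome.
    def claim_next(self) -> Optional[Job]:
        # Dicts preserve insertion order, so the oldest pending job comes first.
        for job in self._jobs.values():
            if job.status is JobStatus.PENDING:
                job.status = JobStatus.RUNNING
                return job
        return None

    def complete(self, job_id: int, transcript: str) -> None:
        job = self._jobs[job_id]
        job.transcript = transcript
        job.status = JobStatus.DONE

    def fail(self, job_id: int, error: str) -> None:
        job = self._jobs[job_id]
        job.error = error
        job.status = JobStatus.FAILED


def run_once(backend: Backend, transcribe: Callable[[str], str]) -> bool:
    """Process at most one pending job; return False if none was available."""
    job = backend.claim_next()
    if job is None:
        return False
    try:
        text = transcribe(job.audio_path)
    except Exception as exc:  # report the failure instead of crashing the runner
        backend.fail(job.job_id, str(exc))
    else:
        backend.complete(job.job_id, text)
    return True


def fake_transcribe(audio_path: str) -> str:
    # Stand-in for a speech recognition call. With the openai-whisper package
    # installed, a runner could instead use something like:
    #   whisper.load_model("base").transcribe(audio_path)["text"]
    return f"transcript of {audio_path}"


if __name__ == "__main__":
    backend = Backend()
    job_id = backend.submit("lecture.wav")
    run_once(backend, fake_transcribe)
    print(backend.status(job_id).value, "-", backend.result(job_id))
```

Passing the transcription function in as a parameter keeps the runner logic independent of the model, which reflects the separation between backend and runners described above. In a real deployment, the method calls would be requests over the REST API.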
Preliminary testing of Project-W in a university setting demonstrates that the platform can handle significant transcription workloads while maintaining high levels of data security. Its modular architecture allows for customization based on user requirements, such as integrating with institutional servers or upgrading hardware to improve transcription speed. The platform’s user-friendly web interface streamlines job management, ensuring that even non-technical users can use the tool effectively.
Ongoing work focuses on optimizing the platform's performance for large-scale use while actively gathering feedback from both users and administrators to improve functionality and user experience. Further evaluations will be conducted to assess its viability as a central transcription service across other departments, with a view toward broad institutional adoption.
Field | Value |
---|---|
Document type: | Conference Item |
Place of Publication: | Heidelberg |
Date Deposited: | 17 Apr 2025 09:37 |
Date: | 2025 |
Event Dates: | 12.03.2025 - 14.03.2025 |
Event Location: | Universität Heidelberg |
Event Title: | E-Science-Tage 2025 |
Faculties / Institutes: | Service facilities > Computing Centre |
Collection: | E-Science-Tage 2025 |