Live Transcription With Python and Django

沈建基 · Posted on 2022-3-9 16:30:02

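Have you ever wondered how to do live speech-to-text transcription with Python? In this post we’ll use Django and Deepgram to do exactly that.

Django is a familiar Python web framework for rapid development. It provides much of what we need out of the box, with everything included in the framework, following the "batteries included" philosophy. Deepgram uses AI speech recognition to transcribe audio in real time, and we’ll use the Deepgram Python SDK.

The final code for this project is on Github, if you want to jump ahead.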

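Getting Started

Before we start, we need to generate a Deepgram API key to use in our project. We can go here to get one. This tutorial uses Python 3.10, but Deepgram also supports some earlier versions of Python. We’ll also use Django 4.0 and Django Channels to handle WebSockets. We need to set up a virtual environment to hold our project. We can read more about virtual environments and how to create one here.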

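Installing Dependencies

Create a directory to store all of our project files, and create a virtual environment inside it. Make sure the virtual environment is activated, as described in the article from the previous section, and that all dependencies are installed inside that environment.

For quick reference, here are the commands needed to create and activate the virtual environment: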

    mkdir [% NAME_OF_YOUR_DIRECTORY %]
    cd [% NAME_OF_YOUR_DIRECTORY %]
    python3 -m venv venv
    source venv/bin/activate
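To confirm from Python that the environment is really active, one option is to compare sys.prefix with sys.base_prefix. This is only a minimal sketch; in_virtualenv is a hypothetical helper name, and the check assumes an environment created with the standard venv module.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```

If this prints False after running the activate command, the shell is probably still using the system interpreter.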

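We need to install the following dependencies from the terminal:

The latest version of Django
The Deepgram Python SDK
The dotenv library, which helps us work with environment variables
The latest version of Django Channels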

    pip install Django
    pip install deepgram-sdk
    pip install python-dotenv
    pip install channels
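A quick way to check that the four packages landed in the active environment is to ask the standard library for their installed versions. The sketch below assumes Python 3.8 or later (for importlib.metadata); installed_version is a hypothetical helper, not part of any of these packages.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as used in the pip commands above.
for name in ("Django", "deepgram-sdk", "python-dotenv", "channels"):
    print(name, installed_version(name) or "not installed")
```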

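Create a Django Project

Let’s create a Django project by running this command from the terminal:

    django-admin startproject stream

Our project directory now contains a stream folder with manage.py and an inner stream package holding settings.py, urls.py, asgi.py and wsgi.py.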

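Create a Django App

We need to keep the code for the server part of our application in an app called transcript. Let’s make sure we’re inside the project directory that contains manage.py. To do that, change directory into the stream project and create the app: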

    cd stream
    python3 manage.py startapp transcript

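We’ll see the new transcript app at the same directory level as our project.

We also need to tell our project that we’re using this new transcript app. To do so, open the settings.py file inside the stream folder and add the app to INSTALLED_APPS.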
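As a rough illustration, the relevant part of stream/settings.py could end up looking like the sketch below. The django.contrib entries are the defaults that startproject generates; the exact list in a given project may differ.

```python
# Sketch of the INSTALLED_APPS setting in stream/settings.py.
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "transcript",  # the app created with `manage.py startapp transcript`
]
```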

Create Index View

Let’s get a starter Django application up and running that renders an HTML page so that we can progress on our live transcription project.

Create a folder called templates inside our transcript app. Inside the templates folder, create another directory called transcript, and inside that directory create an index.html file.

Add the following HTML markup to our transcript/templates/transcript/index.html:

    <!DOCTYPE html>
    <html>
      <head>
        <title>Live Transcription</title>
      </head>
      <body>
        <h1>Transcribe Audio With Django</h1>
        <p id="status">Connection status will go here</p>
        <p id="transcript"></p>
      </body>
    </html>

Then add the following code to views.py in our transcript app.

    from django.shortcuts import render


    def index(request):
        return render(request, 'transcript/index.html')

We need to create a urls.py inside our transcript app to route requests to our view.

Let’s add the following code to the new urls.py file:

    from django.urls import path

    from . import views

    urlpatterns = [
        path('', views.index, name='index'),
    ]

We have to point the project’s root URL configuration at the transcript.urls module. In stream/urls.py, add the code:

    from django.contrib import admin
    from django.urls import include, path

    urlpatterns = [
        # Send requests for the site root to the transcript app.
        path('', include('transcript.urls')),
        path('admin/', admin.site.urls),
    ]

If we start the development server from the terminal with python3 manage.py runserver, the index.html page will render in the browser when we navigate to http://127.0.0.1:8000.

Integrate Django Channels

We need to add the following code to our stream/asgi.py file.

    import os

    from channels.auth import AuthMiddlewareStack
    from channels.routing import ProtocolTypeRouter, URLRouter
    from django.core.asgi import get_asgi_application

    import transcript.routing

    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "stream.settings")

    application = ProtocolTypeRouter({
        # Regular HTTP requests are handled by Django as usual.
        "http": get_asgi_application(),
        # WebSocket connections are routed to the consumers listed in transcript/routing.py.
        "websocket": AuthMiddlewareStack(
            URLRouter(
                transcript.routing.websocket_urlpatterns
            )
        ),
    })

Now we have to add the channels library to INSTALLED_APPS in our stream/settings.py file.

We also need to add the following line at the bottom of our stream/settings.py file:

ASGI_APPLICATION = 'stream.asgi.application'

To ensure everything is working correctly with Channels, run the development server with python3 manage.py runserver. We should see the server start-up output in our terminal.

Add Deepgram API Key

Our API key will allow access to Deepgram. Let’s create a .env file that will store our key. To keep the key hidden when we push our code to Github, make sure to add .env to our .gitignore file.

In our .env file, add the following environment variable with our Deepgram API key, which we can grab here:

DEEPGRAM_API_KEY="abcde12345"

Next, create a consumers.py file inside our transcript app, acting as our server.

Let’s add this code to our consumers.py. This code loads our key into the project and accesses it in our application:

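    import os

    from channels.generic.websocket import AsyncWebsocketConsumer
    from deepgram import Deepgram
    from dotenv import load_dotenv

    # Read the variables defined in .env into the process environment.
    load_dotenv()


    class TranscriptConsumer(AsyncWebsocketConsumer):
        # Deepgram client created once and shared by every consumer instance.
        dg_client = Deepgram(os.getenv('DEEPGRAM_API_KEY'))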
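Because os.getenv returns None when a variable is missing, a typo in the .env file would only surface later, when the client is first used. One possible guard is sketched below; require_env is a hypothetical helper (not part of python-dotenv or the Deepgram SDK), and the key value is the placeholder from the .env example.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or fail early with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Placeholder value for the demonstration; in the project it comes from .env.
os.environ["DEEPGRAM_API_KEY"] = "abcde12345"
api_key = require_env("DEEPGRAM_API_KEY")
print("key loaded, length:", len(api_key))
```

In consumers.py the same helper could be called right after load_dotenv(), so that a missing key stops the server at start-up instead of at the first connection.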

We also have to add a routing.py file inside our transcript app. This file will direct Channels to run the correct code when we make an HTTP request intercepted by the Channels server.

    from django.urls import re_path

    from . import consumers

    websocket_urlpatterns = [
        re_path(r'listen', consumers.TranscriptConsumer.as_asgi()),
    ]
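The pattern r'listen' is not anchored. Assuming the router applies it as a regular-expression search against the request path without its leading slash (an assumption about the routing internals, not something stated in this post), it would match any path that contains "listen". The sketch below uses only the standard re module and made-up paths to show the difference an anchored pattern makes.

```python
import re

# The unanchored pattern from routing.py and a stricter, anchored alternative.
loose = re.compile(r"listen")
strict = re.compile(r"^listen/?$")

# Hypothetical request paths, written without the leading slash.
paths = ["listen", "listen/", "ws/listen", "listen/extra", "status"]

loose_matches = [p for p in paths if loose.search(p)]
strict_matches = [p for p in paths if strict.search(p)]

print("loose: ", loose_matches)
print("strict:", strict_matches)
```

For a small demo the loose pattern is harmless; the anchored form is one option if more WebSocket routes are added later.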

Get Mic Data From Browser

Our next step is to get the microphone data from the browser, which will require a little JavaScript.

The code that accesses data from the user’s microphone goes inside the <script></script> tag in index.html.

If you want to learn more about working with the mic in the browser, please check out this post.

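WebSocket Connection Between Server and Browser

We need to use WebSockets in our project. We can think of a WebSocket as a connection between a server and a client that stays open and allows continuous messages to be sent back and forth.

The first WebSocket connection is between our Python server, which holds our Django application, and our browser client. In this project, we’ll use Django Channels to handle the WebSocket server.

We need to create a WebSocket endpoint in our Django web server code that listens for client connections. In the routing.py file from the previous section, re_path(r'listen', consumers.TranscriptConsumer.as_asgi()) accomplishes this connection. The consumer itself lives in consumers.py: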

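    class TranscriptConsumer(AsyncWebsocketConsumer):
        dg_client = Deepgram(os.getenv('DEEPGRAM_API_KEY'))

        async def connect(self):
            # Open the Deepgram connection first, then accept the browser's WebSocket.
            await self.connect_to_deepgram()
            await self.accept()

        async def receive(self, text_data=None, bytes_data=None):
            # Channels passes frames as keyword arguments; only binary audio is forwarded.
            if bytes_data:
                self.socket.send(bytes_data)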

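The code above accepts a WebSocket connection between the server and the client. As long as the connection stays open, the server waits for messages from the client and forwards the bytes it receives to Deepgram.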
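The accept-then-forward flow can be tried out without Django or Deepgram. The sketch below is only an illustration: FakeUpstream and RelaySketch are stand-in classes invented for this example, and they mimic the shape of the consumer rather than its real behaviour.

```python
import asyncio


class FakeUpstream:
    """Stand-in for the Deepgram socket; it only records what it is sent."""

    def __init__(self):
        self.received = []

    def send(self, data: bytes) -> None:
        self.received.append(data)


class RelaySketch:
    """Mimics the connect/receive shape of a consumer, with no framework involved."""

    def __init__(self):
        self.socket = None
        self.accepted = False

    async def connect(self):
        # Open the upstream connection first, then accept the client.
        self.socket = FakeUpstream()
        self.accepted = True

    async def receive(self, text_data=None, bytes_data=None):
        # Only binary frames (audio chunks) are forwarded upstream.
        if bytes_data:
            self.socket.send(bytes_data)


async def demo():
    relay = RelaySketch()
    await relay.connect()
    for chunk in (b"chunk-1", b"chunk-2"):
        await relay.receive(bytes_data=chunk)
    await relay.receive(text_data="text frames are ignored")
    return relay.socket.received


forwarded = asyncio.run(demo())
print(forwarded)
```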

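The script in index.html listens for the client connection and then connects to the server.

We need to establish a connection between our central Django server and Deepgram to send the audio and get real-time transcription. Add this code to our consumers.py file.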

    from typing import Dict


    class TranscriptConsumer(AsyncWebsocketConsumer):
        dg_client = Deepgram(os.getenv('DEEPGRAM_API_KEY'))

        async def get_transcript(self, data: Dict) -> None:
            # Only messages that carry a 'channel' key contain a transcript.
            if 'channel' in data:
                transcript = data['channel']['alternatives'][0]['transcript']

                if transcript:
                    # Send non-empty transcript text back to the browser.
                    await self.send(transcript)

        async def connect_to_deepgram(self):
            try:
                # Open a live transcription socket and register the event handlers.
                self.socket = await self.dg_client.transcription.live({'punctuate': True, 'interim_results': False})
                self.socket.registerHandler(self.socket.event.CLOSE, lambda c: print(f'Connection closed with code {c}.'))
                self.socket.registerHandler(self.socket.event.TRANSCRIPT_RECEIVED, self.get_transcript)
            except Exception as e:
                raise Exception(f'Could not open socket: {e}') from e

        async def connect(self):
            await self.connect_to_deepgram()
            await self.accept()

        async def disconnect(self, close_code):
            # connect() never joins a channel-layer group, so there is nothing to discard here.
            pass

        async def receive(self, text_data=None, bytes_data=None):
            # Forward raw audio bytes from the browser to Deepgram.
            if bytes_data:
                self.socket.send(bytes_data)

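The connect_to_deepgram function connects us to Deepgram: it creates a socket connection to Deepgram, listens for the connection closing, and receives incoming transcription objects. The get_transcript method takes the transcript from Deepgram and sends it back to the client.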
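The lookup that get_transcript performs can be isolated into a plain function and tried on sample data. This is a minimal sketch: extract_transcript is a hypothetical helper, and the payloads are trimmed-down illustrations built from the keys used above, not complete Deepgram messages.

```python
from typing import Any, Dict, Optional


def extract_transcript(data: Dict[str, Any]) -> Optional[str]:
    """Return the first alternative's transcript, or None if there is nothing to send."""
    if "channel" not in data:
        return None
    transcript = data["channel"]["alternatives"][0]["transcript"]
    # Empty strings (for example during silence) are treated as nothing to send.
    return transcript or None


# Illustrative payloads; real messages carry more fields than shown here.
with_speech = {"channel": {"alternatives": [{"transcript": "hello world"}]}}
silence = {"channel": {"alternatives": [{"transcript": ""}]}}
no_channel = {"note": "a message without a channel key"}

for payload in (with_speech, silence, no_channel):
    print(extract_transcript(payload))
```

Keeping the parsing separate from the consumer makes it possible to test this logic without opening a WebSocket.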

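Finally, in our index.html, we need to receive and handle the data through the WebSocket events. Note that they are logged to our console. If you want to learn more about what these events do, check out this blog post.

Let’s start the application and begin getting real-time transcriptions. From our terminal, run python3 manage.py runserver and pull up our localhost on port 8000, http://127.0.0.1:8000/. If we haven’t already, allow access to our microphone. Start speaking, and we should see the transcript appear.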

Congratulations on building a live transcription project with Django and Deepgram!
