# Duix Mobile for Android SDK Documentation
English | [中文](./README_zh.md)
## 1. Product Overview
`Duix Mobile for Android` is a lightweight, fully offline 2D digital human solution for Android, supporting real-time rendering of digital avatars driven by voice audio.
### 1.1 Application Scenarios
- **Low deployment cost**: Suitable for unattended scenarios such as large-screen terminals, government service halls, and banks.
- **Minimal network dependency**: Runs entirely on the device with no internet connection required, so it operates reliably in subways and remote areas.
- **Diverse functionality**: Can serve as a guide, a Q&A customer service agent, an intelligent companion, and more.
### 1.2 Core Features
- Customizable digital avatar and local rendering
- Real-time voice-driven playback (supports WAV playback and PCM streaming)
- Motion playback control (specific or random actions)
- Automatic resource download management
---
## 2. Terminology
| Term | Meaning |
|--------------------|----------------------------------------------------------------------------|
| PCM | Pulse-code modulation. The SDK expects a raw audio stream at a 16 kHz sample rate, 16-bit depth, mono |
| WAV | An audio file format that can carry PCM-encoded audio; suitable for short voice playback |
| RenderSink | Interface that receives rendering data. The SDK ships an implementation for default display, or you can implement it for custom rendering |
| DUIX | Main control object of the digital human; integrates model loading, rendering, broadcasting, and motion control |
| GLES | OpenGL ES, the graphics API used to render images on Android |
| SpecialAction | A JSON file attached to the model that marks action intervals (e.g., greetings, waving) |

---
## 3. SDK Access
### 3.1 Module Reference (Recommended)
1. Obtain the complete source package, unzip it, and copy the `duix-sdk` directory to the project root directory.
2. In the project `settings.gradle`, add:

```gradle
include ':duix-sdk'
```

3. In the module's `build.gradle`, add the dependency:

```gradle
dependencies {
    api project(":duix-sdk")
}
```
### 3.2 AAR Reference (Optional)
1. Place the compiled `duix-sdk-release.aar` module into the `libs/` directory.
2. Add the dependency:

```gradle
dependencies {
    api fileTree(include: ['*.jar', '*.aar'], dir: 'libs')
}
```
---
## 4. Integration Requirements
| Item | Description |
|----------------|-----------------------------------------------------------------|
| System | Android 10 or later |
| CPU Architecture | armeabi-v7a, arm64-v8a |
| Hardware Requirements | CPU with 8 or more cores (Snapdragon 8 Gen 2), at least 8 GB of memory, and at least 1 GB of available storage |
| Network | None at runtime (fully local operation); the base configuration and model files are downloaded beforehand (see 6.1) |
| Development IDE | Android Studio Giraffe 2022.3.1 Patch 2 |
| Memory Requirements | At least 800 MB of memory available to the digital human |

---
## 5. Usage Flow Overview
```mermaid
graph TD
    A[Check Configuration and Models] --> B[Build DUIX Instance]
    B --> C[Call init to Initialize]
    C --> D[Display Avatar / Render]
    D --> E[PCM or WAV Audio Driving]
    E --> F[Playback Control & Motion Triggering]
    F --> G[Resource Release]
```
---
## 6. Key Interfaces and Example Calls
### 6.1 Model Check and Download
Before using the rendering service, make sure the base configuration and model files have been synchronized to local storage. The SDK includes `VirtualModelUtil`, a simple demonstration of the model download and decompression process. If model downloads are slow or fail, you can host the model package on your own storage service.

> Function Definition: `ai.guiji.duix.sdk.client.VirtualModelUtil`

```java
// Check whether the base configuration has been downloaded
boolean checkBaseConfig(Context context)

// Check whether the model has been downloaded
boolean checkModel(Context context, String name)

// Download the base configuration
void baseConfigDownload(Context context, String url, ModelDownloadCallback callback)

// Download the model
void modelDownload(Context context, String modelUrl, ModelDownloadCallback callback)
```
`ModelDownloadCallback`, as defined in the SDK, reports download progress, unzip progress, completion, and failure:

```java
interface ModelDownloadCallback {
    // Download progress
    void onDownloadProgress(String url, long current, long total);

    // Unzip progress
    void onUnzipProgress(String url, long current, long total);

    // Download and unzip complete
    void onDownloadComplete(String url, File dir);

    // Download or unzip failed
    void onDownloadFail(String url, int code, String msg);
}
```
**Call Example**:

```kotlin
// Download the base configuration only if it is not already present
if (!VirtualModelUtil.checkBaseConfig(mContext)) {
    VirtualModelUtil.baseConfigDownload(mContext, baseConfigUrl, callback)
}
```

```kotlin
// Download the model only if it is not already present
if (!VirtualModelUtil.checkModel(mContext, modelUrl)) {
    VirtualModelUtil.modelDownload(mContext, modelUrl, callback)
}
```
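The two checks above can be folded into a single decision about what to do next, and the `(current, total)` pairs reported by `ModelDownloadCallback` can be turned into a percentage for a progress bar. The following is a minimal sketch in plain Java with no Android dependency. `ModelSetup`, `Step`, `nextStep`, and `percent` are hypothetical names that are not part of the SDK, and the ordering (base configuration first, then the model) simply mirrors the call example and is an assumption.

```java
/** Hypothetical helper (not part of the SDK): decides the next setup step. */
final class ModelSetup {
    enum Step { DOWNLOAD_BASE_CONFIG, DOWNLOAD_MODEL, INIT }

    /**
     * Assumed order: base configuration first, then the model, then init.
     * The two flags would come from checkBaseConfig(...) and checkModel(...).
     */
    static Step nextStep(boolean baseConfigReady, boolean modelReady) {
        if (!baseConfigReady) return Step.DOWNLOAD_BASE_CONFIG;
        if (!modelReady) return Step.DOWNLOAD_MODEL;
        return Step.INIT;
    }

    /** Converts a (current, total) pair to 0..100; returns 0 when total is unknown. */
    static int percent(long current, long total) {
        if (total <= 0) return 0;
        long clamped = Math.max(0, Math.min(current, total));
        return (int) (clamped * 100 / total);
    }
}
```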
---
### 6.2 Initialization and Rendering Start
In the `onCreate()` stage of the rendering page, build the `DUIX` object and call its `init()` method.

> Function Definition: `ai.guiji.duix.sdk.client.DUIX`

```java
// Build the DUIX object
public DUIX(Context context, String modelName, RenderSink sink, Callback callback)

// Initialize the DUIX service
void init()
```
**DUIX Object Construction Explanation**:

| Parameter     | Type      | Description                                                    |
|---------------|-----------|----------------------------------------------------------------|
| context       | Context   | System context |
| modelName     | String    | The model download URL (if the model has already been downloaded) or the cached file name |
| sink          | RenderSink| Rendering data interface. The SDK provides a default rendering component that implements this interface, or you can implement it yourself |
| callback      | Callback  | Receives the various callback events reported by the SDK |

**Callback** is defined as follows: `ai.guiji.duix.sdk.client.Callback`

```java
interface Callback {
    void onEvent(String event, String msg, Object info);
}
```
**Call Example**:

```kotlin
duix = DUIX(mContext, modelUrl, mDUIXRender) { event, msg, info ->
    when (event) {
        ai.guiji.duix.sdk.client.Constant.CALLBACK_EVENT_INIT_READY -> {
            initOK()
        }

        ai.guiji.duix.sdk.client.Constant.CALLBACK_EVENT_INIT_ERROR -> {
            initError()
        }
        // ...
    }
}
// init() reports its result asynchronously through the callback above
duix?.init()
```
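Because the result of `init()` arrives asynchronously, any work that depends on a ready instance (pushing audio, starting a motion) has to wait for the ready event. One way to express that is a small gate that queues tasks until the event arrives. This is a sketch in plain Java under stated assumptions: `InitGate` is a hypothetical class that is not part of the SDK, and the two event strings are placeholders, so real code should compare against `Constant.CALLBACK_EVENT_INIT_READY` and `Constant.CALLBACK_EVENT_INIT_ERROR` instead.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical helper (not part of the SDK): defers work until init succeeds. */
final class InitGate {
    // Placeholder event names for this sketch only; real code should use the
    // SDK's Constant.CALLBACK_EVENT_INIT_READY / CALLBACK_EVENT_INIT_ERROR.
    static final String EVENT_READY = "init.ready";
    static final String EVENT_ERROR = "init.error";

    private final Queue<Runnable> pending = new ArrayDeque<>();
    private boolean ready;
    private boolean failed;

    /** Runs the task now if init has succeeded, queues it if init is still pending. */
    synchronized void runWhenReady(Runnable task) {
        if (failed) return; // init failed: the task is dropped
        if (ready) {
            task.run();
        } else {
            pending.add(task);
        }
    }

    /** Call from the SDK callback with the event name that was received. */
    synchronized void onEvent(String event) {
        if (EVENT_READY.equals(event)) {
            ready = true;
            Runnable task;
            while ((task = pending.poll()) != null) {
                task.run(); // flush queued work in submission order
            }
        } else if (EVENT_ERROR.equals(event)) {
            failed = true;
            pending.clear();
        }
    }
}
```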
Because `init()` is asynchronous, confirm the initialization result in the callback before using the `DUIX` instance.

---
### 6.3 Digital Human Avatar Display
Use the SDK-provided `DUIXRenderer` and `DUIXTextureView` to quickly implement rendering with transparency support. Alternatively, implement the `RenderSink` interface yourself to customize the rendering logic.

**RenderSink** is defined as follows: `ai.guiji.duix.sdk.client.render.RenderSink`

```java
/**
 * Rendering pipeline; rendering data is delivered through this interface.
 */
public interface RenderSink {
    // The frame's buffer data is arranged in BGR order
    void onVideoFrame(ImageFrame imageFrame);
}
```
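A custom `RenderSink` has to account for the BGR byte order noted above, since Android bitmaps and many drawing APIs expect ARGB pixels. The sketch below shows the channel swap on a plain byte array. It is an illustration under assumptions, not the SDK's own renderer: `BgrConverter` is a hypothetical class, the fields of `ImageFrame` are not described in this document so the sketch does not touch it, and the layout (3 bytes per pixel, no row padding, no alpha channel) is assumed.

```java
/** Hypothetical helper (not part of the SDK): converts packed BGR bytes to ARGB pixels. */
final class BgrConverter {
    /**
     * Assumes 3 bytes per pixel in B, G, R order with no row padding.
     * Returns one 0xAARRGGBB int per pixel with alpha fixed at 0xFF (opaque).
     */
    static int[] bgrToArgb(byte[] bgr, int width, int height) {
        int pixelCount = width * height;
        if (bgr.length < pixelCount * 3) {
            throw new IllegalArgumentException(
                    "buffer too small for " + width + "x" + height);
        }
        int[] argb = new int[pixelCount];
        for (int i = 0; i < pixelCount; i++) {
            int b = bgr[i * 3] & 0xFF;     // mask so the byte is read as unsigned
            int g = bgr[i * 3 + 1] & 0xFF;
            int r = bgr[i * 3 + 2] & 0xFF;
            argb[i] = 0xFF000000 | (r << 16) | (g << 8) | b;
        }
        return argb;
    }
}
```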
**Call Example**:

Use `DUIXRenderer` and `DUIXTextureView` to quickly implement rendering. Both controls support transparency, so you can freely set the background and foreground.

```kotlin
override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)
    // ...
    mDUIXRender = DUIXRenderer(mContext, binding.glTextureView)

    binding.glTextureView.setEGLContextClientVersion(GL_CONTEXT_VERSION)
    binding.glTextureView.setEGLConfigChooser(8, 8, 8, 8, 16, 0) // Transparency
    binding.glTextureView.isOpaque = false // Transparency
    binding.glTextureView.setRenderer(mDUIXRender)
    // The render mode must be set after setting the renderer
    binding.glTextureView.renderMode = GLSurfaceView.RENDERMODE_WHEN_DIRTY

    duix = DUIX(mContext, modelUrl, mDUIXRender) { event, msg, _ ->
        // Handle callback events here
    }
    // ...
}
```
---
### 6.4 Broadcasting Control
#### Use Streaming PCM to Drive Digital Human Broadcasting
**PCM format: 16 kHz sample rate, single channel, 16-bit depth**

> Function Definition: `ai.guiji.duix.sdk.client.DUIX`

```java
// Notify the service that audio is about to be pushed
void startPush()

// Push PCM data
void pushPcm(byte[] buffer)

// Finish pushing a segment of audio (call this once all audio has been pushed, not when playback finishes)
void stopPush()
```
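If the source audio is a WAV file and you want to feed it through `pushPcm`, it can help to confirm that the file really matches the required PCM format and to strip the header first. The sketch below is one possible pre-check in plain Java. `WavCheck` is a hypothetical class that is not part of the SDK, and it assumes the canonical 44-byte WAV layout (`RIFF`, `WAVE`, a 16-byte `fmt ` chunk, then `data`); files with extra chunks would need a real chunk parser.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper (not part of the SDK): checks a canonical 44-byte WAV header. */
final class WavCheck {
    static final int HEADER_SIZE = 44;

    /** True if the header describes uncompressed 16 kHz, mono, 16-bit PCM. */
    static boolean isExpectedFormat(byte[] wav) {
        if (wav.length < HEADER_SIZE) return false;
        if (!tag(wav, 0).equals("RIFF") || !tag(wav, 8).equals("WAVE")
                || !tag(wav, 12).equals("fmt ") || !tag(wav, 36).equals("data")) {
            return false; // not the canonical layout this sketch assumes
        }
        ByteBuffer header = ByteBuffer.wrap(wav).order(ByteOrder.LITTLE_ENDIAN);
        int audioFormat = header.getShort(20);   // 1 = uncompressed PCM
        int channels = header.getShort(22);
        int sampleRate = header.getInt(24);
        int bitsPerSample = header.getShort(34);
        return audioFormat == 1 && channels == 1
                && sampleRate == 16000 && bitsPerSample == 16;
    }

    /** Returns the raw PCM bytes that follow the 44-byte header. */
    static byte[] pcmPayload(byte[] wav) {
        if (wav.length < HEADER_SIZE) {
            throw new IllegalArgumentException("shorter than a WAV header");
        }
        return Arrays.copyOfRange(wav, HEADER_SIZE, wav.length);
    }

    private static String tag(byte[] data, int offset) {
        return new String(data, offset, 4, StandardCharsets.US_ASCII);
    }
}
```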
Calls to `startPush` and `stopPush` must be paired, with one or more `pushPcm` calls in between. Avoid pushing overly long audio through `pushPcm`, because the pushed buffers are cached in memory. After pushing the entire audio segment, call `stopPush` to end the session, then call `startPush` again for the next segment.

**The audio pushed between each `startPush` and `stopPush` call must be at least 1 second long (32,000 bytes); otherwise the mouth-shape driver is not triggered. Shorter audio can be padded with blank (silent) frames.**

**Call Example**:

```kotlin
val thread = Thread {
    duix?.startPush()
    try {
        // The asset is expected to be raw 16 kHz, mono, 16-bit PCM
        assets.open("pcm/2.pcm").use { inputStream ->
            val buffer = ByteArray(320) // 10 ms of audio per push
            while (true) {
                val length = inputStream.read(buffer)
                if (length <= 0) break
                // Copy only the bytes actually read; the last chunk may be shorter
                duix?.pushPcm(buffer.copyOfRange(0, length))
            }
        }
    } finally {
        // End the segment once all data has been pushed, even if reading failed
        duix?.stopPush()
    }
}
thread.start()
```
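The 32,000-byte minimum follows directly from the PCM format: 16,000 samples per second × 2 bytes per sample × 1 channel. The same arithmetic shows that a 320-byte buffer holds 10 ms of audio. The sketch below, in plain Java, captures that arithmetic and pads a short segment with zero-valued samples, which are silence in signed 16-bit PCM. `PcmSegments` and its members are hypothetical names that are not part of the SDK.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of the SDK): size arithmetic for 16 kHz mono 16-bit PCM. */
final class PcmSegments {
    static final int SAMPLE_RATE = 16_000;  // samples per second
    static final int BYTES_PER_SAMPLE = 2;  // 16-bit depth
    static final int CHANNELS = 1;          // mono
    static final int BYTES_PER_SECOND = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS;
    static final int MIN_SEGMENT_BYTES = BYTES_PER_SECOND; // the 1-second minimum

    /** Number of PCM bytes needed to hold the given duration. */
    static int bytesForMillis(int millis) {
        return (int) ((long) BYTES_PER_SECOND * millis / 1000);
    }

    /** Appends silent (zero-valued) samples until the segment reaches the minimum length. */
    static byte[] padToMinimum(byte[] pcm) {
        if (pcm.length >= MIN_SEGMENT_BYTES) return pcm;
        return Arrays.copyOf(pcm, MIN_SEGMENT_BYTES); // copyOf zero-fills the tail
    }
}
```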
---
### 6.5 Motion Control
#### Play Specific Motion Interval
Models that include a `SpecialAction.json` file support playing the motion intervals marked in it.

> Function Definition: `ai.guiji.duix.sdk.client.DUIX`

```java
/**
 * Play a specific motion interval.
 *
 * @param name The motion interval name, which can be obtained from {@link ModelInfo#getSilenceRegion()} after the init callback
 * @param now  Whether to play immediately: true plays now; false waits for the current silent or motion interval to finish
 */
void startMotion(String name, boolean now)
```
**Call Example**:

```kotlin
duix?.startMotion("Greeting", true)
```
#### Randomly Play Motion Interval

> Function Definition: `ai.guiji.duix.sdk.client.DUIX`

```java
/**
 * Randomly play a motion interval.
 *
 * @param now Whether to play immediately: true plays now; false waits for the current silent or motion interval to finish
 */
void startRandomMotion(boolean now);
```

**Call Example**:

```kotlin
duix?.startRandomMotion(true)
```
---
## 7. Proguard Configuration
If code obfuscation is enabled, add the following rule to `proguard-rules.pro`:

```proguard
-keep class ai.guiji.duix.DuixNcnn { *; }
```
---
## 8. Precautions
1. Make sure the base configuration file and the model have been downloaded to the specified location before initializing rendering.
2. Do not push overly long PCM audio: PCM buffers are cached in memory, so long audio streams may cause memory overflow.
3. To replace the preview model, modify the `modelUrl` value in `MainActivity.kt` and use the SDK's built-in file download and decompression management to obtain the complete model files.
4. Audio driving format: 16 kHz sample rate, single channel, 16-bit depth.
5. On devices with insufficient performance, audio feature extraction may not keep up with playback. You can use `duix?.setReporter()` to monitor frame rendering information.

---
## 9. FAQ and Troubleshooting Guide
| Issue                           | Possible Cause               | Solution                     |
|---------------------------------|------------------------------|------------------------------|
| `init` callback reports failure | Model path error or model not downloaded | Use `checkModel` to check the model status |
| Black screen when rendering | Incorrect EGL configuration or texture view setup | Use the settings from the SDK-provided example |
| PCM audio has no effect | Incorrect audio format or `startPush` not called | Make sure the audio format is correct and call `startPush` before pushing data |
| Slow model download | Unstable network or restricted CDN | Host the model files on your own storage service |

---
## 10. Version History
**<a>4.0.1</a>**

1. Supports driving the digital human with a PCM audio stream, improving audio playback response speed.
2. Optimized motion interval playback; specific motion intervals can be played based on the model configuration.
3. Uses a custom audio player and removes the ExoPlayer playback dependency.
4. Provides simplified tools for managing model download and synchronization.
5. The audio data between each `startPush` and `stopPush` must be at least 1 second long (32,000 bytes); otherwise the mouth-shape driver is not triggered. Shorter audio can be padded with blank frames.

**<a>3.0.5</a>**

```text
1. Updated the arm32 CPU libonnxruntime.so version to fix compatibility issues.
2. Modified the motion interval playback function: it supports random and sequential playback, and playback must be stopped manually to return to the silent interval.
```

**<a>3.0.4</a>**

```text
1. Fixed a model display issue caused by low float precision on some devices.
```

**<a>3.0.3</a>**

```text
1. Optimized local rendering.
```
## 11. 🔗 Open-source Dependencies
| Module                                   | Description                    |
|------------------------------------------|--------------------------------|
| [onnx](https://github.com/onnx/onnx)     | General AI model standard format |
| [ncnn](https://github.com/Tencent/ncnn)  | High-performance neural network computing framework (Tencent) |

---

For more help, please contact the technical support team.