# DeepCam - VisionProcessor C++ Library

This project ports the Python vision preprocessing code of the Qwen multimodal large models to a high-performance C++ implementation. The VisionProcessor library provides complete image and video preprocessing together with a Qwen2VL processor, and is designed for multimodal AI applications.

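## 🎯 Features

- **🖼️ Image processing**: supports common image formats (JPG, PNG, BMP, TIFF, WebP, and others)
- **🎬 Video processing**: supports mainstream video formats (MP4, AVI, MOV, MKV, and others)
- **🧠 Smart resizing**: a resizing algorithm driven by pixel-count constraints and aspect ratio
- **🌐 Multiple input sources**: local files, HTTP/HTTPS URLs, and Base64-encoded data
- **⚡ High performance**: image processing built on OpenCV's optimized routines
- **🎛️ Flexible configuration**: a rich set of configuration parameters
- **🔒 Memory safety**: RAII and modern C++ memory management
- **📹 Smart frame sampling**: an efficient video frame sampling algorithm
- **🤖 Qwen2VL processor**: a complete multimodal processor for joint handling of images, videos, and text
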
## 📁 Project Structure

```
project-root/                          # Project root
├── CMakeLists.txt                     # Top-level CMake configuration
├── README.md                          # Project documentation
├── deepcam/                           # DeepCam source tree
│   └── sources/                       # Source subdirectory
│       ├── CMakeLists.txt             # Library CMake configuration
│       ├── vision_process.hpp         # Vision processing header
│       ├── vision_process.cc          # Vision processing implementation
│       ├── qwen2_vl_processor.hpp     # Qwen2VL processor header
│       └── qwen2_vl_processor.cc      # Qwen2VL processor implementation
└── test/                              # Tests and examples
    ├── CMakeLists.txt                 # Test CMake configuration
    ├── vision_process_example.cpp     # Vision processing usage example
    └── qwen2_vl_example.cpp           # Qwen2VL processor usage example
```

## 🛠️ Dependencies

### Required

- **OpenCV 4.x**: image and video processing (`libopencv-dev`)
- **libcurl**: HTTP request support (`libcurl4-openssl-dev`)
- **CMake 3.10+**: build system
- **C++17**: language standard required by the code

### Optional

- **jsoncpp**: JSON configuration file parsing (`libjsoncpp-dev`); required for the AutoProcessor feature

### System Requirements

- **Linux**: Ubuntu 18.04+ / CentOS 7+ / other mainstream distributions
- **macOS**: 10.14+ (Homebrew supported)
- **Compiler**: GCC 7+ / Clang 6+ / MSVC 2017+

## 📦 Installing Dependencies

### Ubuntu/Debian
```bash
sudo apt-get update
sudo apt-get install libopencv-dev libcurl4-openssl-dev cmake build-essential pkg-config

# Optional: install jsoncpp to enable AutoProcessor
sudo apt-get install libjsoncpp-dev
```

### macOS (Homebrew)
```bash
brew install opencv curl cmake pkg-config

# Optional: install jsoncpp to enable AutoProcessor
brew install jsoncpp
```

### CentOS/RHEL
```bash
sudo yum install opencv-devel libcurl-devel cmake3 gcc-c++ pkgconfig

# Optional: install jsoncpp to enable AutoProcessor (from the EPEL repository)
sudo yum install epel-release
sudo yum install jsoncpp-devel

# Or, on newer releases, use dnf
sudo dnf install opencv-devel libcurl-devel cmake gcc-c++ pkgconf-pkg-config jsoncpp-devel
```

## 🔨 Building

```bash
# Enter the project root
cd project-root

# Create a build directory
mkdir build && cd build

# Configure with CMake
cmake ..

# Build (on macOS, replace $(nproc) with $(sysctl -n hw.ncpu))
make -j$(nproc)

# Optional: install system-wide
sudo make install
```

### Build Options
```bash
# Debug build
cmake -DCMAKE_BUILD_TYPE=Debug ..

# Release build (default)
cmake -DCMAKE_BUILD_TYPE=Release ..

# Point CMake at a specific OpenCV installation (if needed)
cmake -DOpenCV_DIR=/path/to/opencv ..
```

## 🚀 Quick Start

### Basic Usage

```cpp
#include <iostream>
#include <map>
#include <string>

#include "vision_process.hpp"

int main() {
    // Create a processor instance
    VisionProcessor processor;

    // Configure a single image (all values are passed as strings)
    std::map<std::string, std::string> imageConfig;
    imageConfig["image"] = "/path/to/image.jpg";
    imageConfig["min_pixels"] = "3136";      // 4 * 28 * 28
    imageConfig["max_pixels"] = "12845056";  // 16384 * 28 * 28

    try {
        cv::Mat processedImage = processor.fetchImage(imageConfig);
        std::cout << "Image processed successfully! Size: "
                  << processedImage.cols << "x" << processedImage.rows << std::endl;
    } catch (const std::exception& e) {
        std::cerr << "Error: " << e.what() << std::endl;
    }

    return 0;
}
```

### Processing Video

```cpp
// Configure video processing parameters
std::map<std::string, std::string> videoConfig;
videoConfig["video"] = "/path/to/video.mp4";
videoConfig["fps"] = "2.0";           // Sampling frame rate
videoConfig["min_frames"] = "4";      // Minimum number of frames
videoConfig["max_frames"] = "32";     // Maximum number of frames
videoConfig["video_start"] = "10.0";  // Start time (seconds)
videoConfig["video_end"] = "60.0";    // End time (seconds)

std::vector<cv::Mat> frames = processor.fetchVideo(videoConfig);
std::cout << "Extracted " << frames.size() << " frames" << std::endl;
```

### Using the Qwen2VL Processor

```cpp
#include <iostream>
#include <memory>
#include <optional>

#include <opencv2/imgcodecs.hpp>

#include "qwen2_vl_processor.hpp"

int main() {
    // Create mock processor components
    // (the Mock* classes stand in for real implementations of the abstract interfaces)
    auto image_processor = std::make_shared<MockImageProcessor>();
    auto video_processor = std::make_shared<MockVideoProcessor>();
    auto tokenizer = std::make_shared<MockTokenizer>();

    // Create the Qwen2VL processor
    Qwen2VLProcessor processor(image_processor, tokenizer, video_processor);

    // Text that contains an image placeholder
    TextInput text = "Describe this image: <|image_pad|> What do you see?";
    cv::Mat image = cv::imread("/path/to/image.jpg");
    ImageInput image_input = image;

    // Configure processing parameters
    ProcessorKwargs kwargs;
    kwargs.text_kwargs["return_tensors"] = "pt";
    kwargs.text_kwargs["return_mm_token_type_ids"] = "true";

    try {
        // Run multimodal processing
        BatchFeature result = processor(image_input, text, std::nullopt, kwargs);

        std::cout << "Processing succeeded!" << std::endl;
        std::cout << "Keys in the result: ";
        for (const auto& [key, value] : result.data) {
            std::cout << key << " ";
        }
        std::cout << std::endl;

    } catch (const std::exception& e) {
        std::cerr << "Error: " << e.what() << std::endl;
    }

    return 0;
}
```
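
In the upstream Python Qwen2VL processor, each `<|image_pad|>` placeholder in the text is expanded into as many placeholder tokens as the corresponding image occupies. The sketch below illustrates that expansion step, assuming this C++ port behaves the same way; `expandImagePlaceholders` is a hypothetical helper and not part of the library API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of the library): replaces the i-th occurrence
// of `placeholder` in `text` with tokenCounts[i] copies of the placeholder.
std::string expandImagePlaceholders(const std::string& text,
                                    const std::vector<int>& tokenCounts,
                                    const std::string& placeholder = "<|image_pad|>") {
    std::string out;
    std::size_t pos = 0;    // Current read position in `text`
    std::size_t index = 0;  // Index of the next image
    while (true) {
        const std::size_t hit = text.find(placeholder, pos);
        if (hit == std::string::npos) {
            out.append(text, pos, std::string::npos);  // Copy the tail unchanged
            break;
        }
        if (index >= tokenCounts.size()) {
            throw std::invalid_argument("more placeholders than images");
        }
        out.append(text, pos, hit - pos);  // Text before the placeholder
        for (int i = 0; i < tokenCounts[index]; ++i) {
            out += placeholder;
        }
        ++index;
        pos = hit + placeholder.size();
    }
    if (index != tokenCounts.size()) {
        throw std::invalid_argument("more images than placeholders");
    }
    return out;
}
```

For example, `expandImagePlaceholders("a <|image_pad|> b", {3})` returns the same text with three consecutive `<|image_pad|>` markers in place of the single one.
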
### Counting Multimodal Tokens

```cpp
// Compute the number of tokens required by images and videos
std::vector<std::pair<int, int>> image_sizes = {{224, 224}, {448, 448}};
std::vector<std::tuple<int, int, int>> video_sizes = {{8, 224, 224}};

MultiModalData mm_data = processor.getNumMultimodalTokens(image_sizes, video_sizes);

if (mm_data.num_image_tokens.has_value()) {
    std::cout << "Image token counts: ";
    for (int count : mm_data.num_image_tokens.value()) {
        std::cout << count << " ";
    }
    std::cout << std::endl;
}
```
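
Token counts of this kind follow from the patch grid. As a rough sketch, assuming the Qwen2-VL defaults of a 14-pixel patch, a spatial merge size of 2, and a temporal patch size of 2 (none of which are stated in this README, although 14 × 2 matches the `IMAGE_FACTOR` of 28), and assuming the sizes are already multiples of 28, the counts can be estimated as shown here. `estimateImageTokens` and `estimateVideoTokens` are hypothetical names; the real values come from the image and video processors passed to `Qwen2VLProcessor`.

```cpp
// Assumed Qwen2-VL defaults; check the actual processor configuration.
constexpr int kPatchSize = 14;         // Pixels per patch side
constexpr int kMergeSize = 2;          // Patches merged per side into one token
constexpr int kTemporalPatchSize = 2;  // Frames grouped into one temporal patch

// Hypothetical helper: token estimate for one image whose height and width
// are already multiples of kPatchSize * kMergeSize.
int estimateImageTokens(int height, int width) {
    const int gridH = height / kPatchSize;
    const int gridW = width / kPatchSize;
    return (gridH * gridW) / (kMergeSize * kMergeSize);
}

// Hypothetical helper: token estimate for a video given as
// (number of frames, height, width).
int estimateVideoTokens(int numFrames, int height, int width) {
    const int gridT = numFrames / kTemporalPatchSize;
    return gridT * estimateImageTokens(height, width);
}
```

Under these assumptions, a 224×224 image maps to 64 tokens, a 448×448 image to 256 tokens, and an 8-frame 224×224 video to 256 tokens.
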
### Batch Processing

```cpp
// Process conversation data that contains media items
std::vector<std::map<std::string, std::string>> conversations;

// Add an image
std::map<std::string, std::string> imageItem;
imageItem["type"] = "image";
imageItem["image"] = "/path/to/image.jpg";
conversations.push_back(imageItem);

// Add a video
std::map<std::string, std::string> videoItem;
videoItem["type"] = "video";
videoItem["video"] = "/path/to/video.mp4";
videoItem["fps"] = "2.0";
conversations.push_back(videoItem);

// Process all media (`processor` is a VisionProcessor here)
ProcessResult result = processor.processVisionInfo(conversations, true);

std::cout << "Results:" << std::endl;
std::cout << "- Images: " << result.imageInputs.size() << std::endl;
std::cout << "- Videos: " << result.videoInputs.size() << std::endl;
```

### Using AutoProcessor (Factory Pattern)

```cpp
#include <iostream>
#include <optional>

#include "auto_processor.hpp"

int main() {
    try {
        // Option 1: load from a pretrained model directory
        FromPretrainedKwargs kwargs;
        kwargs.trust_remote_code = true;

        auto processor = AutoProcessor::fromPretrained("/path/to/model", kwargs);

        // Option 2: register a custom processor
        AutoProcessor::registerProcessor("custom_model", [](const std::string& path, const FromPretrainedKwargs& kwargs) {
            // Logic that builds the custom processor
            return createCustomProcessor(path, kwargs);
        });

        // Use the processor
        TextInput text = "Describe this image: <|image_pad|>";
        cv::Mat image = cv::imread("image.jpg");

        ProcessorKwargs proc_kwargs;
        auto result = processor(ImageInput(image), text, std::nullopt, proc_kwargs);

        std::cout << "Processing complete!" << std::endl;

    } catch (const std::exception& e) {
        std::cerr << "Error: " << e.what() << std::endl;
    }

    return 0;
}
```

## 📖 API Reference

### VisionProcessor Class

#### Main Methods

| Method | Description | Return type |
|------|------|--------|
| `fetchImage()` | Processes a single image | `cv::Mat` |
| `fetchVideo()` | Processes a video file | `std::vector<cv::Mat>` |
| `processVisionInfo()` | Batch-processes conversation data | `ProcessResult` |

#### Utility Functions

| Function | Description | Parameters |
|------|------|------|
| `smartResize()` | Smart resize of dimensions | `height, width, factor, minPixels, maxPixels` |
| `roundByFactor()` | Rounds to the nearest multiple of the factor | `number, factor` |
| `ceilByFactor()` | Rounds up to a multiple of the factor | `number, factor` |
| `floorByFactor()` | Rounds down to a multiple of the factor | `number, factor` |
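
To make the resize rule concrete, here is a minimal sketch modeled on the `smart_resize` function of the upstream Python code. It is an illustration under assumptions and may differ from the library's actual implementation: the `{height, width}` return type, the default arguments, and the `kMaxRatio` constant are assumed here, and `std::round` rounds ties away from zero whereas Python's `round` rounds ties to even.

```cpp
#include <algorithm>
#include <cmath>
#include <stdexcept>
#include <utility>

constexpr int kMaxRatio = 200;  // Assumed to mirror MAX_RATIO

// Nearest multiple of `factor` (ties round away from zero).
int roundByFactor(double number, int factor) {
    return static_cast<int>(std::round(number / factor)) * factor;
}

// Smallest multiple of `factor` that is >= number.
int ceilByFactor(double number, int factor) {
    return static_cast<int>(std::ceil(number / factor)) * factor;
}

// Largest multiple of `factor` that is <= number.
int floorByFactor(double number, int factor) {
    return static_cast<int>(std::floor(number / factor)) * factor;
}

// Sketch of the resize rule. Returns {height, width}, both multiples of
// `factor`, scaled so that the area fits within [minPixels, maxPixels]
// while roughly preserving the aspect ratio.
std::pair<int, int> smartResize(int height, int width, int factor = 28,
                                long long minPixels = 4 * 28 * 28,
                                long long maxPixels = 16384LL * 28 * 28) {
    if (height <= 0 || width <= 0) {
        throw std::invalid_argument("height and width must be positive");
    }
    const double ratio =
        static_cast<double>(std::max(height, width)) / std::min(height, width);
    if (ratio > kMaxRatio) {
        throw std::invalid_argument("aspect ratio exceeds the allowed maximum");
    }

    int h = std::max(factor, roundByFactor(height, factor));
    int w = std::max(factor, roundByFactor(width, factor));
    const double area = static_cast<double>(height) * width;

    if (static_cast<long long>(h) * w > maxPixels) {
        // Too large: shrink both sides by the same ratio, rounding down.
        const double beta = std::sqrt(area / maxPixels);
        h = floorByFactor(height / beta, factor);
        w = floorByFactor(width / beta, factor);
    } else if (static_cast<long long>(h) * w < minPixels) {
        // Too small: enlarge both sides by the same ratio, rounding up.
        const double beta = std::sqrt(minPixels / area);
        h = ceilByFactor(height * beta, factor);
        w = ceilByFactor(width * beta, factor);
    }
    return {h, w};
}
```

With the video limits from this README (`minPixels = 100352`, `maxPixels = 602112`), a 1080×1920 frame would come out as 560×1008 under this sketch.
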
### Configuration Parameters

#### Image Configuration
```cpp
std::map<std::string, std::string> config;
config["image"] = "path/to/image.jpg";                 // Image path
config["image_url"] = "http://example.com/image.jpg";  // Or use a URL
config["min_pixels"] = "3136";                         // Minimum pixel count (default: 4*28*28)
config["max_pixels"] = "12845056";                     // Maximum pixel count (default: 16384*28*28)
config["resized_height"] = "224";                      // Explicit target height
config["resized_width"] = "224";                       // Explicit target width
```
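
Because every configuration value is passed as a string, numeric options have to be parsed before use. The sketch below shows one way to do that with defaults and validation; `getIntOption` is a hypothetical helper and may not match how the library parses its configuration internally.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads an integer option from a string-valued config map.
// Returns `fallback` when the key is absent and throws std::invalid_argument
// when the value is not a complete integer.
long long getIntOption(const std::map<std::string, std::string>& config,
                       const std::string& key, long long fallback) {
    const auto it = config.find(key);
    if (it == config.end()) {
        return fallback;
    }
    std::size_t consumed = 0;
    long long value = 0;
    try {
        value = std::stoll(it->second, &consumed);
    } catch (const std::exception&) {
        throw std::invalid_argument("option '" + key + "' is not an integer: " + it->second);
    }
    if (consumed != it->second.size()) {
        throw std::invalid_argument("option '" + key + "' has trailing characters: " + it->second);
    }
    return value;
}
```

A caller could then write `getIntOption(config, "min_pixels", 4 * 28 * 28)` and rely on the default when the key is not set.
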
#### Video Configuration
```cpp
std::map<std::string, std::string> config;
config["video"] = "path/to/video.mp4";  // Video path
config["fps"] = "2.0";                  // Sampling frame rate (default: 2.0)
config["nframes"] = "16";               // Or specify the frame count directly
config["min_frames"] = "4";             // Minimum number of frames (default: 4)
config["max_frames"] = "768";           // Maximum number of frames (default: 768)
config["video_start"] = "10.0";         // Start time (seconds)
config["video_end"] = "60.0";           // End time (seconds)
config["min_pixels"] = "100352";        // Minimum pixel count (default: 128*28*28)
config["max_pixels"] = "602112";        // Maximum pixel count (default: 768*28*28)
```
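
The `fps`, `min_frames`, and `max_frames` options interact when the number of sampled frames is chosen. The following sketch mirrors the frame-count rule of the upstream Python code and picks evenly spaced frame indices; whether this port behaves identically is an assumption, and `smartNumFrames` and `sampleFrameIndices` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <stdexcept>
#include <vector>

constexpr int kFrameFactor = 2;  // Assumed to mirror FRAME_FACTOR

// Hypothetical helper: number of frames to sample when `fps` is used.
// The result is a multiple of kFrameFactor, clamped by the frame bounds.
int smartNumFrames(int totalFrames, double videoFps, double targetFps = 2.0,
                   int minFrames = 4, int maxFrames = 768) {
    if (totalFrames <= 0 || videoFps <= 0.0) {
        throw std::invalid_argument("video has no frames or an invalid frame rate");
    }
    // Align the bounds to multiples of the frame factor.
    const int lo = static_cast<int>(
        std::ceil(static_cast<double>(minFrames) / kFrameFactor)) * kFrameFactor;
    const int hi = (std::min(maxFrames, totalFrames) / kFrameFactor) * kFrameFactor;

    // Duration in seconds times the target sampling rate.
    double n = totalFrames / videoFps * targetFps;
    n = std::min({std::max(n, static_cast<double>(lo)),
                  static_cast<double>(hi),
                  static_cast<double>(totalFrames)});
    const int frames = static_cast<int>(std::floor(n / kFrameFactor)) * kFrameFactor;
    if (frames < kFrameFactor || frames > totalFrames) {
        throw std::invalid_argument("video is too short for the requested sampling");
    }
    return frames;
}

// Evenly spaced frame indices from 0 to totalFrames - 1 (inclusive).
std::vector<int> sampleFrameIndices(int totalFrames, int numFrames) {
    std::vector<int> indices;
    if (totalFrames <= 0 || numFrames <= 0) {
        return indices;
    }
    indices.reserve(numFrames);
    for (int i = 0; i < numFrames; ++i) {
        const double t = numFrames > 1 ? static_cast<double>(i) / (numFrames - 1) : 0.0;
        indices.push_back(static_cast<int>(std::lround(t * (totalFrames - 1))));
    }
    return indices;
}
```

For a 50-second clip at 30 fps (1500 frames) and the default `fps` of 2.0, this sketch selects 100 frames; a much longer clip is capped at `max_frames`.
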
## 🎨 Supported Input Formats

### Image Inputs
- **Local file**: `"/path/to/image.jpg"`
- **HTTP URL**: `"http://example.com/image.jpg"`
- **HTTPS URL**: `"https://example.com/image.jpg"`
- **Base64-encoded data**: `"..."`
- **File URI**: `"file:///path/to/image.jpg"`

### Video Inputs
- **Local file**: `"/path/to/video.mp4"`
- **File URI**: `"file:///path/to/video.mp4"`

### Supported File Extensions
- **Images**: `.jpg`, `.jpeg`, `.png`, `.bmp`, `.tiff`, `.tif`, `.webp`
- **Videos**: `.mp4`, `.avi`, `.mov`, `.mkv`, `.wmv`, `.flv`, `.webm`
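
The input kinds listed above can be told apart by their prefix. The sketch below shows one possible dispatch; `classifyInput` and `toLocalPath` are hypothetical helpers, and treating strings that start with `data:image` as Base64 input is an assumption borrowed from the upstream Python code rather than something this README specifies.

```cpp
#include <string>

enum class InputSource { LocalFile, FileUri, HttpUrl, Base64Data };

// Returns true when `text` begins with `prefix`.
bool startsWith(const std::string& text, const std::string& prefix) {
    return text.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical helper: decides how an input string should be loaded.
InputSource classifyInput(const std::string& input) {
    if (startsWith(input, "http://") || startsWith(input, "https://")) {
        return InputSource::HttpUrl;
    }
    if (startsWith(input, "file://")) {
        return InputSource::FileUri;
    }
    if (startsWith(input, "data:image")) {  // Assumed Base64 data URI prefix
        return InputSource::Base64Data;
    }
    return InputSource::LocalFile;  // Anything else is treated as a path
}

// Strips the "file://" scheme so that the rest can be opened as a local path.
std::string toLocalPath(const std::string& input) {
    const std::string scheme = "file://";
    return startsWith(input, scheme) ? input.substr(scheme.size()) : input;
}
```

With this sketch, `toLocalPath("file:///path/to/image.jpg")` yields `/path/to/image.jpg`, and plain paths pass through unchanged.
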
## ⚙️ Advanced Configuration

### Environment Variables
```bash
# Set the maximum pixel limit for video processing
export VIDEO_MAX_PIXELS=100000000

# Run the program
./vision_process_example
```
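
As an illustration of how such an override might be consumed, the sketch below reads `VIDEO_MAX_PIXELS` with `std::getenv` and falls back to a default when the variable is unset or malformed. How the library itself reads the variable is not shown in this README, so `parsePixelLimit`, `videoMaxPixelsFromEnv`, and the fallback value used here are assumptions.

```cpp
#include <cstdlib>

// Hypothetical helper: parses a positive pixel limit from raw text.
// Returns `fallback` when the text is null, empty, non-numeric, or not positive.
long long parsePixelLimit(const char* raw, long long fallback) {
    if (raw == nullptr || *raw == '\0') {
        return fallback;
    }
    char* end = nullptr;
    const long long value = std::strtoll(raw, &end, 10);
    if (*end != '\0' || value <= 0) {
        return fallback;
    }
    return value;
}

// Reads VIDEO_MAX_PIXELS from the environment; the fallback of 768 * 28 * 28
// is the video max_pixels default listed in this README.
long long videoMaxPixelsFromEnv() {
    return parsePixelLimit(std::getenv("VIDEO_MAX_PIXELS"), 768LL * 28 * 28);
}
```

Keeping the parsing separate from the `std::getenv` call makes the fallback behavior easy to unit-test without touching the process environment.
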
### Constants
```cpp
// Constants that can be changed in the header file
static constexpr int IMAGE_FACTOR = 28;                 // Image size factor
static constexpr int MIN_PIXELS = 4 * 28 * 28;          // Minimum image pixels
static constexpr int MAX_PIXELS = 16384 * 28 * 28;      // Maximum image pixels
static constexpr int MAX_RATIO = 200;                   // Maximum aspect ratio

static constexpr int VIDEO_MIN_PIXELS = 128 * 28 * 28;  // Minimum video pixels
static constexpr int VIDEO_MAX_PIXELS = 768 * 28 * 28;  // Maximum video pixels
static constexpr int FRAME_FACTOR = 2;                  // Frame count factor
static constexpr double FPS = 2.0;                      // Default frame rate
```

## 🧪 Testing

Run the example program:
```bash
# Run the example after building
./build/test/vision_process_example

# Or use CTest
cd build
ctest
```

## 🐛 Error Handling

The library reports errors through standard C++ exceptions:

```cpp
try {
    auto result = processor.fetchImage(config);
    // Use the result
} catch (const std::invalid_argument& e) {
    std::cerr << "Invalid argument: " << e.what() << std::endl;
} catch (const std::runtime_error& e) {
    std::cerr << "Runtime error: " << e.what() << std::endl;
} catch (const std::exception& e) {
    // Also catches cv::Exception, which derives from std::exception
    std::cerr << "Other error: " << e.what() << std::endl;
}
```

### Common Exception Types
- `std::invalid_argument`: invalid argument or configuration
- `std::runtime_error`: file read, network request, or processing failure
- `cv::Exception`: OpenCV-related errors

## 🚀 Performance Tuning

### Memory Management
- Use RAII to manage memory automatically
- Preallocate vector capacity to reduce reallocations
- Release large `cv::Mat` objects promptly

### Processing Tips
- Reuse one processor instance when processing many files
- For large video files, consider processing in segments
- Use multiple threads for independent tasks

### Example: Batch Processing
```cpp
VisionProcessor processor;  // Reuse a single instance

for (const auto& config : imageConfigs) {
    try {
        auto image = processor.fetchImage(config);
        // Work with the image...
    } catch (const std::exception& e) {
        // Log the error and continue with the remaining images
        std::cerr << "Skipping image: " << e.what() << std::endl;
    }
}
```

## 🤝 Contributing

Issues and pull requests are welcome!

### Development Setup
```bash
# Clone the repository
git clone <repository-url>
cd project-root

# Create a development branch
git checkout -b feature/your-feature

# Build and test
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Debug ..
make
./test/vision_process_example
```

## 📄 License

This project is adapted from the original Qwen multimodal Python code; please comply with the terms of the corresponding open-source license.

## 🆘 FAQ

### Q: OpenCV is not found at build time
A: Make sure the OpenCV development package is installed, or pass `-DOpenCV_DIR` to specify its location.

### Q: libcurl link errors
A: Install the libcurl development package: `sudo apt-get install libcurl4-openssl-dev`

### Q: A video file cannot be opened
A: Make sure OpenCV was built with support for the relevant video codecs.

### Q: Memory usage is too high
A: For large files, consider lowering `max_pixels` or processing in batches.

### Q: Base64 decoding fails
A: Make sure the Base64 string is well formed and includes the correct MIME type prefix.

---

**🌟 If this project helps you, please give us a Star!**

### Qwen2VLProcessor Class

#### Main Methods

| Method | Description | Return type |
|------|------|--------|
| `operator()` | Main entry point; processes multimodal input | `BatchFeature` |
| `getNumMultimodalTokens()` | Computes the number of multimodal tokens | `MultiModalData` |
| `batchDecode()` | Decodes a batch of token sequences | `std::vector<std::string>` |
| `decode()` | Decodes a single token sequence | `std::string` |
| `postProcessImageTextToText()` | Post-processes generated text | `std::vector<std::string>` |
| `getModelInputNames()` | Returns the model input names | `std::vector<std::string>` |

#### Supported Data Structures

```cpp
// Batch feature container
struct BatchFeature {
    std::map<std::string, std::vector<std::vector<int>>> data;
    std::string tensor_type;
};

// Multimodal data information
struct MultiModalData {
    std::optional<std::vector<int>> num_image_tokens;
    std::optional<std::vector<int>> num_image_patches;
    std::optional<std::vector<int>> num_video_tokens;
    std::optional<std::vector<int>> num_video_patches;
};

// Processor configuration parameters
struct ProcessorKwargs {
    std::map<std::string, std::string> images_kwargs;
    std::map<std::string, std::string> videos_kwargs;
    std::map<std::string, std::string> text_kwargs;
};
```

#### Abstract Interfaces

To plug in custom components, implement the following abstract interfaces:

```cpp
class ImageProcessor {
public:
    virtual std::map<std::string, std::vector<std::vector<int>>> processImages(
        const ImageInput& images,
        const std::map<std::string, std::string>& kwargs) = 0;
    virtual int getMergeSize() const = 0;
    virtual int getNumberOfImagePatches(int height, int width,
        const std::map<std::string, std::string>& kwargs) const = 0;
    virtual std::vector<std::string> getModelInputNames() const = 0;
};

class VideoProcessor {
public:
    virtual std::map<std::string, std::vector<std::vector<int>>> processVideos(
        const VideoInput& videos,
        const std::map<std::string, std::string>& kwargs) = 0;
    virtual int getMergeSize() const = 0;
    virtual int getNumberOfVideoPatches(int num_frames, int height, int width,
        const std::map<std::string, std::string>& kwargs) const = 0;
};

class Tokenizer {
public:
    virtual std::map<std::string, std::vector<std::vector<int>>> tokenize(
        const std::vector<std::string>& texts,
        const std::map<std::string, std::string>& kwargs) = 0;
    virtual std::vector<std::string> batchDecode(
        const std::vector<std::vector<int>>& token_ids,
        bool skip_special_tokens = true,
        bool clean_up_tokenization_spaces = false) = 0;
    virtual std::string decode(
        const std::vector<int>& token_ids,
        bool skip_special_tokens = true,
        bool clean_up_tokenization_spaces = false) = 0;
    virtual int convertTokensToIds(const std::string& token) = 0;
    virtual std::vector<std::string> getModelInputNames() const = 0;
};
```
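
As a starting point for a custom component, the sketch below implements the `Tokenizer` interface with a toy whitespace tokenizer. It is illustrative only: `WhitespaceTokenizer` is a hypothetical class, the interface is repeated (with an added virtual destructor) so that the block compiles on its own, and the `"input_ids"` key is an assumption about what the processor expects.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Interface repeated from above so that the sketch is self-contained;
// in a real build, include qwen2_vl_processor.hpp instead.
class Tokenizer {
public:
    virtual ~Tokenizer() = default;
    virtual std::map<std::string, std::vector<std::vector<int>>> tokenize(
        const std::vector<std::string>& texts,
        const std::map<std::string, std::string>& kwargs) = 0;
    virtual std::vector<std::string> batchDecode(
        const std::vector<std::vector<int>>& token_ids,
        bool skip_special_tokens = true,
        bool clean_up_tokenization_spaces = false) = 0;
    virtual std::string decode(
        const std::vector<int>& token_ids,
        bool skip_special_tokens = true,
        bool clean_up_tokenization_spaces = false) = 0;
    virtual int convertTokensToIds(const std::string& token) = 0;
    virtual std::vector<std::string> getModelInputNames() const = 0;
};

// Toy tokenizer: splits on whitespace and assigns ids in order of first use.
class WhitespaceTokenizer : public Tokenizer {
public:
    std::map<std::string, std::vector<std::vector<int>>> tokenize(
        const std::vector<std::string>& texts,
        const std::map<std::string, std::string>& /*kwargs*/) override {
        std::vector<std::vector<int>> ids;
        for (const auto& text : texts) {
            std::istringstream stream(text);
            std::vector<int> row;
            for (std::string word; stream >> word;) {
                row.push_back(convertTokensToIds(word));
            }
            ids.push_back(row);
        }
        std::map<std::string, std::vector<std::vector<int>>> out;
        out["input_ids"] = ids;
        return out;
    }

    std::vector<std::string> batchDecode(
        const std::vector<std::vector<int>>& token_ids,
        bool skip_special_tokens = true,
        bool clean_up_tokenization_spaces = false) override {
        std::vector<std::string> out;
        for (const auto& row : token_ids) {
            out.push_back(decode(row, skip_special_tokens, clean_up_tokenization_spaces));
        }
        return out;
    }

    // The two flags are ignored: this toy vocabulary has no special tokens.
    std::string decode(const std::vector<int>& token_ids,
                       bool /*skip_special_tokens*/ = true,
                       bool /*clean_up_tokenization_spaces*/ = false) override {
        std::string text;
        for (int id : token_ids) {
            if (id < 0 || id >= static_cast<int>(words_.size())) continue;  // Unknown id
            if (!text.empty()) text += ' ';
            text += words_[id];
        }
        return text;
    }

    // Returns the id of `token`, adding it to the vocabulary when unseen.
    int convertTokensToIds(const std::string& token) override {
        const auto it = ids_.find(token);
        if (it != ids_.end()) return it->second;
        const int id = static_cast<int>(words_.size());
        ids_.emplace(token, id);
        words_.push_back(token);
        return id;
    }

    std::vector<std::string> getModelInputNames() const override {
        return {"input_ids"};
    }

private:
    std::unordered_map<std::string, int> ids_;  // word -> id
    std::vector<std::string> words_;            // id -> word
};
```

A real tokenizer would load the model vocabulary and handle special tokens such as `<|image_pad|>`; this toy version only demonstrates the shape of the interface.
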
#### Utility Functions