Commit cbc3aa2

add hand landmarks support

1 parent 094d444 commit cbc3aa2

File tree

13 files changed, +363 -2 lines


components/maix/include/convert_image.hpp

Lines changed: 5 additions & 2 deletions

```diff
@@ -11,7 +11,7 @@ namespace maix::image
     /**
      * OpenCV Mat(numpy array object) to Image object
      * @param array numpy array object, must be a 3-dim or 2-dim continuous array with shape hwc or hw
-     * @param bgr if set bgr, the return image will be marked as BGR888 or BGRA8888 format, grayscale will ignore this arg.
+     * @param bgr if set bgr, the return image will be marked as BGR888 or BGRA8888 format(only mark, not ensure return image is real BGR format), grayscale will ignore this arg.
      * @param copy if true, will alloc new buffer and copy data, else will directly use array's data buffer, default true.
      *        Use this arg carefully, when set to false, ther array MUST keep alive until we don't use the return img of this func, or will cause program crash.
      * @return Image object
@@ -73,6 +73,7 @@ namespace maix::image
      * Image object to OpenCV Mat(numpy array object)
      * @param img Image object, maix.image.Image type.
      * @param ensure_bgr auto convert to BGR888 or BGRA8888 if img format is not BGR or BGRA, if set to false, will not auto convert and directly use img's data, default true.
+     *        If copy is false, ensure_bgr always be false.
      * @param copy Whether alloc new image and copy data or not, if ensure_bgr and img is not bgr or bgra format, always copy,
      *        if not copy, array object will directly use img's data buffer, will faster but change array will affect img's data, default true.
      * @attention take care of ensure_bgr and copy param.
@@ -81,8 +82,10 @@ namespace maix::image
      */
     py::array_t<uint8_t, py::array::c_style> image2cv(image::Image *img, bool ensure_bgr = true, bool copy = true)
     {
+        if(!copy)
+            ensure_bgr = false;
         // no need to convert to bgr
-        if(!ensure_bgr || img->format() == image::FMT_GRAYSCALE || img->format() == image::FMT_BGR888 || img->format() == image::FMT_BGRA8888)
+        if(!ensure_bgr || (!ensure_bgr && img->format() == image::FMT_GRAYSCALE) || img->format() == image::FMT_BGR888 || img->format() == image::FMT_BGRA8888)
         {
             if(!copy)
                 return py::array_t<uint8_t, py::array::c_style>({img->height(), img->width(), (int)image::fmt_size[img->format()]}, (const unsigned char*)img->data(), py::cast(img));
```
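The `copy`/`ensure_bgr` notes in the diff above come down to buffer aliasing: with `copy=False` the returned array shares the Image's buffer, so the array must not outlive the image, and writes through one are visible through the other. A minimal stdlib sketch of the same pitfall (plain `bytearray`/`memoryview`, not the maix API):

```python
# Illustration only (not the maix API): why image2cv's copy=False needs care.
img_data = bytearray([10, 20, 30, 40])   # stand-in for an Image's pixel buffer

view = memoryview(img_data)              # like copy=False: a zero-copy alias
view[0] = 99                             # mutating the "array"...
print(img_data[0])                       # ...changes the image data: 99

snapshot = bytes(img_data)               # like copy=True: independent buffer
img_data[1] = 77                         # later image changes...
print(snapshot[1])                       # ...don't affect the copy: 20
```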

docs/doc/assets/hands_landmarks.jpg

107 KB (binary file not shown)

docs/doc/en/sidebar.yaml

Lines changed: 2 additions & 0 deletions

```diff
@@ -77,6 +77,8 @@ items:
     label: OCR
   - file: vision/detect_obb.md
     label: Detect with angle(OBB)
+  - file: vision/hand_landmarks.md
+    label: Hand landmarks
   - file: vision/maixhub_train.md
     label: MaixHub online AI training
   - file: vision/customize_model_yolov5.md
```

docs/doc/en/vision/hand_landmarks.md

Lines changed: 113 additions & 0 deletions

---
title: 3D Coordinate Detection of 21 Hand Keypoints with MaixPy MaixCAM
update:
  - date: 2024-12-31
    version: v1.0
    author: neucrack
    content:
      Added source code, models, examples, and documentation
---
## Introduction

This algorithm can be used in applications that need hand position or gesture detection. It provides:
* Hand position, with coordinates for the four vertices of the bounding box.
* 3D coordinates of 21 hand keypoints, including depth estimation relative to the palm.

Example applications:
* Touch reading devices
* Gesture control
* Finger-based games
* Sign language translation
* Magic casting simulation

Sample image:

<img src="../../assets/hands_landmarks.jpg" style="max-height:24rem">

Sample video:
<video playsinline controls autoplay loop muted preload src="/static/video/hands_landmarks.mp4" type="video/mp4">
Classifier Result video
</video>

The 21 keypoints are laid out as follows:
![](../../assets/hand_landmarks_doc.jpg)
## Using Hand Keypoint Detection in MaixPy MaixCAM

**MaixPy** integrates this algorithm (ported from MediaPipe for ease of use; firmware version **>= 4.9.3** is required). The example can also be found in the [MaixPy/examples](https://github.com/sipeed/maixpy) directory:

```python
from maix import camera, display, image, nn, app

detector = nn.HandLandmarks(model="/root/models/hand_landmarks.mud")
# detector = nn.HandLandmarks(model="/root/models/hand_landmarks_bf16.mud")
landmarks_rel = False

cam = camera.Camera(320, 224, detector.input_format())
disp = display.Display()

while not app.need_exit():
    img = cam.read()
    objs = detector.detect(img, conf_th = 0.7, iou_th = 0.45, conf_th2 = 0.8, landmarks_rel = landmarks_rel)
    for obj in objs:
        # img.draw_rect(obj.x, obj.y, obj.w, obj.h, color = image.COLOR_RED)
        msg = f'{detector.labels[obj.class_id]}: {obj.score:.2f}'
        img.draw_string(obj.points[0], obj.points[1], msg, color = image.COLOR_RED if obj.class_id == 0 else image.COLOR_GREEN, scale = 1.4, thickness = 2)
        detector.draw_hand(img, obj.class_id, obj.points, 4, 10, box=True)
        if landmarks_rel:
            img.draw_rect(0, 0, detector.input_width(detect=False), detector.input_height(detect=False), color = image.COLOR_YELLOW)
            for i in range(21):
                x = obj.points[8 + 21*3 + i * 2]
                y = obj.points[8 + 21*3 + i * 2 + 1]
                img.draw_circle(x, y, 3, color = image.COLOR_YELLOW)
    disp.show(img)
```
Detection results are visualized with the `draw_hand` function. Keypoint data can be accessed via `obj.points`, which holds `4 + 21` points:
* The first 4 points are the bounding box corners in clockwise order: `topleft_x, topleft_y, topright_x, topright_y, bottomright_x, bottomright_y, bottomleft_x, bottomleft_y`. Values may be negative.
* The remaining 21 points are keypoints in the format `x0, y0, z0, x1, y1, z1, ..., x20, y20, z20`, where `z` represents depth relative to the palm and may also be negative.

Additionally, the `obj.x, y, w, h, angle` attributes provide the bounding box and rotation details.

**Precision optimization**: The `nn.HandLandmarks` class uses an `int8` quantized model by default for faster detection. For higher precision, switch to the `hand_landmarks_bf16.mud` model.

**Relative landmark coordinates**: Setting the `landmarks_rel` parameter to `True` makes the function additionally output the 21 keypoints as coordinates relative to the top-left corner of the hand's bounding box. In this case, the last `21x2` values in `obj.points` are arranged as `x0, y0, x1, y1, ..., x20, y20`.
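A minimal sketch of slicing `obj.points` by this layout (the `points` list below is fabricated sample data, not real detector output):

```python
# Decode a flat points list: 8 corner values, then 21 (x, y, z) triples.
def parse_points(points):
    # four corners: TL, TR, BR, BL
    corners = [(points[i], points[i + 1]) for i in range(0, 8, 2)]
    # 21 keypoints, z is depth relative to the palm
    keypoints = [(points[8 + i * 3],
                  points[8 + i * 3 + 1],
                  points[8 + i * 3 + 2]) for i in range(21)]
    return corners, keypoints

# fabricated data: 8 corner values + 21 (i, 2i, -i) triples
points = list(range(8)) + [v for i in range(21) for v in (i, i * 2, -i)]
corners, keypoints = parse_points(points)
print(corners[0])     # (0, 1) -> top-left corner
print(keypoints[4])   # (4, 8, -4) -> x, y, z of keypoint 4
```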

## Advanced: Gesture Recognition Based on Keypoint Detection

### Example: Rock-Paper-Scissors Detection
There are two approaches:
1. **Traditional method**: classify gestures in code based on keypoint analysis.
2. **AI model-based method**: train a classification model.

**Approach 2**:
This uses the 21 keypoints as the input of a classification model. Without image background interference, fewer data samples are needed for effective training.

Steps:
1. Define gesture categories (e.g., rock, paper, scissors).
2. Record keypoint data upon user input.
3. Normalize keypoint coordinates to values relative to the bounding box (0 to the box width `obj.w`) using the `landmarks_rel` parameter described above.
4. Collect data for each category.
5. Train a classification model (e.g., using MobileNetV2 in PyTorch).
6. Convert the trained model to a MaixCAM-supported format.

This approach requires knowledge of training and quantizing classification models.
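Step 3 can be taken further by scaling the box-relative keypoints into `[0, 1]` so the classifier input is independent of hand size. A sketch with a hypothetical helper (not part of the MaixPy API):

```python
# Scale relative keypoints (pixel units, 0..box size) into [0, 1].
# rel_xy is the flat [x0, y0, ..., x20, y20] tail of obj.points
# produced by landmarks_rel=True; box_w/box_h are obj.w/obj.h.
def normalize_rel_keypoints(rel_xy, box_w, box_h):
    out = []
    for i in range(0, len(rel_xy), 2):
        out.append(rel_xy[i] / box_w)       # x in [0, 1]
        out.append(rel_xy[i + 1] / box_h)   # y in [0, 1]
    return out

rel = [0, 0, 40, 20, 80, 40]   # three fake (x, y) pairs
print(normalize_rel_keypoints(rel, box_w=80, box_h=40))
# [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
```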
## Simplified Model Training Alternative
For users unfamiliar with PyTorch:
1. Generate an image from the 21 keypoints (customize the visualization).
2. Upload the images to [MaixHub.com](https://maixhub.com) for model training.
3. Use the trained model in MaixPy for classification.

## Complex Action Recognition
For actions requiring time-series analysis (e.g., circular motions):
* Store keypoint history in a queue for temporal analysis.
* Feed historical sequences into a classification model for time-series gesture recognition.
* Alternatively, generate a single image from the historical data and classify it.

These methods allow advanced gesture and action recognition leveraging MaixPy's integrated tools.
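The queue idea above can be sketched with `collections.deque`; the window size and the flat per-frame keypoint layout are assumptions:

```python
from collections import deque

WINDOW = 30                          # assumed number of frames to keep
history = deque(maxlen=WINDOW)       # old frames drop off automatically

def push_frame(keypoints_xy):
    """keypoints_xy: flat [x0, y0, ..., x20, y20] for one frame."""
    history.append(list(keypoints_xy))

def window_ready():
    return len(history) == WINDOW

# Feed fake frames; once full, the window could go to a classifier.
for t in range(35):
    push_frame([t] * 42)
print(window_ready())                # True
print(len(history))                  # 30: only the newest frames kept
print(history[0][0])                 # 5: frames 0..4 were evicted
```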

docs/doc/zh/sidebar.yaml

Lines changed: 2 additions & 0 deletions

```diff
@@ -77,6 +77,8 @@ items:
     label: OCR 文字识别
   - file: vision/detect_obb.md
     label: 带旋转角度的检测(OBB)
+  - file: vision/hand_landmarks.md
+    label: 手部关键点检测
  - file: vision/maixhub_train.md
    label: MaixHub 在线训练 AI 模型
  - file: vision/customize_model_yolov5.md
```

docs/doc/zh/vision/hand_landmarks.md

Lines changed: 120 additions & 0 deletions

---
title: 3D Coordinate Detection of the 21 Hand Keypoints with MaixPy MaixCAM
update:
  - date: 2024-12-31
    version: v1.0
    author: neucrack
    content:
      Added source code, models, examples, and documentation
---
## Introduction

When an application needs to detect the position of a hand, or its pose, this algorithm can be used. It detects:
* The position of the hand, with four vertex coordinates.
* The coordinates of the 21 hand keypoints, plus a depth estimate of each point relative to the palm.

Example applications:
* Touch reading devices
* Gesture control
* Finger-based games
* Sign language translation
* Magic casting

Sample image:

<img src="../../assets/hands_landmarks.jpg" style="max-height:24rem">

Sample video:
<video playsinline controls autoplay loop muted preload src="/static/video/hands_landmarks.mp4" type="video/mp4">
Classifier Result video
</video>

The 21 keypoints:
![](../../assets/hand_landmarks_doc.jpg)
## Using Hand Keypoint Detection in MaixPy MaixCAM

**MaixPy** has this algorithm built in (ported from MediaPipe; study it yourself if interested), so it is easy to use (**firmware version must be >= 4.9.3**). This example can also be found in the [MaixPy/examples](https://github.com/sipeed/maixpy) directory:
```python
from maix import camera, display, image, nn, app

detector = nn.HandLandmarks(model="/root/models/hand_landmarks.mud")
# detector = nn.HandLandmarks(model="/root/models/hand_landmarks_bf16.mud")
landmarks_rel = False

cam = camera.Camera(320, 224, detector.input_format())
disp = display.Display()

while not app.need_exit():
    img = cam.read()
    objs = detector.detect(img, conf_th = 0.7, iou_th = 0.45, conf_th2 = 0.8, landmarks_rel = landmarks_rel)
    for obj in objs:
        # img.draw_rect(obj.x, obj.y, obj.w, obj.h, color = image.COLOR_RED)
        msg = f'{detector.labels[obj.class_id]}: {obj.score:.2f}'
        img.draw_string(obj.points[0], obj.points[1], msg, color = image.COLOR_RED if obj.class_id == 0 else image.COLOR_GREEN, scale = 1.4, thickness = 2)
        detector.draw_hand(img, obj.class_id, obj.points, 4, 10, box=True)
        if landmarks_rel:
            img.draw_rect(0, 0, detector.input_width(detect=False), detector.input_height(detect=False), color = image.COLOR_YELLOW)
            for i in range(21):
                x = obj.points[8 + 21*3 + i * 2]
                y = obj.points[8 + 21*3 + i * 2 + 1]
                img.draw_circle(x, y, 3, color = image.COLOR_YELLOW)
    disp.show(img)
```
The detection results are drawn with the `draw_hand` function. All keypoint information is available from `obj.points`, `4 + 21` points in total, in this format:
* The first 4 points are the four corner coordinates of the hand's bounding box, starting from the top-left corner and going clockwise: `topleft_x, topleft_y, topright_x, topright_y, bottomright_x, bottomright_y, bottomleft_x, bottomleft_y`. Note that values may be less than 0.
* The following 21 points are the hand keypoints, in the order given in the introduction, formatted as `x0, y0, z0, x1, y1, z1, ..., x20, y20, z20`, where `z` is the depth relative to the palm. Values may also be less than 0.

The `obj` attributes `x, y, w, h, angle` can also be used directly; they are the box position and size before rotation, and the rotation angle (0 to 360 degrees).

**Precision optimization**: Detection uses the `nn.HandLandmarks` class, which defaults to an `int8` quantized model for speed; if you need higher precision, switch to the `hand_landmarks_bf16.mud` model.

**Getting keypoint coordinates relative to the hand's top-left corner**: You can choose to get keypoint coordinates relative to the top-left vertex of the hand box, with values ranging from `0` to the box width (`obj.w`), like this:
```python
objs = detector.detect(img, conf_th = 0.7, iou_th = 0.45, conf_th2 = 0.8, landmarks_rel = True)
```
The `landmarks_rel` parameter tells the function to additionally output the `21` points relative to the hand's top-left vertex, appended as the last `21x2` values of `obj.points`, arranged `x0, y0, x1, y1, ..., x20, y20`.
## Advanced: Hand Pose Recognition Based on Keypoint Detection

For example, to detect rock-paper-scissors, there are two methods:
* Method 1: judge directly from the keypoints, using code to classify the hand shape, e.g. fingers spread, palm up, palm down, and so on.
* Method 2: use an AI model to classify.

Method 1 is the traditional approach: simple, fast, and fairly stable for simple gestures. Here we focus on Method 2:
Method 2 trains a classification model to classify gestures, which would normally require many images and backgrounds. Two optimizations are possible:
1. Use a hand detection model such as YOLO11 to detect the hand first, crop out the hand region, and train the classifier on that, so its input contains only the hand, reducing interference.
2. Use hand keypoint detection to get the keypoint data, and use these 21 keypoints as the classifier input, which removes the background information entirely and is more accurate!

So the approach used here is idea `2`: feed the keypoint information into the classification model. Without image background interference, only a small amount of data is needed for good results.
Steps:
* Decide the gesture classes, e.g. three classes: rock, paper, scissors.
* Modify the code above so that, for example, tapping the screen saves the current hand keypoints to the filesystem.
* To make the classifier input more uniform, you can get the keypoint coordinates relative to the hand's top-left vertex, ranging from `0` to the box width (`obj.w`); see the `landmarks_rel` parameter above.
* Collect data for each class.
* Build a classification model on a PC, e.g. one based on mobilenetv2, and train it with pytorch; you can additionally normalize the coordinates to [0, 1] before feeding them in.
* After training, [export the model to a MaixCAM-supported format](../ai_model_converter/maixcam.md) (the quantization data must first be packed into `npz` format).
* In MaixPy, after detecting the hand, feed the keypoints to the classification model to get the result; see the examples `nn_forward.py` and `nn_custom_classifier.py`.

This way you can train different gestures with very little data. It does require that you know how to train a classification model and quantize and convert the model format.
## Advanced: Hand Pose Recognition, Simplified Model Training

The method above requires you to modify and train models with pytorch yourself, and quantizing and converting the model format is cumbersome.
Here is a much simpler workaround that needs no training environment or model conversion of your own:
* Get the keypoint coordinates relative to the hand's top-left vertex, as in the previous method.
* Generate an image from these points; different points can use different colors. Think about and experiment with what kind of image works best.
* Upload the generated images to [MaixHub.com](https://maixhub.com), create a classification model project, train online, and select the MaixCAM platform.
* Train with one click; MaixHub trains in the background and automatically converts the model to a MaixCAM-supported format.
* Modify the example: after keypoints are recognized, generate an image the same way and pass it to your trained classifier to get the result.

## Advanced: Hand Action Recognition, Complex Actions

The above covers simple, single-image recognition. To recognize along the time axis, e.g. a circle-drawing motion:
* One method is to store historical keypoints in a queue and judge the action in code from the queued keypoint data.
* Another is to feed the historical keypoints as a sequence into a classification model, so the model can recognize actions over time.
* You can also compose the historical keypoints into a single image and give that to a classification model.
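
The "generate an image from the keypoints" step in the simplified MaixHub workflow above could be sketched like this; the grid size and mapping are assumptions (on device you would draw into a `maix` image instead):

```python
SIZE = 32   # assumed side length of the generated image

def keypoints_to_grid(rel_xy, box_w, box_h, size=SIZE):
    """Rasterize box-relative keypoints into a size x size grayscale grid."""
    grid = [[0] * size for _ in range(size)]
    for i in range(0, len(rel_xy), 2):
        # map box-relative pixel coords into grid cells
        gx = min(size - 1, int(rel_xy[i] * (size - 1) / box_w))
        gy = min(size - 1, int(rel_xy[i + 1] * (size - 1) / box_h))
        grid[gy][gx] = 255   # mark the keypoint; could vary value per index
    return grid

# two fake keypoints at opposite corners of the hand box
grid = keypoints_to_grid([0, 0, 100, 50], box_w=100, box_h=50)
print(grid[0][0], grid[31][31])   # 255 255: both corners marked
```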

docs/static/video/hands_landmarks.mp4

2.97 MB
Binary file not shown.
Lines changed: 24 additions & 0 deletions

```python
from maix import camera, display, image, nn, app

detector = nn.HandLandmarks(model="/root/models/hand_landmarks.mud")
# detector = nn.HandLandmarks(model="/root/models/hand_landmarks_bf16.mud")
landmarks_rel = False

cam = camera.Camera(320, 224, detector.input_format())
disp = display.Display()

while not app.need_exit():
    img = cam.read()
    objs = detector.detect(img, conf_th = 0.7, iou_th = 0.45, conf_th2 = 0.8, landmarks_rel = landmarks_rel)
    for obj in objs:
        # img.draw_rect(obj.x, obj.y, obj.w, obj.h, color = image.COLOR_RED)
        msg = f'{detector.labels[obj.class_id]}: {obj.score:.2f}'
        img.draw_string(obj.points[0], obj.points[1], msg, color = image.COLOR_RED if obj.class_id == 0 else image.COLOR_GREEN, scale = 1.4, thickness = 2)
        detector.draw_hand(img, obj.class_id, obj.points, 4, 10, box=True)
        if landmarks_rel:
            img.draw_rect(0, 0, detector.input_width(detect=False), detector.input_height(detect=False), color = image.COLOR_YELLOW)
            for i in range(21):
                x = obj.points[8 + 21*3 + i * 2]
                y = obj.points[8 + 21*3 + i * 2 + 1]
                img.draw_circle(x, y, 3, color = image.COLOR_YELLOW)
    disp.show(img)
```
Lines changed: 5 additions & 0 deletions

```
build
dist
/CMakeLists.txt
```

projects/app_hand_landmarks/app.yaml

Lines changed: 11 additions & 0 deletions

```yaml
id: hand_landmarks
name: Hand Landmarks
name[zh]: 手关键点
version: 1.0.0
author: Neucrack@Sipeed
icon: icon.png
desc: Hand Landmarks detect hand keypoints
files:
  - icon.png
  - app.yaml
  - main.py
```

projects/app_hand_landmarks/icon.png

2.35 KB (binary file not shown)
