---
title: 3D Coordinate Detection of 21 Hand Keypoints with MaixPy MaixCAM
update:
  - date: 2024-12-31
    version: v1.0
    author: neucrack
    content:
      Added source code, models, examples, and documentation
---

## Introduction

For applications that need to detect hand position or gestures, this algorithm provides:
* The hand's position, as the coordinates of the four corners of its bounding box.
* 3D coordinates of 21 hand keypoints, including depth estimated relative to the palm.

Example applications:
* Touch reading devices
* Gesture control
* Finger-based games
* Sign language translation
* Magic casting simulation

Sample image:

<img src="../../assets/hands_landmarks.jpg" style="max-height:24rem">

Sample video:
<video playsinline controls autoplay loop muted preload src="/static/video/hands_landmarks.mp4" type="video/mp4">
Classifier Result video
</video>

The 21 keypoints follow MediaPipe's hand landmark ordering:
* 0: wrist
* 1-4: thumb (CMC, MCP, IP, tip)
* 5-8: index finger (MCP, PIP, DIP, tip)
* 9-12: middle finger (MCP, PIP, DIP, tip)
* 13-16: ring finger (MCP, PIP, DIP, tip)
* 17-20: little finger (MCP, PIP, DIP, tip)

## Using Hand Keypoint Detection in MaixPy MaixCAM

**MaixPy** integrates this algorithm (ported from MediaPipe) so it is easy to use; firmware version **>= 4.9.3** is required. The example can also be found in the [MaixPy/examples](https://github.com/sipeed/maixpy) directory:

```python
from maix import camera, display, image, nn, app

# int8 quantized model (faster); switch to the bf16 model below for higher precision
detector = nn.HandLandmarks(model="/root/models/hand_landmarks.mud")
# detector = nn.HandLandmarks(model="/root/models/hand_landmarks_bf16.mud")
landmarks_rel = False

cam = camera.Camera(320, 224, detector.input_format())
disp = display.Display()

while not app.need_exit():
    img = cam.read()
    objs = detector.detect(img, conf_th = 0.7, iou_th = 0.45, conf_th2 = 0.8, landmarks_rel = landmarks_rel)
    for obj in objs:
        # img.draw_rect(obj.x, obj.y, obj.w, obj.h, color = image.COLOR_RED)
        msg = f'{detector.labels[obj.class_id]}: {obj.score:.2f}'
        img.draw_string(obj.points[0], obj.points[1], msg, color = image.COLOR_RED if obj.class_id == 0 else image.COLOR_GREEN, scale = 1.4, thickness = 2)
        detector.draw_hand(img, obj.class_id, obj.points, 4, 10, box=True)
        if landmarks_rel:
            # landmarks_rel appends 21 (x, y) pairs relative to the box's top-left corner
            img.draw_rect(0, 0, detector.input_width(detect=False), detector.input_height(detect=False), color = image.COLOR_YELLOW)
            for i in range(21):
                x = obj.points[8 + 21*3 + i * 2]
                y = obj.points[8 + 21*3 + i * 2 + 1]
                img.draw_circle(x, y, 3, color = image.COLOR_YELLOW)
    disp.show(img)
```

Detection results are visualized with the `draw_hand` function. Keypoint data can be accessed via `obj.points`, which holds `4 + 21` points:
* The first 4 points are the bounding-box corners in clockwise order: `topleft_x, topleft_y, topright_x, topright_y, bottomright_x, bottomright_y, bottomleft_x, bottomleft_y`. Values may be negative.
* The remaining 21 points are the keypoints, in the format `x0, y0, z0, x1, y1, z1, ..., x20, y20, z20`, where `z` is the depth relative to the palm and may also be negative.

Additionally, the `obj.x, y, w, h, angle` attributes provide the bounding box and its rotation angle.

**Precision Optimization**: The `nn.HandLandmarks` class uses an `int8` quantized model by default for faster detection. For higher precision, switch to the `hand_landmarks_bf16.mud` model.
**Relative Landmark Coordinates**: Setting the `landmarks_rel` parameter to `True` additionally outputs the 21 keypoints as coordinates relative to the top-left corner of the hand's bounding box; in this case, the last `21 x 2` values in `obj.points` are arranged as `x0, y0, x1, y1, ..., x20, y20`.
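
As a minimal sketch of how to unpack this layout (`obj` comes from the detection loop above):

```python
# Unpacking obj.points from the example above (minimal sketch)

# First 8 values: the 4 rotated bounding-box corners, clockwise from top-left
corners = [(obj.points[i * 2], obj.points[i * 2 + 1]) for i in range(4)]

# Next 63 values: the 21 keypoints as (x, y, z); z is depth relative to the palm
keypoints = [(obj.points[8 + i * 3],
              obj.points[8 + i * 3 + 1],
              obj.points[8 + i * 3 + 2]) for i in range(21)]

# Only if detect() was called with landmarks_rel=True: 42 more values,
# the 21 keypoints as (x, y) relative to the box's top-left corner
rel = [(obj.points[8 + 63 + i * 2],
        obj.points[8 + 63 + i * 2 + 1]) for i in range(21)]
```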

## Advanced: Gesture Recognition Based on Keypoint Detection

### Example: Rock-Paper-Scissors Detection
There are two approaches:
1. **Traditional Method**: Write code that classifies gestures based on keypoint analysis.
2. **AI Model-Based Method**: Train a classification model.

**Approach 2**:
This uses the 21 keypoints as the input of a classification model. Since there is no image background to interfere, fewer data samples are needed for effective training.

Steps:
1. Define gesture categories (e.g., rock, paper, scissors).
2. Record keypoint data upon user input.
3. Normalize keypoint coordinates to values relative to the bounding box (0 to the object width `obj.w`) using the `landmarks_rel` parameter described above.
4. Collect data for each category.
5. Train a classification model (e.g., using MobileNetV2 in PyTorch); see the sketch after this list.
6. Convert the trained model to a MaixCAM-supported format.

This approach requires familiarity with training and quantizing classification models.
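
As a rough sketch of step 5, a small MLP is used here instead of MobileNetV2 for simplicity, since the input is just 42 numbers rather than an image. The data loading is replaced by random placeholders; the variable names, shapes, and hyperparameters are assumptions:

```python
import torch
import torch.nn as nn

class GestureMLP(nn.Module):
    """Classify a gesture from 21 normalized (x, y) keypoint pairs."""
    def __init__(self, num_classes=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(42, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        return self.net(x)

model = GestureMLP()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Placeholder data: replace with your recorded relative keypoints,
# divided by obj.w / obj.h so each coordinate falls roughly in [0, 1].
X = torch.rand(300, 42)           # 300 samples of 21 (x, y) pairs
y = torch.randint(0, 3, (300,))   # labels: 0=rock, 1=paper, 2=scissors

for epoch in range(50):
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()
```

The trained model still has to be quantized and converted to the MaixCAM-supported format (step 6) before it can run on the device.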

## Simplified Model Training Alternative
For users unfamiliar with PyTorch:
1. Generate an image from the 21 keypoints (the visualization style is up to you); a sketch follows this list.
2. Upload the images to [MaixHub.com](https://maixhub.com) for model training.
3. Use the trained model in MaixPy for classification.
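
A sketch of step 1, assuming `obj` comes from `detector.detect(..., landmarks_rel=True)` as in the example above, and assuming the MaixPy `image.Image(w, h)` constructor and `save()` method; the canvas size and dataset path are illustrative:

```python
from maix import image

W = 224                    # canvas size, illustrative
canvas = image.Image(W, W) # blank image to draw the gesture on
for i in range(21):
    x = obj.points[8 + 21 * 3 + i * 2]
    y = obj.points[8 + 21 * 3 + i * 2 + 1]
    # scale from the landmark model's input size to the canvas size
    x = int(x * W / detector.input_width(detect=False))
    y = int(y * W / detector.input_height(detect=False))
    canvas.draw_circle(x, y, 3, color=image.COLOR_WHITE)
canvas.save("/root/dataset/rock/0001.jpg")  # hypothetical dataset path
```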

## Complex Action Recognition
For actions that require time-series analysis (e.g., circular motions):
* Store the keypoint history in a queue for temporal analysis, as sketched below.
* Feed the historical sequence into a classification model for time-series gesture recognition.
* Alternatively, render the historical data into a single image and classify that.
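
As a sketch of the queue idea (`objs` comes from the detection loop above; the window length is an assumption to tune per gesture):

```python
from collections import deque

HISTORY = 30                     # number of frames to keep
history = deque(maxlen=HISTORY)  # old frames are dropped automatically

# inside the detection loop, after detector.detect():
if objs:
    frame = list(objs[0].points[8 : 8 + 21 * 3])  # this frame's 21 (x, y, z) keypoints
    history.append(frame)

if len(history) == HISTORY:
    # flatten to one vector (HISTORY * 63 values) for a sequence classifier,
    # or render the trajectory into an image and classify that instead
    sequence = [v for frame in history for v in frame]
```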

These methods allow advanced gesture and action recognition leveraging MaixPy's integrated tools.