Please add an option for `tok` to preserve spaces so the original text can be restored after tokenization

**Describe the feature and the current behavior/state.**

Spaces in the text (both full-width and half-width) are discarded by `tok`.

**Will this change the current api? How?**

Not sure.

**Who will benefit with this feature?**

People who use HanLP for Simplified/Traditional Chinese conversion.

**Are you willing to contribute it (Yes/No):**

No, it is beyond my ability.

**System information**
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Arch Linux
- Python version: 3.10.9
- HanLP version: 2.1.0b45, installed with `pip install hanlp`

**Any other info**


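I mainly want to use HanLP for Simplified/Traditional Chinese text conversion.

OpenCC's conversion sometimes goes wrong (for example, converting between `只` and `隻`).
In the discussion on its GitHub issue [#224 (comment)](https://github.com/BYVoid/OpenCC/issues/224#issuecomment-283668276), I saw someone tokenize the text with HanLP first and then pass it to OpenCC.
I spent a whole day trying this and it works well.
However, because `tok` does not preserve spaces, the original text cannot be restored correctly.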

Example:

```python
import hanlp
tok = hanlp.load(hanlp.pretrained.tok.COARSE_ELECTRA_SMALL_ZH)
print(tok(['2021年HanLPv2.1为生产环境带来次世代最先进的多语种Neuro-linguistic programming技术。', '阿婆主来到北京立方庭参观自然语义科技公司。']))
```
Output:
```python
[['2021年', 'HanLPv2.1', '为', '生产', '环境', '带来', '次世代', '最', '先进', '的', '多', '语种', 'Neuro-linguistic', 'programming', '技术', '。'], ['阿婆', '主', '来到', '北京立方庭', '参观', '自然语义科技公司', '。']]
```

The space between the two words in `Neuro-linguistic programming` is lost.
After passing this output to OpenCC and joining the tokens back together,
it becomes `Neuro-linguisticprogramming`.
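Until such an option exists, one possible workaround is to recover the spaces by aligning the tokens back to the original string. This is a minimal sketch that assumes every token returned by `tok` is a verbatim substring of the input, in order (true for the output above). It walks the text, locates each token, and records the whitespace run in front of it. `align_gaps` and `rejoin` are hypothetical helper names, not part of HanLP.

```python
from typing import List


def align_gaps(original: str, tokens: List[str]) -> List[str]:
    """Return len(tokens) + 1 whitespace runs: gaps[i] precedes tokens[i],
    gaps[-1] trails the last token. Assumes tokens are verbatim substrings."""
    gaps = []
    cursor = 0
    for tok in tokens:
        start = original.find(tok, cursor)
        if start < 0:
            raise ValueError(f"token {tok!r} not found after offset {cursor}")
        gap = original[cursor:start]
        # str.strip() also removes full-width spaces (U+3000), so any leftover
        # characters mean the tokenizer dropped real text, not just spaces.
        if gap.strip():
            raise ValueError(f"non-whitespace text {gap!r} skipped before {tok!r}")
        gaps.append(gap)
        cursor = start + len(tok)
    tail = original[cursor:]
    if tail.strip():
        raise ValueError(f"non-whitespace text {tail!r} left after last token")
    gaps.append(tail)
    return gaps


def rejoin(tokens: List[str], gaps: List[str]) -> str:
    """Interleave (possibly converted) tokens with the recorded gaps."""
    if len(gaps) != len(tokens) + 1:
        raise ValueError("need exactly one more gap than tokens")
    pieces = [gaps[0]]
    for tok, gap in zip(tokens, gaps[1:]):
        pieces.append(tok)
        pieces.append(gap)
    return "".join(pieces)


text = "Neuro-linguistic programming技术。"
tokens = ["Neuro-linguistic", "programming", "技术", "。"]
gaps = align_gaps(text, tokens)
print(gaps)                          # ['', ' ', '', '', '']
print(rejoin(tokens, gaps) == text)  # True
```

Because the gaps are stored separately, each token can be converted on its own and then passed to `rejoin` with the same gaps, so half-width and full-width spaces survive the round trip. This only holds if the conversion returns exactly one string per token.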

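My programming skills are very limited, so for now my workflow is:
- read the .txt file with Python
- tokenize it with HanLP's `tok`, as above
- dump the result to the terminal with `json.dumps`
- convert it with the `opencc` CLI in the terminal
- restore the text with tools such as `jq` and `sed`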
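The shell round trip above could probably be collapsed into one Python script. This is a different minimal sketch from token alignment: split the input on runs of whitespace first, using `re.split` with a capturing group, so the spaces are kept as separate pieces and never reach the tokenizer. Only the non-space chunks are tokenized and converted. `convert_keep_spaces` is a hypothetical helper. `tokenize` is assumed to be any callable that takes a list of strings and returns a list of token lists (the way `tok` is called above). `convert` is any per-token conversion function; an OpenCC Python binding's converter could serve here if you have one installed, but that is an assumption not shown in this issue. One trade-off: the tokenizer no longer sees context across a space, which might change segmentation slightly for Chinese text with spaces mid-sentence.

```python
import re
from typing import Callable, List

# The capturing group makes re.split keep the whitespace runs in its result.
# Python's \s matches full-width U+3000 as well as ordinary spaces.
_WS = re.compile(r"(\s+)")


def convert_keep_spaces(
    text: str,
    tokenize: Callable[[List[str]], List[List[str]]],
    convert: Callable[[str], str],
) -> str:
    """Tokenize and convert only the non-space chunks; copy spaces verbatim."""
    parts = _WS.split(text)
    # The split result alternates: chunk, spaces, chunk, spaces, ..., chunk
    chunks = parts[0::2]
    spaces = parts[1::2]
    nonempty = [c for c in chunks if c]
    # Tokenize all chunks in one batch; skip the call if there is nothing to do.
    batches = iter(tokenize(nonempty)) if nonempty else iter([])
    out = []
    for i, chunk in enumerate(chunks):
        if chunk:
            out.append("".join(convert(t) for t in next(batches)))
        if i < len(spaces):
            out.append(spaces[i])
    return "".join(out)


# Stand-ins for illustration: character-level tokenizer, str.upper as "convert".
char_tok = lambda batch: [list(s) for s in batch]
print(convert_keep_spaces("Neuro-linguistic programming技术", char_tok, str.upper))
# NEURO-LINGUISTIC PROGRAMMING技术

# Hypothetical real usage with the tokenizer loaded earlier:
# result = convert_keep_spaces(text, tok, my_converter)
```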

Alternatively, is there a more effective way to combine tokenization with Simplified/Traditional conversion?
Thanks!


* [x] I've carefully completed this form.

Please add an option for `tok` to preserve spaces so the original text can be restored after tokenization #1802