Releases: ccprocessor/llm-webkit-mirror

v3.2.0-released

01 Aug 03:23
b1bc533

What's Changed

  • release 3.1.2 by @dt-yy in #377
  • Main by @dt-yy in #378
  • Em html by @yogacc33 in #379
  • change the version number to single quotes by @yogacc33 in #382
  • feat: add pre_data_json unit test by @yogacc33 in #383
  • fix: update html_layout_cosin.py & test_html_layout_cosin.py, add similarity func by @renpengli01 in #381
  • docs: update readme.md by @yogacc33 in #384
  • feat: code extract on googlesource.com by @NgZiming in #386
  • docs: qwen-72b-instruct deploy doc by @drunkpig in #387
  • layout batch parser by @dt-yy in #388
  • feat: representative HTML page selection and HTML simplification by @LollipopsAndWine in #389
  • : add tag_mapping.py codes, output element dict of main html dom tree by @papayalove in #385
  • : fix tag_mapping.py codes, fix target_list output, make it more accurate by @papayalove in #391
  • fix DOM propagation exception by @dt-yy in #394
  • fix unit test cases by @dt-yy in #396
  • Dev element dict improvement by @papayalove in #397
  • fix: optimize simplification v1 by @LollipopsAndWine in #401
  • feat: add ccstore pipeline by @e06084 in #398
  • fix wiki web not complete by @dt-yy in #405
  • feat: compress_and_decompress_str func standard_utils.py & test_standard_utils.py & fix: html_layout_cosin.py & test_html_layout_cosin.py .2f by @renpengli01 in #404
  • feat: use llm select html main content node by @drunkpig in #406
  • feat: select html content node by LLM by @drunkpig in #407
  • modify the propagation fields by @dt-yy in #409
  • : add template html main tree extract success verification by tree structure similarity between template main html and original html. by @papayalove in #410
  • : add main html extract success verification by tree structure similarity between template main html and main html. by @papayalove in #411
  • : add raw tag html xpath info to element dict by @papayalove in #412
  • feat: Sub/sup retains the original / tag format and does no… by @yogacc33 in #413
  • feat: mv cc_store code to jupyter dir by @e06084 in #408
  • : fix same layer definition in layout_batch_parser.py by @papayalove in #416
  • fix: img math display mode by @e06084 in #414
  • feat: add jupyter package in lint workflow. by @yogacc33 in #419
  • feat: add layout_index_webkit.ipynb & nbconvert==7.16.6,notebook==7.4.2,jupyter==1.1.1 & fix: pre commit achieved clear all output data of jupyter file by @renpengli01 in #420
  • html-cls m4 by @darkrush in #422
  • some change about timeout by @ddfinshes in #425
  • fix bugs in paragraph recognition by @ddfinshes in #427
  • : add dynamic id match in layout_batch_parser.py, enabled by switch variable DYNAMIC_ID_ENABLE, and add TYPICAL_DICT_HTML output by tag_map by @papayalove in #428
  • update: optimize cc_domain_index_gen and add en readme by @e06084 in #429
  • docs: update domain cluster readme by @e06084 in #430
  • feat: add cluster layout series jupyter & fix pre commit by @renpengli01 in #431
  • : add dynamic classid match in layout_batch_parser.py, enabled by switch variable DYNAMIC_ID_ENABLE, and add TYPICAL_DICT_HTML output by tag_map by @papayalove in #432
  • fix: adjust simplification based on model evaluation by @LollipopsAndWine in #434
  • fix: use stream read in cc domain index generation by @e06084 in #436
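  • feat: simplify attributes, keeping only image src and element class and id by @LollipopsAndWine in #438
  • feat: clean element attributes, keeping valid image src (excluding base64) and alt, plus class and id for all elements by @LollipopsAndWine in #440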
  • fix: get_feature add is_ignore_tag & similarity by html_layout_cosin.py by @renpengli01 in #442
  • feat: simplification option to control whether XPath is retrieved by @LollipopsAndWine in #443
  • : add dynamic classid match switch by @papayalove in #445
  • feat: add jupyter files: cc dedup by hash html & add readme cc dedup by @renpengli01 in #447
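  • fix bugs: 1: some inputs lose their namespace and cannot match the XSL template; 2: some formula paragraphs are split incorrectly; 3: in formula content such as \text{...}, \left and \right are wrongly added before the braces by @1041206149 in #435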
  • : fix tag map get_feature None error by @papayalove in #449
  • fix: jupyter:combines a four-step clustering procedure into a single … by @renpengli01 in #448
  • : add parse_single in MapItemToHtmlTagsParser for single html extraction by @papayalove in #452
  • fix: layout cluster dynamic properties & unit test by @renpengli01 in #453
  • feat: use http url as markdown image path by @drunkpig in #454
  • : fix parse_single in MapItemToHtmlTagsParser for single html extraction by @papayalove in #459
  • add interfaces for element recognition and magic-html extraction by @dt-yy in #457
  • update readme by @dt-yy in #461
  • fix bugs in the recognition part by @ddfinshes in #456
  • feat: configure the custom tags 'marked-tail' and 'marked-text' as inline tags by @LollipopsAndWine in #462
  • fix pylint by @dt-yy in #463
  • add Zhihu formula extraction by @1041206149 in #451
  • update readme by @dt-yy in #465
  • update the language detection docs, the political-content model docs, and the sensitive-word code and docs by @darkrush in #426
  • fix: rename custom tag names by @LollipopsAndWine in #468
  • bench: fix MagicHTMLFIleFormatorExtractor by @e06084 in #471
  • fix: layout cluster & unit test by @renpengli01 in #476
  • optimize the MathJax renderer approach by @1041206149 in #470
  • : add some inline tag noise, make the extraction more robust and fixed id classid strip() bug by @papayalove in #477
  • feat: cc_dedup_fir add exception handle by @e06084 in #478
  • modify the MathJax renderer approach logic by @1041206149 in #480
  • : fix image loss problem in new tag and modified the dynamic_classid_similarity_threshold by @papayalove in #482
  • feat: add code detect fasttext model by @yogacc33 in #483
  • feat: add math detector model by @yogacc33 in #484
  • Feature/math model by @yogacc33 in #485
  • add dedup by @dt-yy in #489
  • : add extract main html by model response to README.md by @papayalove in #488
  • fix: fix multilingual concatenation rules by @LollipopsAndWine in #490
  • fix table structure being lost with only the caption kept by @dt-yy in #493
  • feat: add get_cc_select_html by @e06084 in #496
  • feat: add preprocessing to the noclip pipeline: remove interactive form elements by @LollipopsAndWine in #495
  • : fix table tag integrity by @papayalove in #497
  • mathjax...

v3.1.2-released

15 Apr 15:07

What's Changed

Full Changelog: v3.1.0-released...v3.1.2-released

v3.1.1-released

31 Mar 12:49
2450ea9

What's Changed

Full Changelog: v3.1.0-released...v3.1.1-released

v3.1.0-released

20 Mar 14:56
c44a490

What's Changed

New Contributors

Full Changelog: llm-web-kit==3.0.1released...v3.1.0-released

llm-web-kit==3.0.1released

21 Feb 14:36
852cf7f

What's Changed

New Contributors

Full Changelog: llm-web-kit==3.0.0...llm-web-kit==3.0.1released

llm-web-kit==3.0.0

13 Jan 07:58
9c2e619

What's Changed

New Contributors

Full Changelog: https://github.com/ccprocessor/llm-webkit-mirror/commits/llm-web-kit==3.0.0