xxx.dtd a template #6
Comments
how one.dtd was constructed.
While similar in some ways to the relation between the make_xml.py template (see #5) and the individual versions, this relation is somewhat different. In the case of the make_xml.py template, the template is used to generate, for each dictionary xxx, a version of make_xml.py for that dictionary that is functionally the same as the prior distinct version. The xxx.dtd generated by one.dtd is also functionally similar to the previous distinct xxx.dtd, in that the xml file xxx.xml is judged valid by both. However, the text of the xxx.dtd generated by one.dtd is quite different from that of the previous distinct xxx.dtd.
|
Must one.dtd be a template?
For dictionary xxx, the xml root of xxx.xml is xxx. In other words, the xml structure of xxx.xml is
So, at the very least, the difference in root elements dictates that one.dtd must use xxx as a template variable. Beyond that, there are only two places where template variables are used.
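To illustrate why the root element alone forces a template variable, here is a minimal sketch in Python. The placeholder syntax (`${dictlo}`), the `ONE_DTD` fragment, and the `render_dtd` helper are all assumptions made for illustration; they are not taken from csl-pywork, whose actual template mechanism may differ.

```python
from string import Template

# Hypothetical fragment of a shared DTD template. Only the root element
# name depends on the dictionary code; the remaining declarations are the
# same for every dictionary. Element names other than the root are
# illustrative, not copied from one.dtd.
ONE_DTD = Template(
    "<!ELEMENT ${dictlo} (H1)*>\n"
    "<!ELEMENT H1 (h, body, tail)>\n"
)


def render_dtd(dictlo: str) -> str:
    """Return the DTD text for one dictionary code, e.g. 'ap' or 'pwg'."""
    return ONE_DTD.substitute(dictlo=dictlo)


if __name__ == "__main__":
    print(render_dtd("ap"))
```

Because the root element of xxx.xml is named after the dictionary, the first declaration cannot be written once for all dictionaries; everything else in this sketch is shared.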
|
removal of template logic for AP
We remove the template distinction for AP dictionary as follows.
|
Suggestions for improvement
With all the dtds represented in one.dtd, we can now examine one.dtd with an eye towards simplification.
Some additional tools needed.
In investigating such simplifications as above, some additional software tools will probably be
|
What tool would you use to count how many of each of the 20 there are? |
|
Please advise, I'd like to work on this bash tool. Thank you! |
I've added a v02/utilities/ folder to this repository, and put check_xml_tags.py there. |
Thank you! I see it there. |
Note on .gitignore
There are often occasions where I want to do some kind of analysis; an example might be to try |
We might benefit from another branch for this repo. |
A one-line variation of check_xml_tags.py does the trick. Change line 10 to:
Call the new program, for example, v02/utilities/temp.py. And run it with
Then temp.txt contains the list of 20, with counts. For example |
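For readers without the script at hand, the counting idea can be sketched as follows. This is not check_xml_tags.py or its one-line variation; it is a standalone illustration that assumes the items being counted are XML element names and, like the real program, reads the xml as plain text rather than through an xml parser. The names `TAG_RE`, `count_tags`, and `format_counts` are hypothetical.

```python
import re
from collections import Counter

# Matches an opening (or self-closing) tag and captures its name.
# Closing tags, comments and processing instructions are skipped,
# because the name must directly follow '<'.
TAG_RE = re.compile(r"<([A-Za-z_][\w.-]*)")


def count_tags(lines):
    """Count opening-tag names in an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(TAG_RE.findall(line))
    return counts


def format_counts(counts):
    """Return one 'name count' line per tag, sorted by tag name."""
    return "\n".join(f"{name} {n}" for name, n in sorted(counts.items()))


if __name__ == "__main__":
    sample = ["<H1><h><key1>a</key1></h><body><s>a</s> <s>b</s></body></H1>"]
    print(format_counts(count_tags(sample)))
```

Passing an open file object to `count_tags` works as well, since iterating over a text file yields its lines.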
My understanding of git does not yet extend to how to make use of branches. If you have something specific in mind, go ahead and give it a try. Let's take it in baby steps until we all |
Sure. I think we can benefit from Yevgeniy's experience. |
I am from GitLab world, but GitHub should have it also, as that's part of regular Git functionality. |
That's probably the safest way, in case the "rexgen" tool differs in the regex engine it uses, as some metacharacters might be interpreted slightly differently depending on the parser (there's an entire book on those regex engine subtleties on Safari; apologies for the sidetrack). I hope that the regex used in the DTD is interpreted the same way Python's parser interprets it. |
This is what I see parsing all the dictionaries:
We need to unify Greek/greek (* vs pwg capitalized) and Russian/russian (pw vs pwg capitalized); apart from that, everything else seems unique.
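A check for this kind of case-only mismatch can be sketched in Python. The `case_variants` helper and the sample input are hypothetical; the input is merely shaped after the Greek/greek observation and is not the actual scan result.

```python
from collections import defaultdict


def case_variants(values_by_dict):
    """Group values that differ only by letter case.

    values_by_dict maps a dictionary code to the set of values seen there.
    Returns {lowercased value: {spelling: sorted list of dictionary codes}}
    for every value that occurs with more than one spelling.
    """
    spellings = defaultdict(lambda: defaultdict(set))
    for code, values in values_by_dict.items():
        for value in values:
            spellings[value.lower()][value].add(code)
    return {
        key: {spelling: sorted(codes) for spelling, codes in forms.items()}
        for key, forms in spellings.items()
        if len(forms) > 1
    }


if __name__ == "__main__":
    # Hypothetical input, for illustration only.
    observed = {
        "pwg": {"Greek", "Russian"},
        "pw": {"greek", "russian"},
        "mw": {"greek"},
    }
    for key, forms in sorted(case_variants(observed).items()):
        print(key, forms)
```

Values that appear with a single spelling are omitted from the result, so an empty result means nothing needs unifying.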
|
I tried to commit the shell script, but it seems I don't have that permission:
Basically, the results above could have been produced using one shell script:
which would give the same results, so I'm not sure you need a script in the repo, as this seems to be a one-time search:
|
No, actually the scanned images are NOT part of any repository.
Currently, the logic involved in displaying scanned images (this logic is part of csl-websanlexicon)
The images are also available from an AWS-S3 bucket, but using that source of images is not
It is precisely for size reasons that the scanned images are not in a repository -- I think their
If we want to give Ubuntu (and other local) installations the option to have local copies of the
If you want to work on this, I can provide some further details. |
The check_xml_tags.py program actually is not using a python xml parser. It is just reading the xml file as
aside on xml validators
On a local XAMPP system, it is hard to get the xmllint xml-validator -- xmllint is used in the redo_xml.sh script to check that a given dictionary validates according to its dtd. |
Absolutely, I would like to work on this: in case the Cologne server is not available, or the VM runs offline, we need an option to stock the VM with local images. I suspect some discrepancies between the digital version and the pictures are inevitable for a project of this size, so it's important to have the pictures alongside the digital version of the dictionary. I don't know if GitHub charges for 50-60GB of pictures, which would be accessed read-only, or whether that's cheaper compared to an AWS-S3 bucket. Please advise what to take a look at (I guess image fetching is part of the php), so we can give that option to the standalone builds. Thank you! |
In the previous revision of csl-pywork, the dictionary dtds (xxx.dtd) were in 'distinctfiles'. That is, when reconstructing a 2020 dictionary, csl-pywork used a separate version of the xxx.dtd file for each dictionary. Now, csl-pywork uses a single one.dtd template to create the different versions.
This is an improvement, because now we can see all the variations in one place.