Commit df92a0e

[core] Support language dialects (#5438)
Merge pull request #5438 from Monits:lang-dialects
2 parents 95c3c12 + 63bc45a commit df92a0e

29 files changed: +967 -52 lines changed

docs/_data/sidebars/pmd_sidebar.yml

+3
@@ -595,6 +595,9 @@ entries:
595595
- title: Rule Guidelines
596596
url: /pmd_devdocs_major_rule_guidelines.html
597597
output: web, pdf
598+
- title: Adding a new dialect
599+
url: /pmd_devdocs_major_adding_dialect.html
600+
output: web, pdf
598601
- title: Adding a new language (JavaCC)
599602
url: /pmd_devdocs_major_adding_new_language_javacc.html
600603
output: web, pdf
@@ -0,0 +1,113 @@
1+
---
2+
title: Adding PMD support for a new dialect for an already existing language
3+
short_title: Adding a new dialect
4+
tags: [devdocs, extending, experimental]
5+
summary: "How to add a new dialect."
6+
last_updated: April 2025 (7.13.0)
7+
sidebar: pmd_sidebar
8+
permalink: pmd_devdocs_major_adding_dialect.html
9+
folder: pmd/devdocs
10+
---
11+
12+
{% include callout.html type="info" content="
13+
14+
**What is a dialect?**<br><br>
15+
16+
A dialect is a particular form of another supported language. For example, XSLT is a particular form of XML.
17+
Even though the dialect has its own semantics and uses, the contents are still readable by any tool capable of understanding the base language.<br><br>
18+
19+
In PMD, a dialect lets you set up completely custom rules, XPath functions, properties and metrics for these files,
20+
while retaining the full support of the underlying language. That means:<br><br>
21+
22+
- All rules applicable to the base language are automatically applicable to all files processed as a dialect.<br>
23+
- All XPath functions existing in the base language are available when creating new rules.<br>
24+
- All metrics supported by the base language are available when creating new rules.<br>
25+
- All properties (e.g. support for suppressing literals in CPD) supported by the base language are supported by the dialect.<br>
26+
27+
" %}
28+
29+
## Steps
30+
31+
### 1. Create a dialect module
32+
* Dialects usually reside in the same module as the base language they leverage, but can technically live standalone in a separate module if needed.
33+
* Create your subclass of `net.sourceforge.pmd.lang.impl.SimpleDialectLanguageModuleBase`, see XSL as an example: [`XslDialectModule`](https://github.com/pmd/pmd/blob/main/pmd-xml/src/main/java/net/sourceforge/pmd/lang/xml/xsl/XslDialectModule.java).
34+
* For a minimal implementation, it just needs a constructor calling `super()` with the required metadata.
35+
Dialect metadata is created through the builder obtained from `LanguageMetadata.withId`.
36+
* Define the human readable name of the language by calling `name`
37+
* Define all extensions PMD should consider when applying this dialect by calling `extensions`
38+
* Add a call to `addVersion` for each version of your language in your dialect module’s constructor.
39+
Use `addDefaultVersion` for defining the default version.
40+
* Finalize the metadata construction by calling `asDialectOf` to reference the base language by id.
41+
* Create the service registration via the text file `src/main/resources/META-INF/services/net.sourceforge.pmd.lang.Language`.
42+
Add your fully qualified class name as a single line into it.
43+
44+
### 2. Create a language handler (Optional)
45+
* This step is only required if you want the dialect to do any of the following:
46+
* expose additional XPath functions
47+
* compute additional metrics
48+
* customize violation suppression logic
49+
* define {% jdoc core::reporting.ViolationDecorator %}s, to add additional dialect specific information to the
50+
created violations. The [Java language module](pmd_languages_java.html#violation-decorators) uses this to
51+
provide the method name or class name, where the violation occurred.
52+
* To do this, create a new class extending from [`BasePmdDialectLanguageVersionHandler`](https://github.com/pmd/pmd/blob/main/pmd-core/src/main/java/net/sourceforge/pmd/lang/impl/BasePmdDialectLanguageVersionHandler.java), and override the getter corresponding to what you want to extend.
53+
You don't need to worry about including anything from the base language, only include your extensions. PMD will take care of merging everything together.
54+
* Make sure to pass a new instance of your dialect handler as the second parameter in your dialect module (see Step 1) when calling `super`.
55+
56+
### 3. Create rules
57+
* Creating rules is already pretty well documented in PMD - and it’s no different for a new dialect.
58+
* PMD supports two types of rules: visitor-based rules and XPath rules.
59+
* To add a visitor rule:
60+
* You need to extend the abstract rule provided by the base language, for instance in XML dialects, you would extend [`AbstractXmlRule`](https://github.com/pmd/pmd/blob/main/pmd-xml/src/main/java/net/sourceforge/pmd/lang/xml/rule/AbstractXmlRule.java).
61+
Note that all rule classes should be suffixed with `Rule` and should be placed
62+
in a package that corresponds to their dialect and category.
63+
* To add an XPath rule you can follow our guide [Writing XPath Rules](pmd_userdocs_extending_writing_xpath_rules.html).
64+
* When creating the category ruleset XML file, the XML can reference build properties that are replaced
65+
during the build. This is used for the `externalInfoUrl` attribute of a rule. E.g. we use `${pmd.website.baseurl}`
66+
to point to the correct webpage (depending on the PMD version).
67+
68+
### 4. Test the rules
69+
* Testing rules is described in depth in [Testing your rules](pmd_userdocs_extending_testing.html).
70+
* Each rule has its own test class: Create a test class for your rule extending `PmdRuleTst`
71+
*(see
72+
[`UnavailableFunctionTest`](https://github.com/pmd/pmd/blob/main/pmd-swift/src/test/java/net/sourceforge/pmd/lang/swift/rule/bestpractices/UnavailableFunctionTest.java)
73+
for example)*
74+
* Create a category rule set for your dialect *(see
75+
[`category/swift/bestpractices.xml`](https://github.com/pmd/pmd/blob/main/pmd-swift/src/main/resources/category/swift/bestpractices.xml)
76+
for example)*
77+
* Place the test XML file with the test cases in the correct location
78+
* When executing the test class
79+
* this triggers the unit test to read the corresponding XML file with the rule test data
80+
*(see
81+
[`UnavailableFunction.xml`](https://github.com/pmd/pmd/blob/main/pmd-swift/src/test/resources/net/sourceforge/pmd/lang/swift/rule/bestpractices/xml/UnavailableFunction.xml)
82+
for example)*
83+
* This test XML file contains sample pieces of code which should trigger a specified number of
84+
violations of this rule. The unit test will execute the rule on this piece of code, and verify
85+
that the number of violations matches.
86+
* To verify the validity of all the created rulesets, create a subclass of `AbstractRuleSetFactoryTest`
87+
*(see `RuleSetFactoryTest` in pmd-swift for example)*.
88+
This will load all rulesets and verify that all required attributes are provided.
89+
90+
*Note:* You'll need to add your ruleset to `categories.properties`, so that it can be found.
91+
92+
### 5. Create documentation page
93+
Finish up your new dialect by adding a page in the documentation. Create a new markdown file
94+
`<langId>.md` in `docs/pages/pmd/languages/`. This file should have the following frontmatter:
95+
96+
```
97+
---
98+
title: <Language Name>
99+
permalink: pmd_languages_<langId>.html
100+
last_updated: <Month> <Year> (<PMD Version>)
101+
tags: [languages, PmdCapableLanguage, CpdCapableLanguage]
102+
---
103+
```
104+
105+
On this page, language specifics can be documented, e.g. when the language was first supported by PMD.
106+
There is also the following Jekyll include, which creates a summary box for the language:
107+
108+
```
109+
{% raw %}
110+
{% include language_info.html name='<Language Name>' id='<langId>' implementation='<langId>::lang.<langId>.<langId>LanguageModule' supports_cpd=true supports_pmd=true since='<PMD Version>' %}
111+
{% endraw %}
112+
```
113+

docs/pages/pmd/devdocs/major_contributions/adding_a_new_antlr_based_language.md

+10
@@ -9,6 +9,16 @@ permalink: pmd_devdocs_major_adding_new_language_antlr.html
99
folder: pmd/devdocs
1010
---
1111

12+
{% include callout.html type="info" content="
13+
14+
**Do you really need a new language?**<br><br>
15+
16+
This document describes how to add a new full-fledged language, with its own grammar and parser.
17+
If what you are trying to support is “a specific type” of files for a grammar that already exists
18+
(e.g. a specific type of XML or HTML file), you may want to consider [creating a **dialect**](pmd_devdocs_major_adding_dialect.html) instead.
19+
20+
" %}
21+
1222
{% include callout.html type="warning" content="
1323

1424
**Before you start...**<br><br>

docs/pages/pmd/devdocs/major_contributions/adding_a_new_javacc_based_language.md

+10
@@ -9,6 +9,16 @@ permalink: pmd_devdocs_major_adding_new_language_javacc.html
99
folder: pmd/devdocs
1010
---
1111

12+
{% include callout.html type="info" content="
13+
14+
**Do you really need a new language?**<br><br>
15+
16+
This document describes how to add a new full-fledged language, with its own grammar and parser.
17+
If what you are trying to support is “a specific type” of files for a grammar that already exists
18+
(e.g. a specific type of XML or HTML file), you may want to consider [creating a **dialect**](pmd_devdocs_major_adding_dialect.html) instead.
19+
20+
" %}
21+
1222
{% include callout.html type="warning" content="
1323

1424
**Before you start...**<br><br>

docs/pages/release_notes.md

+38
@@ -4,6 +4,12 @@ permalink: pmd_release_notes.html
44
keywords: changelog, release notes
55
---
66

7+
{% if is_release_notes_processor %}
8+
{% capture baseurl %}https://docs.pmd-code.org/pmd-doc-{{ site.pmd.version }}/{% endcapture %}
9+
{% else %}
10+
{% assign baseurl = "" %}
11+
{% endif %}
12+
713
## {{ site.pmd.date | date: "%d-%B-%Y" }} - {{ site.pmd.version }}
814

915
The PMD team is pleased to announce PMD {{ site.pmd.version }}.
@@ -27,6 +33,19 @@ docker run --rm --tty -v $PWD:/src pmdcode/pmd:latest check -d . -R rulesets/jav
2733

2834
More information is available at <https://github.com/pmd/docker>.
2935

36+
#### Experimental support for language dialects
37+
38+
A dialect is a particular form of another supported language. For example, XSLT is a particular
39+
form of XML. Even though the dialect has its own semantics and uses, the contents are still readable
40+
by any tool capable of understanding the base language.
41+
42+
In PMD, a dialect lets you set up completely custom rules, XPath functions, properties and metrics
43+
for these files, while retaining the full support of the underlying base language, including
44+
already existing rules and XPath functions.
45+
46+
See [[core] Support language dialects #5438](https://github.com/pmd/pmd/pull/5438) and
47+
[Adding a new dialect]({{ baseurl }}pmd_devdocs_major_adding_dialect.html) for more information.
48+
3049
#### ✨ New Rules
3150

3251
* The new Apex rule {% rule apex/errorprone/TypeShadowsBuiltInNamespace %} finds Apex classes, enums, and interfaces
@@ -35,6 +54,7 @@ More information is available at <https://github.com/pmd/docker>.
3554

3655
### 🐛 Fixed Issues
3756
* core
57+
* [#5438](https://github.com/pmd/pmd/issues/5438): \[core] Support language dialects
3858
* [#5448](https://github.com/pmd/pmd/issues/5448): Maintain a public PMD docker image
3959
* [#5525](https://github.com/pmd/pmd/issues/5525): \[core] Add rule priority as level to Sarif report
4060
* [#5623](https://github.com/pmd/pmd/issues/5623): \[dist] Make pmd launch script compatible with /bin/sh
@@ -51,6 +71,24 @@ More information is available at <https://github.com/pmd/docker>.
5171

5272
### 🚨 API Changes
5373

74+
#### Deprecations
75+
* {%jdoc !!xml::lang.xml.pom.PomLanguageModule %} is deprecated. POM is now a dialect of XML.
76+
Use {%jdoc xml::lang.xml.pom.PomDialectModule %} instead.
77+
* {%jdoc !!xml::lang.xml.wsdl.WsdlLanguageModule %} is deprecated. WSDL is now a dialect of XML.
78+
Use {%jdoc xml::lang.xml.wsdl.WsdlDialectModule %} instead.
79+
* {%jdoc !!xml::lang.xml.xsl.XslLanguageModule %} is deprecated. XSL is now a dialect of XML.
80+
Use {%jdoc xml::lang.xml.xsl.XslDialectModule %} instead.
81+
82+
#### Experimental API
83+
* The core API around support for language dialects:
84+
* {%jdoc !!core::lang.Language#getBaseLanguageId() %}
85+
* {%jdoc !!core::lang.Language#isDialectOf(core::lang.Language) %}
86+
* {%jdoc !!core::lang.LanguageModuleBase#<init>(core::lang.LanguageModuleBase.DialectLanguageMetadata) %}
87+
* {%jdoc !!core::lang.LanguageModuleBase#asDialectOf(java.lang.String) %}
88+
* {%jdoc core::lang.LanguageModuleBase.DialectLanguageMetadata %}
89+
* {%jdoc core::lang.impl.BasePmdDialectLanguageVersionHandler %}
90+
* {%jdoc core::lang.impl.SimpleDialectLanguageModuleBase %}
91+
5492
### ✨ Merged pull requests
5593
<!-- content will be automatically generated, see /do-release.sh -->
5694
* [#5450](https://github.com/pmd/pmd/pull/5450): Fix #3184: \[apex] New Rule: TypeShadowsBuiltInNamespace - [Mitch Spano](https://github.com/mitchspano) (@mitchspano)

pmd-core/src/main/java/net/sourceforge/pmd/PmdAnalysis.java

+14
@@ -537,6 +537,20 @@ private Set<Language> getApplicableLanguages(boolean quiet) {
537537
}
538538
}
539539
} while (changed);
540+
541+
// include all available dialects of applicable languages, i.e. if we have XML rules, all XML dialects are applicable
542+
do {
543+
changed = false;
544+
for (Language lang : reg) {
545+
if (lang.getBaseLanguageId() != null) {
546+
Language baseLang = reg.getLanguageById(lang.getBaseLanguageId());
547+
if (baseLang != null && languages.contains(baseLang)) {
548+
changed |= languages.add(lang);
549+
}
550+
}
551+
}
552+
} while (changed);
553+
540554
return languages;
541555
}
542556
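
Review note: the loop added to `getApplicableLanguages` is a small fixed-point computation. The sketch below models it outside PMD so the behaviour can be tried in isolation. `Lang`, `DialectClosure` and `withDialects` are hypothetical stand-ins (plain string ids instead of PMD's `Language` and `LanguageRegistry`), and `xsl-strict` is an invented id used only to show a chain; whether PMD ships any dialect of a dialect is not stated in this commit.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a language: an id plus an optional base language id. */
final class Lang {
    final String id;
    final String baseId; // null when the language is not a dialect

    Lang(String id, String baseId) {
        this.id = id;
        this.baseId = baseId;
    }
}

final class DialectClosure {

    /** Adds every registered dialect whose base language is already applicable. */
    static Set<String> withDialects(List<Lang> registry, Set<String> applicable) {
        // Index the registry by id, mirroring a lookup of the base language by id.
        Map<String, Lang> byId = new LinkedHashMap<>();
        for (Lang lang : registry) {
            byId.put(lang.id, lang);
        }
        Set<String> result = new LinkedHashSet<>(applicable);
        boolean changed;
        do {
            changed = false;
            for (Lang lang : registry) {
                if (lang.baseId != null
                        && byId.containsKey(lang.baseId)
                        && result.contains(lang.baseId)) {
                    // add() returns true only for new elements, so the loop terminates.
                    changed |= result.add(lang.id);
                }
            }
        } while (changed);
        return result;
    }

    public static void main(String[] args) {
        List<Lang> registry = List.of(
                new Lang("xsl-strict", "xsl"), // invented id: a dialect of a dialect
                new Lang("xsl", "xml"),
                new Lang("xml", null),
                new Lang("java", null));
        // prints [xml, xsl, xsl-strict]
        System.out.println(withDialects(registry, Set.of("xml")));
    }
}
```

In this model the repeated passes matter only when a dialect is registered before its own base becomes applicable, as with `xsl-strict` here; with a single level of dialects one pass would be enough.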

pmd-core/src/main/java/net/sourceforge/pmd/lang/Language.java

+31
@@ -11,7 +11,9 @@
1111
import org.checkerframework.checker.nullness.qual.NonNull;
1212
import org.checkerframework.checker.nullness.qual.Nullable;
1313

14+
import net.sourceforge.pmd.annotation.Experimental;
1415
import net.sourceforge.pmd.cpd.CpdCapableLanguage;
16+
import net.sourceforge.pmd.util.AssertionUtil;
1517

1618
/**
1719
* Represents a language module, and provides access to language-specific
@@ -62,6 +64,35 @@ public interface Language extends Comparable<Language> {
6264
*/
6365
String getId();
6466

67+
/**
68+
* If this is a dialect of another language, returns the id of the base language, otherwise returns null.
69+
* Dialects are for example different flavors of XML. Dialects must share
70+
* the same AST as their base language. This makes it so that rules written
71+
* for the base language can be applied to files of all dialects uniformly.
72+
* @experimental Since 7.13.0. See <a href="https://github.com/pmd/pmd/pull/5438">[core] Support language dialects #5438</a>.
73+
*/
74+
@Experimental
75+
default @Nullable String getBaseLanguageId() {
76+
return null;
77+
}
78+
79+
/**
80+
* Return true if this language is a dialect of the given language.
81+
*
82+
* @param language A language (not null)
83+
* @experimental Since 7.13.0. See <a href="https://github.com/pmd/pmd/pull/5438">[core] Support language dialects #5438</a>.
84+
*/
85+
@Experimental
86+
@SuppressWarnings("PMD.SimplifyBooleanReturns")
87+
default boolean isDialectOf(Language language) {
88+
AssertionUtil.requireParamNotNull("language", language);
89+
String base = getBaseLanguageId();
90+
if (base == null) {
91+
return false;
92+
}
93+
return base.equals(language.getId());
94+
}
95+
6596
/**
6697
* Returns the list of file extensions associated with this language.
6798
* This list is unmodifiable. Extensions do not have a '.' prefix.
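
Review note: as written in the hunk above, `isDialectOf` compares only against the direct base language id, so it is false for the base language itself and for unrelated languages. The sketch below reproduces that logic in a standalone interface to make the semantics easy to check. `DialectAware` and `DialectDemo` are hypothetical names, and `Objects.requireNonNull` is swapped in for PMD's `AssertionUtil.requireParamNotNull`.

```java
import java.util.Objects;

/** Hypothetical, pared-down model of the two default methods added to Language. */
interface DialectAware {
    String getId();

    /** Returns the base language id, or null when this is not a dialect. */
    default String getBaseLanguageId() {
        return null;
    }

    /** True only when the given language is this language's direct base. */
    default boolean isDialectOf(DialectAware language) {
        Objects.requireNonNull(language, "language");
        String base = getBaseLanguageId();
        return base != null && base.equals(language.getId());
    }
}

final class DialectDemo {

    /** Builds a language with the given id and (possibly null) base id. */
    static DialectAware language(String id, String baseId) {
        return new DialectAware() {
            @Override
            public String getId() {
                return id;
            }

            @Override
            public String getBaseLanguageId() {
                return baseId;
            }
        };
    }

    public static void main(String[] args) {
        DialectAware xml = language("xml", null);
        DialectAware xsl = language("xsl", "xml");
        System.out.println(xsl.isDialectOf(xml)); // true
        System.out.println(xml.isDialectOf(xsl)); // false
        System.out.println(xml.isDialectOf(xml)); // false
    }
}
```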
