Commit 62a16f3

SukkaW and quantizor authored
feat: allow composer to adjust sanitization (#564) (#579)
* feat: allow disable sanitization (#564)
* test: #564
* chore: add changeset
* chore: restore prior whitespace
* refactor: adjust sanitizer to provide more data to the composer
* refactor: DX tweaks
* chore: adjust size limit will golf this down later
* chore: tweak changeset

---------

Co-authored-by: Evan Jacobs <[email protected]>
1 parent 553a175 commit 62a16f3

5 files changed: +145 -27 lines changed

.changeset/tricky-poems-collect.md

+24
@@ -0,0 +1,24 @@
+---
+'markdown-to-jsx': minor
+---
+
+Allow modifying HTML attribute sanitization when `options.sanitizer` is passed by the composer.
+
+By default a lightweight URL sanitizer function is provided to avoid common attack vectors that might be placed into the `href` of an anchor tag, for example. The sanitizer receives the input, the HTML tag being targeted, and the attribute name. The original function is available as a library export called `sanitizer`.
+
+This can be overridden and replaced with a custom sanitizer if desired via `options.sanitizer`:
+
+```jsx
+// sanitizer in this situation would receive:
+// ('javascript:alert("foo")', 'a', 'href')
+
+;<Markdown options={{ sanitizer: (value, tag, attribute) => value }}>
+  {`[foo](javascript:alert("foo"))`}
+</Markdown>
+
+// or
+
+compiler('[foo](javascript:alert("foo"))', {
+  sanitizer: (value, tag, attribute) => value,
+})
+```
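The changeset only shows a pass-through sanitizer, which disables protection entirely. As a minimal sketch of a more selective override, the function below uses the `(value, tag, attribute)` arguments described above to apply a scheme allowlist. The names `Sanitizer`, `SAFE_SCHEMES`, and `allowlistSanitizer` are hypothetical and not part of markdown-to-jsx, and the allowlist itself is an assumption, not a vetted security control.

```typescript
// Same shape as `options.sanitizer` in this commit:
// (value, tag, attribute) => string | null
type Sanitizer = (value: string, tag: string, attribute: string) => string | null

// Hypothetical allowlist of URL schemes; adjust to what your content needs.
const SAFE_SCHEMES: string[] = ['http', 'https', 'mailto']

// Hypothetical helper, not part of markdown-to-jsx.
const allowlistSanitizer: Sanitizer = (value, tag, attribute) => {
  // Drop whitespace and control characters so "java\tscript:" cannot hide its scheme.
  const compact = value.replace(/[\u0000-\u0020]/g, '')
  // A leading "scheme:" marks an absolute URL; relative paths and fragments have none.
  const match = /^([A-Za-z][A-Za-z0-9+.-]*):/.exec(compact)
  if (match === null) return value

  const scheme = (match[1] || '').toLowerCase()
  // In this sketch, images may only be loaded over https.
  if (tag === 'img' && attribute === 'src') {
    return scheme === 'https' ? value : null
  }
  return SAFE_SCHEMES.indexOf(scheme) !== -1 ? value : null
}
```

Assuming the option behaves as the changeset describes, this would be passed as `compiler(markdown, { sanitizer: allowlistSanitizer })`. Returning `null` follows the return type declared for `options.sanitizer` in this commit.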

README.md

+26 -2
@@ -19,6 +19,7 @@ The most lightweight, customizable React markdown component.
 - [options.createElement - Custom React.createElement behavior](#optionscreateelement---custom-reactcreateelement-behavior)
 - [options.enforceAtxHeadings](#optionsenforceatxheadings)
 - [options.renderRule](#optionsrenderrule)
+- [options.sanitizer](#optionssanitizer)
 - [options.slugify](#optionsslugify)
 - [options.namedCodesToUnicode](#optionsnamedcodestounicode)
 - [options.disableParsingRawHTML](#optionsdisableparsingrawhtml)
@@ -435,21 +436,44 @@ function App() {
 }
 ````
 
+#### options.sanitizer
+
+By default a lightweight URL sanitizer function is provided to avoid common attack vectors that might be placed into the `href` of an anchor tag, for example. The sanitizer receives the input, the HTML tag being targeted, and the attribute name. The original function is available as a library export called `sanitizer`.
+
+This can be overridden and replaced with a custom sanitizer if desired via `options.sanitizer`:
+
+```jsx
+// sanitizer in this situation would receive:
+// ('javascript:alert("foo")', 'a', 'href')
+
+;<Markdown options={{ sanitizer: (value, tag, attribute) => value }}>
+  {`[foo](javascript:alert("foo"))`}
+</Markdown>
+
+// or
+
+compiler('[foo](javascript:alert("foo"))', {
+  sanitizer: (value, tag, attribute) => value,
+})
+```
+
 #### options.slugify
 
 By default, a [lightweight deburring function](https://github.com/probablyup/markdown-to-jsx/blob/bc2f57412332dc670f066320c0f38d0252e0f057/index.js#L261-L275) is used to generate an HTML id from headings. You can override this by passing a function to `options.slugify`. This is helpful when you are using non-alphanumeric characters (e.g. Chinese or Japanese characters) in headings. For example:
 
 ```jsx
-;<Markdown options={{ slugify: str => str }}># 中文</Markdown>
+<Markdown options={{ slugify: str => str }}># 中文</Markdown>
 
 // or
 
 compiler('# 中文', { slugify: str => str })
 
 // renders:
-;<h1 id="中文">中文</h1>
+<h1 id="中文">中文</h1>
 ```
 
+The original function is available as a library export called `slugify`.
+
 #### options.namedCodesToUnicode
 
 By default only a couple of named html codes are converted to unicode characters:
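
The `index.tsx` changes in this commit call `options.slugify` with the default function as a second argument. As a hedged sketch of how an override might use it, the function below keeps headings containing non-ASCII characters and defers to the default otherwise. `standInDefault` is a simplified stand-in written for this example (it is not the library's deburring function), and `cjkFriendlySlugify` is a hypothetical name.

```typescript
// Shape of `options.slugify` after this commit: the default is the second argument.
type Slugify = (input: string, defaultFn: (input: string) => string) => string

// Stand-in for the library default, used only to make this sketch runnable.
// The real implementation differs.
const standInDefault = (input: string): string =>
  input
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse anything non-alphanumeric into a dash
    .replace(/^-|-$/g, '') // trim a leading or trailing dash

// Hypothetical override: keep non-ASCII headings, otherwise defer to the default.
const cjkFriendlySlugify: Slugify = (input, defaultFn) =>
  /[^\u0000-\u007f]/.test(input)
    ? input.trim().replace(/\s+/g, '-')
    : defaultFn(input)
```

With the library, the second argument would be supplied by the compiler, so the override could be passed directly as `options.slugify`.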

index.compiler.spec.tsx

+41 -1
@@ -1,4 +1,4 @@
-import { compiler, RuleType } from './index'
+import { compiler, sanitizer, RuleType } from './index'
 import * as React from 'react'
 import * as ReactDOM from 'react-dom'
 import * as fs from 'fs'
@@ -1180,6 +1180,46 @@ describe('links', () => {
     `)
   })
 
+  it('should not sanitize markdown when explicitly disabled', () => {
+    jest.spyOn(console, 'warn').mockImplementation(() => {})
+    jest.spyOn(console, 'error').mockImplementation(() => {})
+
+    render(compiler('[foo](javascript:doSomethingBad)', { sanitizer: x => x }))
+
+    expect(root.innerHTML).toMatchInlineSnapshot(`
+      <a href="javascript:doSomethingBad">
+        foo
+      </a>
+    `)
+
+    expect(console.warn).not.toHaveBeenCalled()
+  })
+
+  it('tag and attribute are provided to allow for conditional override', () => {
+    jest.spyOn(console, 'warn').mockImplementation(() => {})
+    jest.spyOn(console, 'error').mockImplementation(() => {})
+
+    render(
+      compiler(
+        '[foo](javascript:doSomethingBad)\n![foo](javascript:doSomethingBad)',
+        {
+          sanitizer: (value, tag) => (tag === 'a' ? value : sanitizer(value)),
+        }
+      )
+    )
+
+    expect(root.innerHTML).toMatchInlineSnapshot(`
+      <p>
+        <a href="javascript:doSomethingBad">
+          foo
+        </a>
+        <img alt="foo">
+      </p>
+    `)
+
+    expect(console.warn).toHaveBeenCalledTimes(1)
+  })
+
   it('should sanitize markdown links containing JS expressions', () => {
     jest.spyOn(console, 'warn').mockImplementation(() => {})
     jest.spyOn(console, 'error').mockImplementation(() => {})

index.tsx

+52 -22
@@ -731,8 +731,10 @@ function normalizeAttributeKey(key) {
 }
 
 function attributeValueToJSXPropValue(
+  tag: MarkdownToJSX.HTMLTags,
   key: keyof React.AllHTMLAttributes<Element>,
-  value: string
+  value: string,
+  sanitizeUrlFn: MarkdownToJSX.Options['sanitizer']
 ): any {
   if (key === 'style') {
     return value.split(/;\s?/).reduce(function (styles, kvPair) {
@@ -750,7 +752,7 @@ function attributeValueToJSXPropValue(
       return styles
     }, {})
   } else if (key === 'href' || key === 'src') {
-    return sanitizeUrl(value)
+    return sanitizeUrlFn(value, tag, key)
   } else if (value.match(INTERPOLATION_R)) {
     // return as a string and let the consumer decide what to do with it
     value = value.slice(1, value.length - 1)
@@ -951,7 +953,7 @@ function matchParagraph(
   return [match, captured]
 }
 
-function sanitizeUrl(url: string): string | undefined {
+export function sanitizer(url: string): string {
   try {
     const decoded = decodeURIComponent(url).replace(/[^A-Za-z0-9/:]/g, '')
 
@@ -963,7 +965,7 @@ function sanitizeUrl(url: string): string | undefined {
         )
       }
 
-      return undefined
+      return null
     }
   } catch (e) {
     if (process.env.NODE_ENV !== 'production') {
@@ -1138,12 +1140,13 @@ export function compiler(
   options: MarkdownToJSX.Options = {}
 ) {
   options.overrides = options.overrides || {}
+  options.sanitizer = options.sanitizer || sanitizer
   options.slugify = options.slugify || slugify
   options.namedCodesToUnicode = options.namedCodesToUnicode
     ? { ...namedCodesToUnicode, ...options.namedCodesToUnicode }
     : namedCodesToUnicode
 
-  const createElementFn = options.createElement || React.createElement
+  options.createElement = options.createElement || React.createElement
 
   // JSX custom pragma
   // eslint-disable-next-line no-unused-vars
@@ -1158,7 +1161,7 @@ export function compiler(
   ) {
     const overrideProps = get(options.overrides, `${tag}.props`, {})
 
-    return createElementFn(
+    return options.createElement(
       getTag(tag, options.overrides),
       {
         ...props,
@@ -1228,7 +1231,10 @@ export function compiler(
     return React.createElement(wrapper, { key: 'outer' }, jsx)
   }
 
-  function attrStringToMap(str: string): JSX.IntrinsicAttributes {
+  function attrStringToMap(
+    tag: MarkdownToJSX.HTMLTags,
+    str: string
+  ): JSX.IntrinsicAttributes {
     const attributes = str.match(ATTR_EXTRACTOR_R)
     if (!attributes) {
       return null
@@ -1243,8 +1249,10 @@ export function compiler(
 
       const mappedKey = ATTRIBUTE_TO_JSX_PROP_MAP[key] || key
       const normalizedValue = (map[mappedKey] = attributeValueToJSXPropValue(
+        tag,
         key,
-        value
+        value,
+        options.sanitizer
       ))
 
       if (
@@ -1366,7 +1374,7 @@ export function compiler(
       parse(capture /*, parse, state*/) {
         return {
           // if capture[3] it's additional metadata
-          attrs: attrStringToMap(capture[3] || ''),
+          attrs: attrStringToMap('code', capture[3] || ''),
           lang: capture[2] || undefined,
           text: capture[4],
           type: RuleType.codeBlock,
@@ -1409,13 +1417,13 @@ export function compiler(
       order: Priority.HIGH,
       parse(capture /*, parse*/) {
         return {
-          target: `#${options.slugify(capture[1])}`,
+          target: `#${options.slugify(capture[1], slugify)}`,
           text: capture[1],
         }
       },
       render(node, output, state) {
         return (
-          <a key={state.key} href={sanitizeUrl(node.target)}>
+          <a key={state.key} href={options.sanitizer(node.target, 'a', 'href')}>
             <sup key={state.key}>{node.text}</sup>
           </a>
         )
@@ -1450,7 +1458,7 @@ export function compiler(
       parse(capture, parse, state) {
         return {
           children: parseInline(parse, capture[2], state),
-          id: options.slugify(capture[2]),
+          id: options.slugify(capture[2], slugify),
           level: capture[1].length as MarkdownToJSX.HeadingNode['level'],
         }
       },
@@ -1495,10 +1503,14 @@ export function compiler(
         const noInnerParse =
           DO_NOT_PROCESS_HTML_ELEMENTS.indexOf(tagName) !== -1
 
+        const tag = (
+          noInnerParse ? tagName : capture[1]
+        ).trim() as MarkdownToJSX.HTMLTags
+
         const ast = {
-          attrs: attrStringToMap(capture[2]),
+          attrs: attrStringToMap(tag, capture[2]),
           noInnerParse: noInnerParse,
-          tag: (noInnerParse ? tagName : capture[1]).trim(),
+          tag,
         } as {
           attrs: ReturnType<typeof attrStringToMap>
           children?: ReturnType<MarkdownToJSX.NestedParser> | undefined
@@ -1539,9 +1551,11 @@ export function compiler(
       match: anyScopeRegex(HTML_SELF_CLOSING_ELEMENT_R),
       order: Priority.HIGH,
       parse(capture /*, parse, state*/) {
+        const tag = capture[1].trim() as MarkdownToJSX.HTMLTags
+
         return {
-          attrs: attrStringToMap(capture[2] || ''),
-          tag: capture[1].trim(),
+          attrs: attrStringToMap(tag, capture[2] || ''),
+          tag,
         }
       },
       render(node, output, state) {
@@ -1574,7 +1588,7 @@ export function compiler(
             key={state.key}
             alt={node.alt || undefined}
             title={node.title || undefined}
-            src={sanitizeUrl(node.target)}
+            src={options.sanitizer(node.target, 'img', 'src')}
           />
         )
       },
@@ -1596,7 +1610,11 @@ export function compiler(
       },
       render(node, output, state) {
         return (
-          <a key={state.key} href={sanitizeUrl(node.target)} title={node.title}>
+          <a
+            key={state.key}
+            href={options.sanitizer(node.target, 'a', 'href')}
+            title={node.title}
+          >
             {output(node.children, state)}
           </a>
         )
@@ -1725,7 +1743,7 @@ export function compiler(
           <img
             key={state.key}
             alt={node.alt}
-            src={sanitizeUrl(refs[node.ref].target)}
+            src={options.sanitizer(refs[node.ref].target, 'img', 'src')}
             title={refs[node.ref].title}
           />
         ) : null
@@ -1749,7 +1767,7 @@ export function compiler(
         return refs[node.ref] ? (
           <a
             key={state.key}
-            href={sanitizeUrl(refs[node.ref].target)}
+            href={options.sanitizer(refs[node.ref].target, 'a', 'href')}
             title={refs[node.ref].title}
           >
             {output(node.children, state)}
@@ -1934,7 +1952,10 @@ export function compiler(
       <footer key="footer">
         {footnotes.map(function createFootnote(def) {
           return (
-            <div id={options.slugify(def.identifier)} key={def.identifier}>
+            <div
+              id={options.slugify(def.identifier, slugify)}
+              key={def.identifier}
+            >
               {def.identifier}
               {emitter(parser(def.footnote, { inline: true }))}
             </div>
@@ -2375,11 +2396,20 @@ export namespace MarkdownToJSX {
       state: State
     ) => React.ReactChild
 
+    /**
+     * Override the built-in sanitizer function for URLs, etc if desired. The built-in version is available as a library export called `sanitizer`.
+     */
+    sanitizer: (
+      value: string,
+      tag: HTMLTags,
+      attribute: string
+    ) => string | null
+
     /**
      * Override normalization of non-URI-safe characters for use in generating
      * HTML IDs for anchor linking purposes.
      */
-    slugify: (source: string) => string
+    slugify: (input: string, defaultFn: (input: string) => string) => string
 
     /**
      * Declare the type of the wrapper to be used when there are multiple
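
The core of the `index.tsx` change is that the tag name is now threaded down to the point where `href` and `src` values are sanitized. The sketch below models only that URL branch in isolation. `attributeValueToProp` is a simplified stand-in for `attributeValueToJSXPropValue` (the style and interpolation branches are omitted), and `recordingSanitizer` is a hypothetical sanitizer used to show which arguments arrive.

```typescript
type Sanitizer = (value: string, tag: string, attribute: string) => string | null

// Simplified stand-in for attributeValueToJSXPropValue: only the URL branch is modeled.
function attributeValueToProp(
  tag: string,
  key: string,
  value: string,
  sanitize: Sanitizer
): string | null {
  if (key === 'href' || key === 'src') {
    // URL-bearing attributes go through the composer-supplied sanitizer,
    // which now also learns the tag and attribute it is sanitizing for.
    return sanitize(value, tag, key)
  }
  // Every other attribute passes through untouched in this sketch.
  return value
}

// Hypothetical sanitizer that records its arguments and only trusts anchors.
const calls: Array<[string, string, string]> = []
const recordingSanitizer: Sanitizer = (value, tag, attribute) => {
  calls.push([value, tag, attribute])
  return tag === 'a' ? value : null
}

const anchorHref = attributeValueToProp('a', 'href', 'javascript:x', recordingSanitizer)
const imageSrc = attributeValueToProp('img', 'src', 'javascript:x', recordingSanitizer)
const divTitle = attributeValueToProp('div', 'title', 'hello', recordingSanitizer)
```

Here `anchorHref` keeps its value, `imageSrc` becomes `null`, and `divTitle` never reaches the sanitizer, which mirrors the conditional override exercised in the new spec.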

package.json

+2 -2
@@ -99,11 +99,11 @@
   "size-limit": [
     {
       "path": "./dist/index.module.js",
-      "limit": "6.1 kB"
+      "limit": "6.2 kB"
     },
     {
       "path": "./dist/index.modern.js",
-      "limit": "6.1 kB"
+      "limit": "6.2 kB"
     }
   ],
   "jest": {
