[Feat] Option to include header/footer once when using onlyMainContent

**Problem Description**
Marketing sites often put information such as phone numbers and addresses in the footer. If you need that information in your dataset, you can't use `onlyMainContent`, which means every scraped page carries its own copy of the header and footer (`n * numPages` copies in total).

**Proposed Feature**

A flag, used with or instead of `onlyMainContent`, that either returns the header/footer/etc. data as a separate 'page', or applies `onlyMainContent` to every scraped page except the first.
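
To make the two behaviours concrete, here is a minimal sketch of what the output could look like. It assumes each scraped page has already been split into main and non-main text; the `Page` class, the `non_main` field, the `mode` names, and the `<shared header/footer>` label are all hypothetical and not part of any existing API.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    main: str       # what onlyMainContent would return
    non_main: str   # header/footer/nav text (hypothetical field)


def include_non_main_once(pages, mode="separate_page"):
    """Return (label, text) pairs where non-main content appears only once."""
    if not pages:
        return []
    first = pages[0]
    if mode == "separate_page":
        # Option 1: emit the header/footer as its own 'page', then main content only.
        out = [("<shared header/footer>", first.non_main)]
        out += [(p.url, p.main) for p in pages]
    elif mode == "first_page_full":
        # Option 2: the first page keeps everything, the rest are main content only.
        out = [(first.url, first.non_main + "\n\n" + first.main)]
        out += [(p.url, p.main) for p in pages[1:]]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return out
```

Both modes take the header/footer from the first page only, so anything that differs between pages (for example a per-page breadcrumb) would be lost.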

**Alternatives Considered**
* Deduplication post-scrape - doable, but a bit messy and sometimes unreliable
* Accept the repeated data - it often eats into LLM context windows
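
For reference, the post-scrape deduplication alternative looks roughly like the sketch below. It is a simplified illustration, not what I actually run: it assumes each page is a plain markdown string and treats any line that appears on every page as header/footer text. The function name and the `min_share` parameter are made up for this example.

```python
from collections import Counter


def dedupe_repeated_lines(pages, min_share=1.0):
    """Keep lines shared across pages on the first page only.

    pages: list of markdown strings, one per scraped page.
    min_share: fraction of pages a line must appear on to count as repeated.
    """
    if len(pages) < 2:
        return list(pages)

    # Count on how many pages each distinct non-empty line occurs.
    counts = Counter()
    for text in pages:
        counts.update({ln.strip() for ln in text.splitlines() if ln.strip()})

    threshold = min_share * len(pages)
    repeated = {ln for ln, c in counts.items() if c >= threshold}

    # The first page is left untouched; later pages lose the repeated lines.
    result = [pages[0]]
    for text in pages[1:]:
        kept = [ln for ln in text.splitlines() if ln.strip() not in repeated]
        result.append("\n".join(kept))
    return result
```

This shows where the unreliability comes from: a line of real content that happens to repeat on every page gets dropped, and a footer that varies slightly between pages is not caught at all.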

**Implementation Suggestions**
Whatever mechanism is used to exclude the non-main content could be used in reverse to grab it exclusively. 
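
As a rough illustration of "in reverse", the sketch below assumes the exclusion is tag-based and that `header`, `footer`, and `nav` are what counts as non-main. I don't know how the real filter works, so both the tag set and the class/function names here are assumptions. The same pass that drops those elements can collect them instead.

```python
from html.parser import HTMLParser

# Assumed set of non-main elements; the real filter may use other rules.
NON_MAIN_TAGS = {"header", "footer", "nav"}


class ContentSplitter(HTMLParser):
    """Collect text inside non-main elements separately from everything else."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # how many non-main elements we are currently inside
        self.main = []
        self.non_main = []

    def handle_starttag(self, tag, attrs):
        if tag in NON_MAIN_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in NON_MAIN_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:
            (self.non_main if self.depth > 0 else self.main).append(text)


def split_page(html):
    """Return (main_text, non_main_text) for one HTML document."""
    parser = ContentSplitter()
    parser.feed(html)
    parser.close()
    return " ".join(parser.main), " ".join(parser.non_main)
```

With something like this, the non-main text could be returned once for the whole crawl while every page still gets the main-content-only treatment.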

**Use Case**
It would offer a middle ground for how much non-main content to include.


[Feat] Option to include header/footer once when using onlyMainContent #1518
