Describe the Bug
The page title returned from /crawl and /scrape is set incorrectly in some cases.
To Reproduce
Steps to reproduce the issue:
- Do a /scrape on https://voluitsuite.nl/post/voluit-suite-updates-5-juni-2024
- See that the JSON title of the scraped page is empty
- Open the HTML of the source page and see that the title tag is not empty
Expected Behavior
I would expect both /scrape and /crawl to return the title of the page when a title tag is present.
Environment (please complete the following information):
- OS: Linux
- Firecrawl Version: I'm using https://www.firecrawl.dev/app/playground so that it's easy to reproduce for you and me
- Node.js Version: NA
Additional Context
I think the problem here is that the source HTML page has both title and meta name="title" tags:
<title>voluit suite updates 5 juni 2024</title>
<meta name="title" content="voluit suite updates 5 juni 2024">
As you can see, neither of them is empty, so I'm not 100% sure why the problem is happening.
The potential solutions are:
- do not override title with an empty meta.title value when merging dicts https://github.com/mendableai/firecrawl/blob/main/apps/api/src/scraper/scrapeURL/lib/extractMetadata.ts#L191 (easy fix);
- figure out why meta.title wasn't propagated correctly. Does Firecrawl not wait long enough for the value to be set?
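A rough sketch of what the first option could look like. This is not the actual code in extractMetadata.ts; the `Metadata` type, the `isBlank` and `mergeNonEmpty` helpers, and the sample `description` value are all made up for illustration, and I'm assuming the empty title arrives as an empty string from the meta tags:

```typescript
// Hypothetical shape; the real Firecrawl metadata object has more fields.
type Metadata = Record<string, string | undefined>;

// Treat undefined and whitespace-only strings as "no value".
function isBlank(value: string | undefined): boolean {
  return value === undefined || value.trim() === "";
}

// Merge `overrides` into `base`, skipping blank override values so they
// cannot clobber a value that was already extracted.
function mergeNonEmpty(base: Metadata, overrides: Metadata): Metadata {
  const merged: Metadata = { ...base };
  for (const [key, value] of Object.entries(overrides)) {
    if (!isBlank(value)) {
      merged[key] = value;
    }
  }
  return merged;
}

// Assumed situation: <title> was read correctly, but the value taken
// from meta[name="title"] came back empty.
const fromTitleTag: Metadata = { title: "voluit suite updates 5 juni 2024" };
const fromMetaTags: Metadata = { title: "", description: "made-up description" };

const result = mergeNonEmpty(fromTitleTag, fromMetaTags);
console.log(result.title);
```

With a plain object spread (`{ ...fromTitleTag, ...fromMetaTags }`) the empty string would win and the title would be lost; skipping blank values keeps the one from the title tag while still letting non-empty meta values override it.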