Issue: Support excludeTags with onlyMainContent=true
Current Behavior
When onlyMainContent is set to true, the excludeTags parameter appears to be ignored. The content extraction works, but I cannot apply additional tag exclusions.
Expected Behavior
excludeTags should work in conjunction with onlyMainContent=true, allowing for:
Base content filtering through onlyMainContent
Additional fine-tuned control through custom excludeTags
Problem Details
Currently:
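With onlyMainContent=true: excludeTags are ignored
With onlyMainContent=false: using excludeTags from the default list (https://github.com/mendableai/firecrawl/blob/79e65f31ef1d7a4172870471d81501ee2e8aef22/apps/api/src/scraper/WebScraper/utils/excludeTags.ts) plus custom tags results in longer output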
This suggests that onlyMainContent uses a more aggressive content filtering algorithm than just tag exclusion.
Proposed Solution
Add support for excludeTags to work even when onlyMainContent=true. This would provide more flexible content filtering options by combining both features.
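To make the request concrete, here is a minimal sketch of the combined behavior. It is not Firecrawl's implementation: PageNode, DEFAULT_EXCLUDES, prune, and filterPage are hypothetical names, the page is modeled as a plain object tree rather than real HTML, and the main-content step is reduced to a short exclusion list purely for illustration (the actual algorithm appears to do more than that).

```typescript
// Simplified stand-in for a parsed HTML element (hypothetical, not Firecrawl's types).
interface PageNode {
  tag: string;
  id?: string;
  classes?: string[];
  text?: string;
  children?: PageNode[];
}

// Illustrative subset only; the real default list lives in excludeTags.ts.
const DEFAULT_EXCLUDES: string[] = ["nav", "footer", "header", "#sidebar"];

// Matches a bare tag name ("nav"), an id ("#comments"), or a class (".ad").
function matches(node: PageNode, selector: string): boolean {
  if (selector.startsWith("#")) return node.id === selector.slice(1);
  if (selector.startsWith(".")) {
    return (node.classes ?? []).includes(selector.slice(1));
  }
  return node.tag === selector;
}

// Returns a copy of the tree without any node matching one of the selectors.
function prune(node: PageNode, selectors: string[]): PageNode | null {
  if (selectors.some((s) => matches(node, s))) return null;
  const children = (node.children ?? [])
    .map((child) => prune(child, selectors))
    .filter((child): child is PageNode => child !== null);
  return { ...node, children };
}

interface FilterOptions {
  onlyMainContent: boolean;
  excludeTags?: string[];
}

// Proposed behavior: custom excludeTags are always applied,
// and onlyMainContent=true adds its own filtering on top.
function filterPage(root: PageNode, options: FilterOptions): PageNode | null {
  const selectors = [
    ...(options.onlyMainContent ? DEFAULT_EXCLUDES : []),
    ...(options.excludeTags ?? []),
  ];
  return prune(root, selectors);
}

// Example page: a nav bar plus an article that still contains a comments div.
const samplePage: PageNode = {
  tag: "body",
  children: [
    { tag: "nav", text: "Menu" },
    {
      tag: "article",
      children: [
        { tag: "p", text: "Post body" },
        { tag: "div", id: "comments", text: "Reader comments" },
      ],
    },
  ],
};

const cleaned = filterPage(samplePage, {
  onlyMainContent: true,
  excludeTags: ["#comments"],
});
console.log(JSON.stringify(cleaned, null, 2));
```

In this sketch the custom selectors are merged with the main-content filtering instead of being dropped, so the comments div is removed even though it sits inside the article.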
Use Case
This would be useful when users want to:
Get the main content of a page (onlyMainContent=true)
Additionally exclude specific tags that might still be present in the main content
Would it be possible to implement this feature enhancement?
@nickscamara, I checked it once again and I believe it still doesn't work. In the example below, I still get a div with id=comments. Could someone please also confirm it?