feat: add fastCRW document loader node#6511

Open
us wants to merge 2 commits into FlowiseAI:main from us:feat/add-fastcrw

us commented Jun 13, 2026

What

Adds a fastCRW document loader node + credential, mirroring the existing FireCrawl node.

Why

fastCRW is a Firecrawl-API-compatible web engine in a single ~8MB binary — self-host free or managed cloud. Flat pricing (1 credit = 1 page; no 4x stealth surcharge, no billed-on-failure), free anti-bot stealth — a drop-in alternative to the FireCrawl loader.

Changes (additive only)

  • packages/components/nodes/documentloaders/Crw/Crw.ts (+ icon) mirroring FireCrawl/FireCrawl.ts (scrape/crawl modes, same inputs).
  • packages/components/credentials/CrwApi.credential.ts mirroring the FireCrawl credential.
  • Crw.test.ts.

Get a CRW_API_KEY from https://fastcrw.com/dashboard (free tier). Happy to adjust anything here; I maintain fastCRW and can provide free credits.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces the fastCRW document loader, adding support for crawling, scraping, extracting, and searching web content using a Firecrawl-compatible API. It includes the credential configuration, the core loader class, the Flowise node integration, and comprehensive unit tests. The code review identified three high-severity issues: a potential runtime TypeError when modifying this.params without initialization, and premature job failures in both the crawl and extract status polling loops due to unhandled intermediate statuses like 'pending', 'active', and 'cancelled'.

            if (!this.url) {
                throw new Error('fastCRW: URL is required for extract mode')
            }
            this.params!.urls = [this.url]

high

If this.params is undefined, attempting to set this.params!.urls will throw a TypeError at runtime. Since params is optional in the CrwLoader constructor, we should safely initialize or construct the parameters object.

            this.params = {
                ...this.params,
                urls: [this.url]
            }
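
A minimal sketch of why the spread form is safe, assuming a trimmed-down parameter type. `ExtractParams` and `withExtractUrl` are hypothetical names used only for illustration and are not taken from the PR.

```typescript
// Hypothetical stand-in for the loader's params type (assumption, not the PR's interface).
interface ExtractParams {
    urls?: string[]
    prompt?: string
}

// Builds a new params object instead of mutating a possibly-undefined one.
function withExtractUrl(params: ExtractParams | undefined, url: string): ExtractParams {
    // Spreading `undefined` inside an object literal contributes no properties,
    // so this never throws, unlike `params!.urls = [url]`.
    return { ...params, urls: [url] }
}

const fromUndefined = withExtractUrl(undefined, 'https://example.com')
const fromExisting = withExtractUrl({ prompt: 'Summarize the page' }, 'https://example.com')

console.log(fromUndefined)
console.log(fromExisting)
```

Existing keys such as `prompt` survive the spread, and `urls` is always overwritten with the single extract URL.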

Comment on lines +490 to +496
                    case 'scraping':
                    case 'failed':
                        if (statusData.status === 'failed') {
                            throw new Error('Crawl job failed')
                        }
                        await new Promise((resolve) => setTimeout(resolve, Math.max(checkInterval, 2) * 1000))
                        break

high

The crawl job can have intermediate statuses like 'pending' or 'active' before transitioning to 'scraping' or 'completed'. Currently, any status other than 'completed', 'scraping', or 'failed' will hit the default block and throw an error, causing the crawl to fail prematurely. We should include 'pending' and 'active' in the polling loop.

                    case 'pending':
                    case 'active':
                    case 'scraping':
                    case 'failed':
                        if (statusData.status === 'failed') {
                            throw new Error('Crawl job failed')
                        }
                        await new Promise((resolve) => setTimeout(resolve, Math.max(checkInterval, 2) * 1000))
                        break
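
One way to make the keep-polling rule easier to test is to separate the status decision from the waiting. The sketch below is an illustration only: `classifyCrawlStatus`, `pollDelayMs`, and `waitForCrawl` are hypothetical helpers, and the set of status strings is assumed from this review rather than confirmed against the API.

```typescript
type PollAction = 'done' | 'wait' | 'fail'

// Maps a raw status string to what the polling loop should do next.
function classifyCrawlStatus(status: string): PollAction {
    switch (status) {
        case 'completed':
            return 'done'
        case 'pending':
        case 'active':
        case 'scraping':
            // Non-terminal statuses: keep polling.
            return 'wait'
        default:
            // 'failed' and any unrecognised status stop the loop.
            return 'fail'
    }
}

// Same delay as the snippet above: checkInterval is in seconds, floored at 2.
function pollDelayMs(checkInterval: number): number {
    return Math.max(checkInterval, 2) * 1000
}

// getStatus and sleep are injected so the loop can be exercised without a network or real timers.
async function waitForCrawl(
    getStatus: () => Promise<{ status: string }>,
    checkInterval: number,
    sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms))
): Promise<void> {
    while (true) {
        const { status } = await getStatus()
        const action = classifyCrawlStatus(status)
        if (action === 'done') return
        if (action === 'fail') throw new Error(`Crawl job ended with status: ${status}`)
        await sleep(pollDelayMs(checkInterval))
    }
}
```

Because the classification is a pure function, a unit test can cover every status without mocking HTTP calls.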

Comment on lines +517 to +523
                    case 'processing':
                    case 'failed':
                        if (statusData.status === 'failed') {
                            throw new Error('Extract job failed')
                        }
                        await new Promise((resolve) => setTimeout(resolve, Math.max(checkInterval, 2) * 1000))
                        break

high

The ExtractStatusResponse interface explicitly defines 'pending' and 'cancelled' as valid statuses. However, they are not handled in the switch-case, meaning a 'pending' status will hit the default block and throw an error immediately. We should handle 'pending' by waiting/polling, and 'cancelled' by throwing a specific error.

                    case 'pending':
                    case 'processing':
                    case 'failed':
                    case 'cancelled':
                        if (statusData.status === 'failed') {
                            throw new Error('Extract job failed')
                        }
                        if (statusData.status === 'cancelled') {
                            throw new Error('Extract job was cancelled')
                        }
                        await new Promise((resolve) => setTimeout(resolve, Math.max(checkInterval, 2) * 1000))
                        break
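
Since the statuses already live in an interface, a `never` check in the `default` branch could surface a missing case at compile time instead of at runtime. This is a sketch under assumptions: the full `ExtractStatus` union below is a guess (the review only confirms 'pending' and 'cancelled' in addition to the handled cases), and `nextExtractStep` is a hypothetical helper.

```typescript
// Assumed status union; only some members are confirmed by the review above.
type ExtractStatus = 'pending' | 'processing' | 'completed' | 'failed' | 'cancelled'

// Decides whether to return results or poll again; throws on terminal failures.
function nextExtractStep(status: ExtractStatus): 'return' | 'poll' {
    switch (status) {
        case 'completed':
            return 'return'
        case 'pending':
        case 'processing':
            return 'poll'
        case 'failed':
            throw new Error('Extract job failed')
        case 'cancelled':
            throw new Error('Extract job was cancelled')
        default: {
            // If a new member is added to ExtractStatus and not handled above,
            // this assignment no longer type-checks.
            const unhandled: never = status
            throw new Error(`Unexpected extract status: ${String(unhandled)}`)
        }
    }
}

console.log(nextExtractStep('pending'))
console.log(nextExtractStep('completed'))
```

The runtime `throw` in the default branch still guards against status strings the server sends that the type does not know about.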

- guard this.params init in extract mode to avoid TypeError when params is undefined
- treat non-terminal crawl statuses (scraping/pending/active) as keep-polling; only fail on failed/error/cancelled
- handle pending/processing extract statuses as keep-polling; fail explicitly on failed/cancelled