Releases: coder-hxl/x-crawl

v5.0.1

08 Apr 03:29

🚀 Features

  • New adjustments to the documentation.

v5.0.0

26 Apr 02:50

🚨 Breaking Changes

  • Configuration: each crawling API's configuration has changed significantly, and the same API now supports multiple configuration styles, each with its own purpose.
  • Results: the result of each request is now wrapped in an object that describes that request, for example: id, the result itself, success status, maximum retry count, number of retries performed, and collected error messages (see the sketch after this list). Whether the return value is wrapped in an array is determined automatically by the configuration style you choose, and the types match exactly in TypeScript.
  • Callbacks: when results are obtained through a callback function, the callback is no longer executed after each individual request completes as in v4.x; instead, callbacks run sequentially after crawling finishes, so they do not block subsequent crawling.
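
A minimal sketch of the per-request wrapper described above, in TypeScript. The property names are assumptions inferred from these release notes rather than the library's exported types; check the x-crawl type definitions for the authoritative shape.

```ts
// Hypothetical shape of the object wrapping each request's result in v5.
// All names below are illustrative assumptions based on the notes above.
interface CrawlResultWrapper<T> {
  id: number               // identifier of the request within this crawl call
  isSuccess: boolean       // whether the request ultimately succeeded
  maxRetry: number         // maximum number of retries allowed
  retryCount: number       // how many retries were actually performed
  crawlErrorQueue: Error[] // errors collected across all attempts
  data: T | null           // the crawled result itself (null on failure)
}
```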

🚀 Features

  • Added a failure retry mechanism that can be set for all crawling requests, for a single crawl call, or for an individual request.
  • Added a priority queue, so requests are crawled in order of each request's priority.
  • Configuration that is likely to be reused can be set in the request configuration, in the API crawling configuration, or in the baseConfig passed when creating a crawler instance, for example: timeout, proxy, intervalTime. Precedence is: requestConfig > APIConfig > baseConfig.
  • For the crawlFile API, the file path, name, extension, and other details can be set individually for each file. Added a beforeSave lifecycle function that runs before a file is saved: it receives the file data as a Buffer, the callback can perform operations such as compression on it, and returning a new Buffer replaces the original data written to the file. (A usage sketch follows this list.)
  • Updated the console output during crawling; errors produced while crawling are collected into an error queue, which can be retrieved from the return value after crawling completes.
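
A minimal usage sketch of these features in TypeScript, assuming the v5 configuration styles described above. Option names such as requestConfigs, fileConfig, and storeDir, as well as the exact beforeSave signature, are illustrative assumptions; consult the x-crawl documentation for the authoritative API.

```ts
import xCrawl from 'x-crawl'

// baseConfig: reusable settings with the lowest precedence
// (requestConfig > APIConfig > baseConfig).
const myXCrawl = xCrawl({
  timeout: 10000,
  intervalTime: { max: 3000, min: 2000 },
  maxRetry: 1
})

// Per-request settings override the per-call and base settings:
// `priority` feeds the new priority queue, and `maxRetry: 3`
// applies to that single request only.
myXCrawl
  .crawlPage([
    'https://github.com/coder-hxl/x-crawl',
    { url: 'https://github.com/coder-hxl', priority: 9, maxRetry: 3 }
  ])
  .then((results) => {
    for (const res of results) {
      // Each result arrives wrapped as sketched above.
      console.log(res.id, res.isSuccess, res.retryCount)
    }
  })

// crawlFile: per-file path/name settings plus the new beforeSave hook.
// Returning a new Buffer replaces the data written to disk.
myXCrawl.crawlFile({
  requestConfigs: ['https://example.com/logo.png'],
  fileConfig: {
    storeDir: './upload',
    beforeSave(info: { data: Buffer }) {
      return info.data // e.g. compress the Buffer here and return it
    }
  }
})
```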

v4.0.1

30 Mar 00:52

🐞 Bug Fixes

  • The page was not closed when an error occurred in the crawlPage API.

v4.0.0

27 Mar 03:36

🚨 Breaking Changes

  • The crawlPage API now supports batch requests.
  • JSDOM has been removed from the crawlPage API.

🚀 Features

  • Documentation updates.

v3.3.0

24 Mar 10:23

🚀 Features

  • The crawlPage API can now carry cookies when crawling pages (for login and similar operations); a sketch follows below.
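
A minimal sketch of the idea in TypeScript, assuming a cookies option on the crawlPage request configuration; the option name and the accepted string format are assumptions based on this note, not a confirmed v3.3.0 signature.

```ts
import xCrawl from 'x-crawl'

const myXCrawl = xCrawl({ timeout: 10000 })

// Hypothetical: send cookies with the page request so the page loads
// in a logged-in state. Option name and format are assumptions.
myXCrawl.crawlPage({
  url: 'https://example.com/dashboard',
  cookies: 'sessionId=abc123; theme=dark'
})
```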

v3.2.12

23 Mar 11:13

🐞 Bug Fixes

  • Fixed broken documentation navigation links.

v3.2.11

22 Mar 01:45

🚀 Features

  • Test updates: all APIs are now covered by unit tests.

🐞 Bug Fixes

  • Fixed an internal error in the crawlPage API.

v3.2.10

21 Mar 01:54

🚀 Features

  • Documentation updates.

v3.2.9

20 Mar 07:53

🚀 Features

  • Dependency updates.

v3.2.8

19 Mar 07:31

🐞 Bug Fixes

  • Internal type adjustments.
  • Errors from the crawlPage API are now caught.