CLI implementation of httpreserve that can test links and retrieve Internet Archive replacements. The tool can output the result of individual links, or take a CSV list to output collected information in JSON, BoltDB, or CSV format.
Usage:  linkstat [Optional -link] [Optional -label]
                 [Optional -list] [Optional -json]
                                  [Optional -bolt]
                                  [Optional -csv]
                 [Optional -version -v]
                 Output: [Json]
                 Output: [CSV]
                 Output: [BoltDB]
                 Output: [Version] 'exponentialDK-httpreserve/0.0.9 ...'
Usage of ./linkstat:
  -bolt
        Output to static BoltDB.
  -csv
        Output to CSV.
  -json
        Output to JSON.
  -label string
        Annotate single URL check response with label.
  -link string
        Seek the status of a single URL: JSON
  -list string
        Provide a list of URLs to test against in CSV format.
  -v    Return httpreserve version.
  -version
        Return httpreserve version.
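
If linkstat is driven from a script, the flag combinations in the usage text can be assembled programmatically. The following is a minimal Python sketch, not part of linkstat itself: the helper names `build_linkstat_args` and `run_linkstat` are hypothetical, and it assumes the binary lives at `./linkstat` and that each flag behaves as the usage text describes.

```python
import subprocess


def build_linkstat_args(binary="./linkstat", link=None, label=None,
                        list_path=None, output=None):
    """Build an argument list using only the flags shown in the usage text.

    Hypothetical helper: exactly one of `link` (single URL check) or
    `list_path` (CSV list) must be given.
    """
    if (link is None) == (list_path is None):
        raise ValueError("provide exactly one of link or list_path")
    args = [binary]
    if link is not None:
        # Single URL mode: -label annotates the response.
        args += ["-link", link]
        if label is not None:
            args += ["-label", label]
    else:
        # List mode: choose one of the documented output formats.
        if output not in ("csv", "json", "bolt"):
            raise ValueError("output must be 'csv', 'json', or 'bolt'")
        args += ["-" + output, "-list", list_path]
    return args


def run_linkstat(args):
    """Run linkstat and return its standard output as text."""
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return completed.stdout
```

For example, `build_linkstat_args(list_path="example.csv", output="csv")` yields the same flags as `./linkstat -csv -list example.csv`.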
Example combining tikalinkextract
Inspired by Harvard Innovation Labs, this example was used to test the ability of httpreserve-workbench at the time. This CLI version is a simplification of that work but should still produce decent results. See the HTTPreserve Million Dollar Webpage Project.
An input CSV, example.csv, might look as follows:
"BBC News", ""
"BBC Home", ""
"BBC Radio", ""
"Google", ""
"", ""
"Internet Archive", ""
"", ""
"", ""
"The Million Dollar Homepage", ""
To output a CSV collecting all of the linkstat results, you can run a command as follows:
$ ./linkstat -csv --list example.csv > output.csv
And the output looks as follows:
"id","filename","link","response code","response text","title","content-type","archived","internet archive response code","internet archive response text","wayback earliest date","internet archive earliest","wayback latest date","internet archive latest","internet archive save link","protocol error","protocol error","analysis version number","analysis version text","stats creation time"
"1651a00b16a12ba06fc6c6b049c7cf7c","BBC News","","200","OK","home - bbc news","text/html;charset=utf-8","true","302","Found","09 October 1997","","19 March 2019","","","","","0.0.9","exponentialDK-httpreserve/0.0.9","1.574649021s"
"57ab6349a47b53b982a939fb1da54fef","BBC Radio","","200","OK","bbc sounds - music. radio. podcasts","text/html; charset=utf-8","true","302","Found","19 March 2008","","18 March 2019","","","","","0.0.9","exponentialDK-httpreserve/0.0.9","1.660729358s"
"c85da5e372ffe2200e46527b74537ba3","BBC Home","","200","OK","bbc - home","text/html; charset=utf-8","true","302","Found","21 December 1996","","19 March 2019","","","","","0.0.9","exponentialDK-httpreserve/0.0.9","1.95442772s"
"b3bd672c1014e07e87ef4a357a161528","","","206","Partial Content","ross spencer, digital preservation, archives, python developer, golang developer, uk, nz","text/html","true","302","Found","17 September 2008","","13 November 2018","","","","","0.0.9","exponentialDK-httpreserve/0.0.9","425.368183ms"
The command ./linkstat -link -label "GitHub" outputs JSON as follows:
{
  "FileName": "GitHub",
  "AnalysisVersionNumber": "0.0.15",
  "AnalysisVersionText": "exponentialDK-httpreserve/0.0.15",
  "SimpleRequestVersion": "httpreserve-simplerequest/0.0.4",
  "Link": "",
  "Title": "github: let’s build from here · github",
  "ContentType": "text/html; charset=utf-8",
  "ResponseCode": 200,
  "ResponseText": "OK",
  "SourceURL": "",
  "ScreenShot": "snapshots are not currently enabled",
  "InternetArchiveLinkEarliest": "",
  "InternetArchiveEarliestDate": "2008-05-14 21:01:48 +0000 UTC",
  "InternetArchiveLinkLatest": "",
  "InternetArchiveLatestDate": "2023-08-29 06:28:55 +0000 UTC",
  "InternetArchiveSaveLink": "",
  "InternetArchiveResponseCode": 302,
  "InternetArchiveResponseText": "Found",
  "RobustLinkEarliest": "<a href='' data-originalurl='' data-versiondate='2008-05-14'>HTTPreserve Robust Link - simply replace this text!!</a>",
  "RobustLinkLatest": "<a href='' data-originalurl='' data-versiondate='2023-08-29'>HTTPreserve Robust Link - simply replace this text!!</a>",
  "PWID": "",
  "Archived": true,
  "Error": false,
  "ErrorMessage": "",
  "StatsCreationTime": "7.070152149s"
}
- Find and Connect Project: Nicola Laurent on the impact of broken links.
- Binary Trees? Automatically Identifying the links between born digital records: I write about hyperlinks as a public record in their own right when submitted as part of a documentary heritage.
- HiberActive Pilot: A scholarly publishing tool that extracts URLs and returns both the original URL and a perma-link.
- IIPC Awesome List: A list of web-archiving links that invites contributions from the community to keep it up-to-date.
GNU General Public License Version 3. Full Text