Commit Graph

  • db689ad1e7 Add duckdb_warc (#178) [main] Ed Summers 2026-04-27 12:36:28 -04:00
  • d9ca358fb5 Add subsections for training material (#177) Natanael Arndt 2026-04-22 10:38:30 +00:00
  • 303c558027 more whirlwinds 🌪️ (#176) Greg Lindahl 2026-04-21 08:24:20 -07:00
  • 58be7236f6 fix syntax of stable and in development annotations (#173) Natanael Arndt 2026-03-18 17:31:19 +00:00
  • a19a9466ee Add bagnabit2warc utility (#172) Alex Osborne 2026-03-17 23:02:34 +09:00
  • 6a9d393783 Update Chrome Web Store URLs to new format (#169) Michael Lip 2026-03-04 18:03:40 +07:00
  • b8bf24e051 doc: add public data (#168) Greg Lindahl 2026-01-19 11:28:05 -08:00
  • 2b3e2e24ac Adding two tools (#166) Ed Summers 2026-01-08 12:05:26 -05:00
  • 7d65f20ae0 feat: common crawl discord and twitter (#164) Greg Lindahl 2026-01-07 11:51:49 -08:00
  • 4b42fd3de4 Add warcbench tool to README links (#163) Ed Summers 2025-10-31 14:52:19 -04:00
  • 304a530b78 Update some links for moved repos (#162) Natanael Arndt 2025-09-23 19:38:36 +00:00
  • 6915bc5487 Remove indention (#161) Natanael Arndt 2025-04-09 21:49:05 +00:00
  • af8c5bbc19 Add Community Archive (Twitter Archive and API) (#160) Ross Spencer 2025-02-11 19:43:13 +01:00
  • cf4504832d Update README.md (add hyphe tool from médialab) (#159) Benjamin Ooghe-Tabanou 2025-01-29 13:34:53 +01:00
  • a48cb0da7a Fix b193d5411a nruest 2025-01-28 13:14:13 -05:00
  • b193d5411a Update README.md (#158) Guillaume Levrier 2025-01-28 16:20:32 +01:00
  • 1aad9d46c9 Added warcat-rs (#157) Ed Summers 2025-01-03 19:37:35 -05:00
  • 670bcba445 Update link to Stanford Libraries' Archivability pages, closes #154 (#156) Martin Hoppenheit 2024-12-27 16:58:50 +01:00
  • 129d1fa2ff Update link for The Unarchiver (#155) Martin Hoppenheit 2024-12-27 16:16:53 +01:00
  • 49282b06dd Updates Webrecorder's website links (#153) Henry Wilkinson 2024-11-05 21:20:58 -05:00
  • 5bff7a0d46 Add a few new Common Crawl resources (#152) Greg Lindahl 2024-11-05 05:42:39 -08:00
  • 952e4d34dd Update URI for SiteStory (#151) Mat Kelly 2024-10-17 15:21:36 -04:00
  • 1953151aae Update README.md IIPC 2024-09-10 10:26:04 -04:00
  • 168526a62c Fix the jwat link(s) according to answers in the #os-sos@iipc.slack.com channel (#149) Natanael Arndt 2024-05-08 14:44:42 +02:00
  • 99241ae461 Added warc-safe to list (#148) lasztoth 2024-05-06 14:26:07 +02:00
  • 8e713a4388 Update list with current Webrecorder related URLs (#147) Henry Wilkinson 2024-04-25 10:43:26 +02:00
  • 101ee998d9 Adding a Web Archive Services section to list hosted and self-hostable web archiving options. (#144) Andy Jackson 2024-01-18 15:57:01 +00:00
  • 86c769597d Add IA Library to Utilities (#143) kokomo123 2023-12-20 08:53:11 -05:00
  • f0b7cdbae0 Added warcdb (#142) Ed Summers 2023-10-16 12:21:54 -04:00
  • 4b12cc7b32 Update the details around HTTPreserve.info (#141) Ross Spencer 2023-08-30 07:14:42 -04:00
  • 034582f3aa Adjusted jwarc description (#140) Ed Summers 2023-08-01 11:55:09 -04:00
  • d6ca8af2c0 Update README.md (#139) IIPC 2023-07-14 07:31:45 -04:00
  • 5d41023b2b add cc analysis (#138) Greg Lindahl 2023-07-04 09:54:21 -07:00
  • d4673d008e add cdx-toolkit (#135) Greg Lindahl 2023-07-04 01:37:05 -07:00
  • d395bb1b44 add common crawl mailing list (#136) Greg Lindahl 2023-07-04 01:36:05 -07:00
  • bf9664ff45 add web data commons (#137) Greg Lindahl 2023-07-04 01:34:33 -07:00
  • 54110410bf warcio was stable a long time ago (#134) Greg Lindahl 2023-07-04 01:33:09 -07:00
  • 4c04474998 this link works (#131) Greg Lindahl 2023-06-28 03:38:31 -07:00
  • 11fee57dcb Fix linter error: Ignore double IA Wayback link. (#129) Nick Ruest 2023-06-01 15:58:29 -04:00
  • 232966c4cb Add gogetcrawl (#128) Rustem Kamalov 2023-06-01 22:33:15 +03:00
  • d8631ddf05 Add crau. (#127) Nick Ruest 2023-04-30 20:05:45 -04:00
  • 4ecc363191 Adding @harvard-lil/scoop (#126) Matteo Cargnelutti 2023-04-26 16:56:25 -04:00
  • 46dc9518e4 added warcdedupe (#125) Ed Summers 2023-04-18 20:28:39 -04:00
  • b309687f88 Update runs-on Andy Jackson 2023-04-13 14:46:14 +01:00
  • 6bdb3373cb Add two tools that can do WARC deduplication (#124) Andy Jackson 2023-04-12 16:00:52 +01:00
  • fc1a73d22d Rename 22120 to DiskerNet (#123) Hendursaga 2023-01-20 07:59:41 -05:00
  • 248f9dc42e Update README.md (#122) Andy Jackson 2022-10-18 00:47:33 +01:00
  • 0104c202c8 Fix typo (#121) Mat Kelly 2022-09-27 10:46:37 -04:00
  • 6b7a3372d4 Add the Bellingcat Auto Archiver (#120) Andy Jackson 2022-09-24 03:38:51 +01:00
  • f1a10b71b1 Update README.md IIPC 2022-08-23 12:10:15 -04:00
  • 62515809d6 Add ARCH and Sparkling (#119) Nick Ruest 2022-05-25 17:30:37 -04:00
  • 0391cce057 Update README.md IIPC 2022-05-12 00:04:39 -04:00
  • 36dadbf3c4 Update README.md IIPC 2022-05-11 23:41:43 -04:00
  • 82e512bde2 Correct the link for "Web as History" (#118) Ross Spencer 2022-03-03 18:19:52 +01:00
  • 232ef44fd2 + waybackpy (https://github.com/akamhy/waybackpy) (#117) Akash Mahanty 2022-01-23 10:21:23 +05:30
  • d3cbc44fbd Add Unwarcit (#115) Mat Kelly 2022-01-05 10:32:05 -05:00
  • 921cf36496 Add FastWARC (#114) Mat Kelly 2021-12-13 11:30:53 -05:00
  • 30661eacd0 Add warc2html to Replay section (#113) Alex Osborne 2021-11-08 14:53:24 +09:00
  • 9ff76782d1 Add Wayback to Acquisition (#112) Wayback Archiver 2021-10-07 20:45:48 +08:00
  • 393919d9ee Add gowarcserver by Norsk nettarkiv (#111) Andy Jackson 2021-07-20 14:15:59 +01:00
  • 7b5c80c44f Adding WCT and a separate curation section. (#110) Andy Jackson 2021-07-13 13:33:08 +01:00
  • a9daaebc34 Remove Archives Unleashed Cloud. (#109) Nick Ruest 2021-06-30 15:20:30 -04:00
  • cf1c8ff4f1 Add Warcprox (#108) Youssef Eldakar 2021-06-22 14:39:37 +02:00
  • 5e11c22564 Added Browsertrix and ArchiveWeb.page (#107) Ed Summers 2021-05-28 14:45:29 -04:00
  • f2ae23d5ae added @WebSciDL (#106) Michael L. Nelson 2021-04-27 14:46:18 -04:00
  • 9fe7d3558b Add playback (#105) WaybackBot 2021-04-25 01:36:54 +08:00
  • 9d2356b766 Add httrack2warc utility (#104) Alex Osborne 2021-04-16 22:08:07 +09:00
  • 821eaf9fbc Patch 1 (#103) Thomas Egense 2021-03-05 14:02:10 +01:00
  • 3de3d8c59b Add 22120 (#102) Cris Stringfellow 2020-11-09 23:54:58 +08:00
  • 19fc5214e1 Add Cairn and Obelisk to the list. (#100) WaybackBot 2020-11-07 02:41:21 +08:00
  • 98f6832c15 Sort Replay section alphabetically to align with other sections (#96) Mat Kelly 2020-09-17 21:01:26 -04:00
  • b3ef2514e0 Update README.md IIPC PCO 2020-09-16 21:42:41 +00:00
  • ac682223a6 Update README.md IIPC PCO 2020-09-16 21:37:20 +00:00
  • d2c8ff8ae2 Move Lentil to deprecated list. (#94) Nick Ruest 2020-06-22 20:29:46 -04:00
  • 36ac91b158 Update WebRecorder's replay tool to ReplayWeb.Page (#93) Alex Wendland 2020-06-22 05:30:25 -07:00
  • 8d6217af8d Update link to aut documentation, and remove Warcbase workshop. (#92) Nick Ruest 2020-06-09 22:40:50 -04:00
  • 078fc3adc1 Adding GLAM Workbench and Awesome Lists section (#91) Andy Jackson 2020-06-05 11:40:01 +01:00
  • 84d213689c Elaborate description and make tags italic (#90) Sawood Alam 2020-03-26 17:00:22 -04:00
  • cb72a26752 Fix a typo in #87 (#89) Sawood Alam 2020-03-10 12:33:17 -04:00
  • ea98c25983 Add DSHR Blog (#87) Sawood Alam 2020-03-07 05:37:24 -05:00
  • 3a96fb2d16 Add @ato's guidelines from Slack discussion. (#81) Andy Jackson 2020-03-05 15:16:20 +00:00
  • 3c0dc1e1b5 Update blog description to be more objective. (#85) Mat Kelly 2020-03-03 15:19:44 -05:00
  • 78bc949dab Big D, like the other instances, per CONTRIBUTING.md (#86) Mat Kelly 2020-03-03 14:40:31 -05:00
  • 3a95c19f10 Fix Twitter links (#83) Sawood Alam 2020-02-26 21:31:21 -05:00
  • 20f528c438 Add WS-DL blog (#84) Sawood Alam 2020-02-26 21:30:51 -05:00
  • d7f2f1ac8e Add Mink (#82) Mat Kelly 2020-02-26 16:15:00 -05:00
  • 117ce3f163 Add link for clarity (#80) Mat Kelly 2020-02-26 11:49:54 -05:00
  • b827e6aada Add Reconstructive (#79) Sawood Alam 2020-02-25 13:50:06 -05:00
  • e696a5e09b Updates for Archives Unleashed projects. (#78) Nick Ruest 2020-02-25 10:27:36 -05:00
  • 99ec216d7c Add jwarc Alex Osborne 2020-02-26 00:09:38 +09:00
  • 38f45540f3 Split WARC I/O libraries out from utilities (#77) Alex Osborne 2020-02-26 00:00:19 +09:00
  • 3c673568d9 Fix awesome lint issues (#75) Sawood Alam 2020-02-25 09:47:58 -05:00
  • b3a519f9de Add GH Workflow Action for automated linting (#76) Sawood Alam 2020-02-25 07:54:57 -05:00
  • be92a9373f Add MementoMap to the list of utilities (#74) Sawood Alam 2020-02-24 09:29:03 -05:00
  • 3e4eb21c67 Add MemGator to the list of utilities (#73) Sawood Alam 2020-02-21 19:21:57 -05:00
  • 096df3bd93 Remove duplicate entry (#72) Sawood Alam 2020-02-21 19:21:36 -05:00
  • 015e2c3ba9 Add monolith (#71) Jan Vlnas 2019-09-12 13:43:10 +02:00
  • 774beb2a44 Add SingleFile (#70) Jan Vlnas 2019-09-06 14:35:22 +02:00
  • f72c1ed8eb Add WebMemex (#69) Jan Vlnas 2019-08-16 12:48:09 +02:00
  • b6d49ec5bc Add freeze-dry (#68) Jan Vlnas 2019-08-16 12:38:47 +02:00