6 Commits

Author SHA1 Message Date
Alan Orth
abad6116cb DS-4587: Update the spider user agent file
This updates the included spider user agent file to the latest from
the COUNTER-Robots project. DSpace's own copy is over five years old
and is missing a bunch of new patterns, which greatly decreases the
accuracy of the Solr usage statistics.

Port of #3333 to main for DSpace 7.x.

See: https://jira.lyrasis.org/browse/DS-4587
See: https://github.com/atmire/COUNTER-Robots/releases/tag/2021-07-05
2021-09-04 21:24:16 +03:00
Mark H. Wood
522c6fb696 [DS-2463] Escape pattern characters that are significant in regular expressions. 2015-10-18 11:01:21 -04:00
Mark H. Wood
014128bf2e [DS-2463] Remove stale spider file; don't poll to update it; extract interesting UA patterns. 2015-10-18 08:43:37 -04:00
Bram Luyten
14a4850b0b DS-2531 New entries for the robots hostname list 2015-04-01 16:10:29 +02:00
Hardy Pottinger
4f5846f2b8 DS-1841: adding example files for agent and domain-based spider filtering, borrowed from OSU Libraries, with much thanks 2013-12-12 23:17:52 +00:00
Mark Diggory
a5beae59c2 [DS-440] Adjust SpiderDownloader to download multiple files in a "config/spiders" directory relative ${dspace.dir}
git-svn-id: http://scm.dspace.org/svn/repo/dspace/trunk@4744 9c30dcfa-912a-0410-8fc2-9e0234be79fd
2010-02-07 16:42:56 +00:00