mirror of
https://github.com/DSpace/DSpace.git
synced 2025-10-10 11:33:11 +00:00

This updates the included spider user agent file to the latest from the COUNTER-Robots project. DSpace's own copy is over five years old and is missing a bunch of new patterns, which greatly decreases the accuracy of the Solr usage statistics.

Port of #3333 to main for DSpace 7.x.

See: https://jira.lyrasis.org/browse/DS-4587
See: https://github.com/atmire/COUNTER-Robots/releases/tag/2021-07-05
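To illustrate why missing patterns hurt the statistics, here is a minimal sketch of how a pattern file like this one might be applied to incoming user agent strings. It is not DSpace's implementation: the function names `compile_patterns` and `is_spider` are hypothetical, the sample covers only a few patterns from the list, and it assumes each line is a regular expression searched case-insensitively anywhere in the user agent.

```python
import re

# A small sample of patterns copied from the list; the real file holds
# one pattern per line.
SAMPLE_PATTERNS = [
    r"bot",
    r"^Buck\/[0-9]",
    r"spider",
    r"crawl",
    r"^.?$",
    r"[^a]fish",
    r"curl\/",
    r"^Mozilla\/5\.0\+like\+Gecko$",
]


def compile_patterns(lines):
    """Compile each non-empty line as a regex.

    Assumption: matching is case-insensitive.
    """
    patterns = []
    for line in lines:
        text = line.rstrip("\n")
        if text:
            patterns.append(re.compile(text, re.IGNORECASE))
    return patterns


def is_spider(user_agent, compiled):
    """Return True if any pattern is found anywhere in the user agent."""
    return any(p.search(user_agent) for p in compiled)


compiled = compile_patterns(SAMPLE_PATTERNS)
for agent in ["Googlebot/2.1 (+http://www.google.com/bot.html)", "curl/7.68.0"]:
    print(agent, "->", is_spider(agent, compiled))
```

Under these assumptions, a request whose user agent matches no pattern is counted as a human visit, so an outdated list lets newer crawlers inflate the usage numbers.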
308 lines
3.5 KiB
Plaintext
bot
^Buck\/[0-9]
spider
crawl
^.?$
[^a]fish
^IDA$
^ruby$
^@ozilla\/\d
^脝脝陆芒潞贸碌脛$
^破解后的$
AddThis
A6-Indexer
ADmantX
alexa
Alexandria(\s|\+)prototype(\s|\+)project
AllenTrack
almaden
appie
API[\+\s]scraper
Arachni
Arachmo
architext
ArchiveTeam
aria2\/\d
arks
^Array$
asterias
atomz
BDFetch
Betsie
baidu
biglotron
BingPreview
binlar
bjaaland
Blackboard[\+\s]Safeassign
blaiz-bee
bloglines
blogpulse
boitho\.com-dc
bookmark-manager
Brutus\/AET
BUbiNG
bwh3_user_agent
CakePHP
celestial
cfnetwork
checklink
checkprivacy
China\sLocal\sBrowse\s2\.6
Citoid
cloakDetect
coccoc\/1\.0
Code\sSample\sWeb\sClient
ColdFusion
collection@infegy.com
com\.plumanalytics
combine
contentmatch
ContentSmartz
convera
core
Cortana
CoverScout
crusty\/\d
curl\/
cursor
custo
DataCha0s\/2\.0
daum(oa)?
^\%?default\%?$
DeuSu\/
Dispatch\/\d
Docoloc
docomo
Download\+Master
Drupal
DSurf
DTS Agent
EasyBib[\+\s]AutoCite[\+\s]
easydl
EBSCO\sEJS\sContent\sServer
EcoSearch
ELinks\/
EmailSiphon
EmailWolf
Embedly
EThOS\+\(British\+Library\)
facebookexternalhit\/
favorg
FDM(\s|\+)\d
Feedbin
feedburner
FeedFetcher
feedreader
ferret
Fetch(\s|\+)API(\s|\+)Request
findlinks
findthatfile
^FileDown$
^Filter$
^firefox$
^FOCA
Fulltext
Funnelback
Genieo
GetRight
geturl
GigablastOpenSource
G-i-g-a-b-o-t
GLMSLinkAnalysis
Goldfire(\s|\+)Server
google
Grammarly
grub
gulliver
gvfs\/
harvest
heritrix
holmes
htdig
htmlparser
HttpComponents\/1.1
HTTPFetcher
http.?client
httpget
httrack
ia_archiver
ichiro
iktomi
ilse
Indy Library
^integrity\/\d
internetseer
intute
iSiloX
iskanie
^java\/\d{1,2}.\d
jeeves
Jersey\/\d
jobo
kyluka
larbin
libcurl
libhttp
libwww
lilina
^LinkAnalyser
link.?check
LinkLint-checkonly
^LinkParser\/
^LinkSaver\/
linkscan
LinkTiger
linkwalker
lipperhey
livejournal\.com
LOCKSS
LongURL.API
ltx71
lwp
lycos[_+]
mail\.ru
MarcEdit
mediapartners-google
megite
MetaURI[\+\s]API\/\d\.\d
Microsoft(\s|\+)URL(\s|\+)Control
Microsoft Office Existence Discovery
Microsoft Office Protocol Discovery
Microsoft-WebDAV-MiniRedir
mimas
mnogosearch
moget
motor
^Mozilla$
^Mozilla.4\.0$
^Mozilla\/4\.0\+\(compatible;\)$
^Mozilla\/4\.0\+\(compatible;\+ICS\)$
^Mozilla\/4\.5\+\[en]\+\(Win98;\+I\)$
^Mozilla.5\.0$
^Mozilla\/5.0\+\(compatible;\+MSIE\+6\.0;\+Windows\+NT\+5\.0\)$
^Mozilla\/5\.0\+like\+Gecko$
^Mozilla\/5.0(\s|\+)Gecko\/20100115(\s|\+)Firefox\/3.6$
^MSIE
MuscatFerre
myweb
nagios
^NetAnts\/\d
netcraft
netluchs
newspaper\/\d
ng\/2\.
^Ning\/\d
no_user_agent
nomad
nutch
^oaDOI$
ocelli
Offline(\s|\+)Navigator
OgScrper
okhttp
onetszukaj
^Opera\/4$
OurBrowser
panscient
parsijoo
^Pattern\/\d
Pcore-HTTP
pear\.php\.net
perman
PHP\/
pidcheck
pioneer
playmusic\.com
playstarmusic\.com
^Postgenomic(\s|\+)v2
powermarks
proximic
PycURL
python
Qwantify
rambler
ReactorNetty\/\d
Readpaper
redalert
Riddler
robozilla
rss
scan4mail
scientificcommons
scirus
scooter
Scrapy\/\d
ScoutJet
^scrutiny\/\d
SearchBloxIntra
shoutcast
Site24x7
SkypeUriPreview
slurp
sogou
speedy
sqlmap
SrceDAMP
Strider
summify
sunrise
Sysomos
T\-H\-U\-N\-D\-E\-R\-S\-T\-O\-N\-E
tailrank
Teleport(\s|\+)Pro
Teoma
The\+Knowledge\+AI
titan
^Traackr\.com$
Trello
Trove
Turnitin
twiceler
Typhoeus
ucsd
ultraseek
^undefined$
^unknown$
Unpaywall
URL2File
urlaliasbuilder
urllib
^user.?agent$
^User-Agent
validator
virus.detector
voila
^voltron$
voyager\/
w3af\.org
Wanadoo
Web(\s|\+)Downloader
WebCloner
webcollage
WebCopier
Webinator
weblayers
Webmetrics
webmirror
webmon
weborama-fetcher
webreaper
WebStripper
WebZIP
Wget
WhatsApp
wordpress
worm
www\.gnip\.com
WWW-Mechanize
xenu
y!j
yacy
yahoo
yandex
Yeti\/\d
zeus
zyborg
7siters