Robots, search engine spiders - EXAMPLES
27 December 2004
HTML, Web programming
To let every robot crawl the whole domain
Code:
User-agent: *
Disallow:
To keep robots from indexing anything
Code:
User-agent: *
Disallow: /
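The difference between the two records above is easy to miss: an empty Disallow blocks nothing, while "Disallow: /" blocks every path. A minimal sketch of how a well-behaved crawler might read them, using Python's standard urllib.robotparser module. The agent name "ExampleBot" and the example.com URL are placeholders, not names from this article.

```python
from urllib.robotparser import RobotFileParser

# The two records shown above, as they would appear in robots.txt.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# "ExampleBot" and example.com are placeholders (assumptions for illustration).
URL = "http://example.com/page.html"
print(can_crawl(ALLOW_ALL, "ExampleBot", URL))  # empty Disallow: nothing is blocked
print(can_crawl(BLOCK_ALL, "ExampleBot", URL))  # "/" is a prefix of every path
```

Keep in mind that robots.txt is only a request: the check above models a crawler that chooses to honor it.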
To keep the "images" and "cgi-bin" directories from being indexed
Code:
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
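Disallow values are matched as path prefixes, so the trailing slash matters. The sketch below, again with Python's standard urllib.robotparser, filters a list of paths against the record above. The sample paths and the agent name "ExampleBot" are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The record shown above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
"""


def blocked_paths(robots_txt, paths, agent="ExampleBot"):
    """Return the subset of `paths` that `agent` is asked not to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]


# Hypothetical paths on the site (assumptions for illustration).
SAMPLE = ["/index.html", "/images/logo.gif", "/cgi-bin/search", "/images"]
print(blocked_paths(ROBOTS_TXT, SAMPLE))
```

In this parser "/images" without the trailing slash is not blocked, because the rule's prefix is "/images/".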
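To give each robot registered so far its own record (an empty Disallow lets that robot crawl everything; put "Disallow: /" in a record to shut that robot out)
Code: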
User-agent: Mozilla/3.0 (compatible;miner;mailto:miner@miner.com.br)
Disallow:
User-agent: WebFerret
Disallow:
User-agent: Due to a deficiency in Java it's not currently possible to set the User-agent.
Disallow:
User-agent: no
Disallow:
User-agent: 'Ahoy! The Homepage Finder'
Disallow:
User-agent: Arachnophilia
Disallow:
User-agent: ArchitextSpider
Disallow:
User-agent: ASpider/0.09
Disallow:
User-agent: AURESYS/1.0
Disallow:
User-agent: BackRub/*.*
Disallow:
User-agent: Big Brother
Disallow:
User-agent: BlackWidow
Disallow:
User-agent: BSpider/1.0 libwww-perl/0.40
Disallow:
User-agent: CACTVS Chemistry Spider
Disallow:
User-agent: Digimarc CGIReader/1.0
Disallow:
User-agent: Checkbot/x.xx LWP/5.x
Disallow:
User-agent: CMC/0.01
Disallow:
User-agent: combine/0.0
Disallow:
User-agent: conceptbot/0.3
Disallow:
User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow:
User-agent: root/0.1
Disallow:
User-agent: CS-HKUST-IndexServer/1.0
Disallow:
User-agent: CyberSpyder/2.1
Disallow:
User-agent: Deweb/1.01
Disallow:
User-agent: DragonBot/1.0 libwww/5.0
Disallow:
User-agent: EIT-Link-Verifier-Robot/0.2
Disallow:
User-agent: Emacs-w3/v[0-9.]+
Disallow:
User-agent: EmailSiphon
Disallow:
User-agent: EMC Spider
Disallow:
User-agent: explorersearch
Disallow:
User-agent: Explorer
Disallow:
User-agent: ExtractorPro
Disallow:
User-agent: FelixIDE/1.0
Disallow:
User-agent: Hazel's Ferret Web hopper,
Disallow:
User-agent: ESIRover v1.0
Disallow:
User-agent: fido/0.9 Harvest/1.4.pl2
Disallow:
User-agent: Hämähäkki/0.2
Disallow:
User-agent: KIT-Fireball/2.0 libwww/5.0a
Disallow:
User-agent: Fish-Search-Robot
Disallow:
User-agent: Mozilla/2.0 (compatible fouineur v2.0; fouineur.9bit.qc.ca)
Disallow:
User-agent: Robot du CRIM 1.0a
Disallow:
User-agent: Freecrawl
Disallow:
User-agent: FunnelWeb-1.0
Disallow:
User-agent: gcreep/1.0
Disallow:
User-agent: ???
Disallow:
User-agent: GetURL.rexx v1.05
Disallow:
User-agent: Golem/1.1
Disallow:
User-agent: Gromit/1.0
Disallow:
User-agent: Gulliver/1.1
Disallow:
User-agent: yes
Disallow:
User-agent: AITCSRobot/1.1
Disallow:
User-agent: wired-digital-newsbot/1.5
Disallow:
User-agent: htdig/3.0b3
Disallow:
User-agent: HTMLgobble v2.2
Disallow:
User-agent: no
Disallow:
User-agent: IBM_Planetwide,
Disallow:
User-agent: gestaltIconoclast/1.0 libwww-FM/2.17
Disallow:
User-agent: INGRID/0.1
Disallow:
User-agent: IncyWincy/1.0b1
Disallow:
User-agent: Informant
Disallow:
User-agent: InfoSeek Robot 1.0
Disallow:
User-agent: Infoseek Sidewinder
Disallow:
User-agent: InfoSpiders/0.1
Disallow:
User-agent: inspectorwww/1.0 http://www.greenpac.com/inspectorwww.html
Disallow:
User-agent: 'IAGENT/1.0'
Disallow:
User-agent: IsraeliSearch/1.0
Disallow:
User-agent: JCrawler/0.2
Disallow:
User-agent: Jeeves v0.05alpha (PERL, LWP, lglb@doc.ic.ac.uk)
Disallow:
User-agent: Jobot/0.1alpha libwww-perl/4.0
Disallow:
User-agent: JoeBot,
Disallow:
User-agent: JubiiRobot
Disallow:
User-agent: jumpstation
Disallow:
User-agent: Katipo/1.0
Disallow:
User-agent: KDD-Explorer/0.1
Disallow:
User-agent: KO_Yappo_Robot/1.0.4(http://yappo.com/info/robot.html)
Disallow:
User-agent: LabelGrab/1.1
Disallow:
User-agent: LinkWalker
Disallow:
User-agent: logo.gif crawler
Disallow:
User-agent: Lycos/x.x
Disallow:
User-agent: Lycos_Spider_(T-Rex)
Disallow:
User-agent: Magpie/1.0
Disallow:
User-agent: MediaFox/x.y
Disallow:
User-agent: MerzScope
Disallow:
User-agent: NEC-MeshExplorer
Disallow:
User-agent: MOMspider/1.00 libwww-perl/0.40
Disallow:
User-agent: Monster/vX.X.X -$TYPE ($OSTYPE)
Disallow:
User-agent: Motor/0.2
Disallow:
User-agent: MuscatFerret
Disallow:
User-agent: MwdSearch/0.1
Disallow:
User-agent: NetCarta CyberPilot Pro
Disallow:
User-agent: NetMechanic
Disallow:
User-agent: NetScoop/1.0 libwww/5.0a
Disallow:
User-agent: NHSEWalker/3.0
Disallow:
User-agent: Nomad-V2.x
Disallow:
User-agent: NorthStar
Disallow:
User-agent: Occam/1.0
Disallow:
User-agent: HKU WWW Robot,
Disallow:
User-agent: Orbsearch/1.0
Disallow:
User-agent: PackRat/1.0
Disallow:
User-agent: Patric/0.01a
Disallow:
User-agent: Peregrinator-Mathematics/0.7
Disallow:
User-agent: Duppies
Disallow:
User-agent: Pioneer
Disallow:
User-agent: PGP-KA/1.2
Disallow:
User-agent: Resume Robot
Disallow:
User-agent: Road Runner: ImageScape Robot (lim@cs.leidenuniv.nl)
Disallow:
User-agent: Robbie/0.1
Disallow:
User-agent: ComputingSite Robi/1.0 (robi@computingsite.com)
Disallow:
User-agent: Roverbot
Disallow:
User-agent: SafetyNet Robot 0.1,
Disallow:
User-agent: Scooter/1.0
Disallow:
User-agent: not available
Disallow:
User-agent: Senrigan/xxxxxx
Disallow:
User-agent: SG-Scout
Disallow:
User-agent: Shai'Hulud
Disallow:
User-agent: SimBot/1.0
Disallow:
User-agent: Open Text Site Crawler V1.0
Disallow:
User-agent: SiteTech-Rover
Disallow:
User-agent: Slurp/2.0
Disallow:
User-agent: ESISmartSpider/2.0
Disallow:
User-agent: Snooper/b97_01
Disallow:
User-agent: Solbot/1.0 LWP/5.07
Disallow:
User-agent: Spanner/1.0 (Linux 2.0.27 i586)
Disallow:
User-agent: no
Disallow:
User-agent: Mozilla/3.0 (Black Widow v1.1.0; Linux 2.0.27; Dec 31 1997 12:25:00
Disallow:
User-agent: Tarantula/1.0
Disallow:
User-agent: tarspider
Disallow:
User-agent: dlw3robot/x.y (in TclX by http://hplyot.obspm.fr/~dl/)
Disallow:
User-agent: Templeton/
Disallow:
User-agent: TitIn/0.2
Disallow:
User-agent: TITAN/0.1
Disallow:
User-agent: UCSD-Crawler
Disallow:
User-agent: urlck/1.2.3
Disallow:
User-agent: Valkyrie/1.0 libwww-perl/0.40
Disallow:
User-agent: Victoria/1.0
Disallow:
User-agent: vision-search/3.0'
Disallow:
User-agent: VWbot_K/4.2
Disallow:
User-agent: w3index
Disallow:
User-agent: W3M2/x.xxx
Disallow:
User-agent: WWWWanderer v3.0
Disallow:
User-agent: WebCopy/
Disallow:
User-agent: WebCrawler/3.0 Robot libwww/5.0a
Disallow:
User-agent: WebFetcher/0.8,
Disallow:
User-agent: weblayers/0.0
Disallow:
User-agent: WebLinker/0.0 libwww-perl/0.1
Disallow:
User-agent: no
Disallow:
User-agent: WebMoose/0.0.0000
Disallow:
User-agent: Digimarc WebReader/1.2
Disallow:
User-agent: webs@recruit.co.jp
Disallow:
User-agent: webvac/1.0
Disallow:
User-agent: webwalk
Disallow:
User-agent: WebWalker/1.10
Disallow:
User-agent: WebWatch
Disallow:
User-agent: Wget/1.4.0
Disallow:
User-agent: w3mir
Disallow:
User-agent: no
Disallow:
User-agent: WWWC/0.25 (Win95)
Disallow:
User-agent: none
Disallow:
User-agent: XGET/0.7
Disallow:
User-agent: Nederland.zoek
Disallow:
User-agent: BizBot04 kirk.overleaf.com
Disallow:
User-agent: HappyBot (gserver.kw.net)
Disallow:
User-agent: CaliforniaBrownSpider
Disallow:
User-agent: EI*Net/0.1 libwww/0.1
Disallow:
User-agent: Ibot/1.0 libwww-perl/0.40
Disallow:
User-agent: Merritt/1.0
Disallow:
User-agent: StatFetcher/1.0
Disallow:
User-agent: TeacherSoft/1.0 libwww/2.17
Disallow:
User-agent: WWW Collector
Disallow:
User-agent: processor/0.0ALPHA libwww-perl/0.20
Disallow:
User-agent: wobot/1.0 from 206.214.202.45
Disallow:
User-agent: Libertech-Rover www.libertech.com?
Disallow:
User-agent: WhoWhere Robot
Disallow:
User-agent: ITI Spider
Disallow:
User-agent: w3index
Disallow:
User-agent: MyCNNSpider
Disallow:
User-agent: SummyCrawler
Disallow:
User-agent: OGspider
Disallow:
User-agent: linklooker
Disallow:
User-agent: CyberSpyder (amant@www.cyberspyder.com)
Disallow:
User-agent: SlowBot
Disallow:
User-agent: heraSpider
Disallow:
User-agent: Surfbot
Disallow:
User-agent: Bizbot003
Disallow:
User-agent: WebWalker
Disallow:
User-agent: SandBot
Disallow:
User-agent: EnigmaBot
Disallow:
User-agent: spyder3.microsys.com
Disallow:
User-agent: www.freeloader.com.
Disallow:
User-agent: Googlebot
Disallow:
User-agent: METAGOPHER
Disallow:
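A record that names a robot applies only to that robot, and what happens depends on its Disallow value. The following sketch uses Python's standard urllib.robotparser on a two-record file built from names in the list above. Blocking EmailSiphon with "Disallow: /" is an assumption made for the example, not something the listing does, and "ExampleBot" is a placeholder for a robot with no record.

```python
from urllib.robotparser import RobotFileParser

# Two per-robot records. The "Disallow: /" for EmailSiphon is hypothetical;
# in the listing above every record has an empty Disallow.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: EmailSiphon
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(agent: str, path: str = "/index.html") -> bool:
    """Return True if `agent` is allowed to fetch `path` under ROBOTS_TXT."""
    return parser.can_fetch(agent, path)


print(may_fetch("Googlebot/2.1"))  # matched by name, empty Disallow: allowed
print(may_fetch("EmailSiphon"))    # matched by name, "Disallow: /": blocked
print(may_fetch("ExampleBot"))     # no matching record and no "*" record: allowed
```

Under this reading, a long list of records with empty Disallow values has the same effect as the single "User-agent: *" record with an empty Disallow.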