Robots, search engine spiders - EXAMPLES

Tutorial created on December 27, 2004
Categories: HTML, Web programming
Allow all robots to crawl the entire domain

Code:

User-agent: *
Disallow:
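
One way to confirm what these two lines mean is to feed them to a parser. The sketch below uses urllib.robotparser from Python's standard library; the robot name ExampleBot and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The allow-everything rules from the example above.
rules = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# An empty Disallow value excludes nothing, so every path is permitted.
# "ExampleBot" and the URLs are hypothetical.
home_allowed = parser.can_fetch("ExampleBot", "http://example.com/")
deep_allowed = parser.can_fetch("ExampleBot", "http://example.com/private/page.html")
print(home_allowed, deep_allowed)
```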

Block all robots from the entire site

Code:

User-agent: *
Disallow: /
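
With "Disallow: /" under the wildcard record, no compliant robot should fetch any path, because "/" is a prefix of every path. A minimal check, assuming Python's standard urllib.robotparser and hypothetical robot names and URLs:

```python
from urllib.robotparser import RobotFileParser

# The block-everything rules from the example above.
rules = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Hypothetical robots and URLs; every combination should be refused.
agents = ["ExampleBot", "AnotherBot"]
urls = ["http://example.com/", "http://example.com/docs/index.html"]
any_allowed = any(parser.can_fetch(agent, url) for agent in agents for url in urls)
print(any_allowed)
```

Keep in mind that robots.txt is advisory: it only affects robots that choose to honor it.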

Keep all robots out of the "images" and "cgi-bin" directories

Code:

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
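
Disallow values are matched as path prefixes, which is easy to get wrong around trailing slashes. The sketch below, using Python's standard urllib.robotparser, shows which paths the two rules cover; the robot name, the host and the sample paths are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# The directory rules from the example above.
rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

def allowed(path):
    """Return True if a hypothetical robot may fetch the given path."""
    return parser.can_fetch("ExampleBot", "http://example.com" + path)

# Sample paths are hypothetical; only those under the two directories are blocked.
sample_paths = ["/index.html", "/cgi-bin/search.pl", "/images/logo.gif", "/images-old/logo.gif"]
results = {path: allowed(path) for path in sample_paths}
for path, ok in results.items():
    print(path, "allowed" if ok else "blocked")
```

The path /images-old/logo.gif stays reachable because the rule ends in a slash and therefore only matches paths inside /images/.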

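One record for each robot registered so far. As written, the empty Disallow line lets the named robot crawl the whole site; to shut a robot out, change its line to "Disallow: /".

Code: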

User-agent: Mozilla/3.0 (compatible;miner;mailto:miner@miner.com.br)
Disallow:

User-agent: WebFerret
Disallow:

User-agent: Due to a deficiency in Java it's not currently possible to set the User-agent.
Disallow:

User-agent: no
Disallow:

User-agent: 'Ahoy! The Homepage Finder'
Disallow:

User-agent: Arachnophilia
Disallow:

User-agent: ArchitextSpider
Disallow:

User-agent: ASpider/0.09
Disallow:

User-agent: AURESYS/1.0
Disallow:

User-agent: BackRub/*.*
Disallow:

User-agent: Big Brother
Disallow:

User-agent: BlackWidow
Disallow:

User-agent: BSpider/1.0 libwww-perl/0.40
Disallow:

User-agent: CACTVS Chemistry Spider
Disallow:

User-agent: Digimarc CGIReader/1.0
Disallow:

User-agent: Checkbot/x.xx LWP/5.x
Disallow:

User-agent: CMC/0.01
Disallow:

User-agent: combine/0.0
Disallow:

User-agent: conceptbot/0.3
Disallow:

User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow:

User-agent: root/0.1
Disallow:

User-agent: CS-HKUST-IndexServer/1.0
Disallow:

User-agent: CyberSpyder/2.1
Disallow:

User-agent: Deweb/1.01
Disallow:

User-agent: DragonBot/1.0 libwww/5.0
Disallow:

User-agent: EIT-Link-Verifier-Robot/0.2
Disallow:

User-agent: Emacs-w3/v[0-9.]+
Disallow:

User-agent: EmailSiphon
Disallow:

User-agent: EMC Spider
Disallow:

User-agent: explorersearch
Disallow:

User-agent: Explorer
Disallow:

User-agent: ExtractorPro
Disallow:

User-agent: FelixIDE/1.0
Disallow:

User-agent: Hazel's Ferret Web hopper,
Disallow:

User-agent: ESIRover v1.0
Disallow:

User-agent: fido/0.9 Harvest/1.4.pl2
Disallow:

User-agent: Hämähäkki/0.2
Disallow:

User-agent: KIT-Fireball/2.0 libwww/5.0a
Disallow:

User-agent: Fish-Search-Robot
Disallow:

User-agent: Mozilla/2.0 (compatible fouineur v2.0; fouineur.9bit.qc.ca)
Disallow:

User-agent: Robot du CRIM 1.0a
Disallow:

User-agent: Freecrawl
Disallow:

User-agent: FunnelWeb-1.0
Disallow:

User-agent: gcreep/1.0
Disallow:

User-agent: ???
Disallow:

User-agent: GetURL.rexx v1.05
Disallow:

User-agent: Golem/1.1
Disallow:

User-agent: Gromit/1.0
Disallow:

User-agent: Gulliver/1.1
Disallow:

User-agent: yes
Disallow:

User-agent: AITCSRobot/1.1
Disallow:

User-agent: wired-digital-newsbot/1.5
Disallow:

User-agent: htdig/3.0b3
Disallow:

User-agent: HTMLgobble v2.2
Disallow:

User-agent: IBM_Planetwide,
Disallow:

User-agent: gestaltIconoclast/1.0 libwww-FM/2.17
Disallow:

User-agent: INGRID/0.1
Disallow:

User-agent: IncyWincy/1.0b1
Disallow:

User-agent: Informant
Disallow:

User-agent: InfoSeek Robot 1.0
Disallow:

User-agent: Infoseek Sidewinder
Disallow:

User-agent: InfoSpiders/0.1
Disallow:

User-agent: inspectorwww/1.0 http://www.greenpac.com/inspectorwww.html
Disallow:

User-agent: 'IAGENT/1.0'
Disallow:

User-agent: IsraeliSearch/1.0
Disallow:

User-agent: JCrawler/0.2
Disallow:

User-agent: Jeeves v0.05alpha (PERL, LWP, lglb@doc.ic.ac.uk)
Disallow:

User-agent: Jobot/0.1alpha libwww-perl/4.0
Disallow:

User-agent: JoeBot,
Disallow:

User-agent: JubiiRobot
Disallow:

User-agent: jumpstation
Disallow:

User-agent: Katipo/1.0
Disallow:

User-agent: KDD-Explorer/0.1
Disallow:

User-agent: KO_Yappo_Robot/1.0.4(http://yappo.com/info/robot.html)
Disallow:

User-agent: LabelGrab/1.1
Disallow:

User-agent: LinkWalker
Disallow:

User-agent: logo.gif crawler
Disallow:

User-agent: Lycos/x.x
Disallow:

User-agent: Lycos_Spider_(T-Rex)
Disallow:

User-agent: Magpie/1.0
Disallow:

User-agent: MediaFox/x.y
Disallow:

User-agent: MerzScope
Disallow:

User-agent: NEC-MeshExplorer
Disallow:

User-agent: MOMspider/1.00 libwww-perl/0.40
Disallow:

User-agent: Monster/vX.X.X -$TYPE ($OSTYPE)
Disallow:

User-agent: Motor/0.2
Disallow:

User-agent: MuscatFerret
Disallow:

User-agent: MwdSearch/0.1
Disallow:

User-agent: NetCarta CyberPilot Pro
Disallow:

User-agent: NetMechanic
Disallow:

User-agent: NetScoop/1.0 libwww/5.0a
Disallow:

User-agent: NHSEWalker/3.0
Disallow:

User-agent: Nomad-V2.x
Disallow:

User-agent: NorthStar
Disallow:

User-agent: Occam/1.0
Disallow:

User-agent: HKU WWW Robot,
Disallow:

User-agent: Orbsearch/1.0
Disallow:

User-agent: PackRat/1.0
Disallow:

User-agent: Patric/0.01a
Disallow:

User-agent: Peregrinator-Mathematics/0.7
Disallow:

User-agent: Duppies
Disallow:

User-agent: Pioneer
Disallow:

User-agent: PGP-KA/1.2
Disallow:

User-agent: Resume Robot
Disallow:

User-agent: Road Runner: ImageScape Robot (lim@cs.leidenuniv.nl)
Disallow:

User-agent: Robbie/0.1
Disallow:

User-agent: ComputingSite Robi/1.0 (robi@computingsite.com)
Disallow:

User-agent: Roverbot
Disallow:

User-agent: SafetyNet Robot 0.1,
Disallow:

User-agent: Scooter/1.0
Disallow:

User-agent: not available
Disallow:

User-agent: Senrigan/xxxxxx
Disallow:

User-agent: SG-Scout
Disallow:

User-agent: Shai'Hulud
Disallow:

User-agent: SimBot/1.0
Disallow:

User-agent: Open Text Site Crawler V1.0
Disallow:

User-agent: SiteTech-Rover
Disallow:

User-agent: Slurp/2.0
Disallow:

User-agent: ESISmartSpider/2.0
Disallow:

User-agent: Snooper/b97_01
Disallow:

User-agent: Solbot/1.0 LWP/5.07
Disallow:

User-agent: Spanner/1.0 (Linux 2.0.27 i586)
Disallow:

User-agent: Mozilla/3.0 (Black Widow v1.1.0; Linux 2.0.27; Dec 31 1997 12:25:00
Disallow:

User-agent: Tarantula/1.0
Disallow:

User-agent: tarspider
Disallow:

User-agent: dlw3robot/x.y (in TclX by http://hplyot.obspm.fr/~dl/)
Disallow:

User-agent: Templeton/
Disallow:

User-agent: TitIn/0.2
Disallow:

User-agent: TITAN/0.1
Disallow:

User-agent: UCSD-Crawler
Disallow:

User-agent: urlck/1.2.3
Disallow:

User-agent: Valkyrie/1.0 libwww-perl/0.40
Disallow:

User-agent: Victoria/1.0
Disallow:

User-agent: vision-search/3.0'
Disallow:

User-agent: VWbot_K/4.2
Disallow:

User-agent: w3index
Disallow:

User-agent: W3M2/x.xxx
Disallow:

User-agent: WWWWanderer v3.0
Disallow:

User-agent: WebCopy/
Disallow:

User-agent: WebCrawler/3.0 Robot libwww/5.0a
Disallow:

User-agent: WebFetcher/0.8,
Disallow:

User-agent: weblayers/0.0
Disallow:

User-agent: WebLinker/0.0 libwww-perl/0.1
Disallow:

User-agent: WebMoose/0.0.0000
Disallow:

User-agent: Digimarc WebReader/1.2
Disallow:

User-agent: webs@recruit.co.jp
Disallow:

User-agent: webvac/1.0
Disallow:

User-agent: webwalk
Disallow:

User-agent: WebWalker/1.10
Disallow:

User-agent: WebWatch
Disallow:

User-agent: Wget/1.4.0
Disallow:

User-agent: w3mir
Disallow:

User-agent: WWWC/0.25 (Win95)
Disallow:

User-agent: none
Disallow:

User-agent: XGET/0.7
Disallow:

User-agent: Nederland.zoek
Disallow:

User-agent: BizBot04 kirk.overleaf.com
Disallow:

User-agent: HappyBot (gserver.kw.net)
Disallow:

User-agent: CaliforniaBrownSpider
Disallow:

User-agent: EI*Net/0.1 libwww/0.1
Disallow:

User-agent: Ibot/1.0 libwww-perl/0.40
Disallow:

User-agent: Merritt/1.0
Disallow:

User-agent: StatFetcher/1.0
Disallow:

User-agent: TeacherSoft/1.0 libwww/2.17
Disallow:

User-agent: WWW Collector
Disallow:

User-agent: processor/0.0ALPHA libwww-perl/0.20
Disallow:

User-agent: wobot/1.0 from 206.214.202.45
Disallow:

User-agent: Libertech-Rover www.libertech.com?
Disallow:

User-agent: WhoWhere Robot
Disallow:

User-agent: ITI Spider
Disallow:

User-agent: MyCNNSpider
Disallow:

User-agent: SummyCrawler
Disallow:

User-agent: OGspider
Disallow:

User-agent: linklooker
Disallow:

User-agent: CyberSpyder (amant@www.cyberspyder.com)
Disallow:

User-agent: SlowBot
Disallow:

User-agent: heraSpider
Disallow:

User-agent: Surfbot
Disallow:

User-agent: Bizbot003
Disallow:

User-agent: WebWalker
Disallow:

User-agent: SandBot
Disallow:

User-agent: EnigmaBot
Disallow:

User-agent: spyder3.microsys.com
Disallow:

User-agent: www.freeloader.com.
Disallow:

User-agent: Googlebot
Disallow:

User-agent: METAGOPHER
Disallow:
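
Per-robot records can be checked with a parser. In the sketch below, which assumes Python's standard urllib.robotparser, the two records are a shortened, hypothetical variant of the list: WebFerret's value has been changed to "/" to contrast a blocking record with an empty one. OtherBot and the URL are made up.

```python
from urllib.robotparser import RobotFileParser

# Two records in the style of the list; WebFerret's Disallow value is
# changed to "/" here (an assumption for illustration) so that it is blocked.
rules = """\
User-agent: WebFerret
Disallow: /

User-agent: Googlebot
Disallow:
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# "OtherBot" has no record and there is no wildcard record, so it is not restricted.
url = "http://example.com/index.html"
verdicts = {agent: parser.can_fetch(agent, url) for agent in ["WebFerret", "Googlebot", "OtherBot"]}
print(verdicts)
```

A robot with no matching record and no wildcard record is unrestricted, so listing robots one by one only affects the robots that are named.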

Author and license of 'Robots, search engine spiders - EXAMPLES'
Azielito. Extracted from: http://foro.elhacker.net/index.php/topic,40581.0.html

This work is licensed under a Creative Commons license.
This content was compiled by the Wikilearning team. All compiled content was obtained while respecting, and stating on our site, the license of each source.
Wikilearning has the express written permission of the authors to publish the content it has extracted from other websites, including for commercial use.
