

robots.txt



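A robots.txt file is a convention that tells web crawlers which pages of a site they may access.

The robots.txt file must be placed at the top level of the site, the root (/). Using the Robots Exclusion Protocol, it controls access to the site per section and per type of web crawler (desktop, mobile, Google).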


How to write it


Location of robots.txt: as explained above, robots.txt must sit at the site root. That is, if the top-level URL is ktko.tistory.com, then requesting ktko.tistory.com/robots.txt must return the contents of the robots.txt file.
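
To make the location rule concrete, here is a minimal Python sketch that derives the robots.txt URL from any page URL on the same site. The function name robots_txt_url and the example path /entry/some-post are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; robots.txt always lives at the root path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page URL on the blog mentioned above.
print(robots_txt_url("https://ktko.tistory.com/entry/some-post?page=2"))
# prints "https://ktko.tistory.com/robots.txt"
```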


User-agent: name of the crawler
Disallow: path to block from crawling
Crawl-delay: delay until the next visit (seconds)
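
The three directives can be tried out with Python's standard urllib.robotparser module. The sketch below is only an illustration: the rules, the bot name MyBot, and the example.com URLs are assumptions, not taken from a real site. Note that Crawl-delay is a non-standard extension and some crawlers, Googlebot among them, ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block /private/ for every bot and ask for a 5 second pause.
RULES = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# "MyBot" is a made-up crawler name; it falls under the "*" group.
can_public = parser.can_fetch("MyBot", "https://example.com/index.html")
can_private = parser.can_fetch("MyBot", "https://example.com/private/data.html")
delay = parser.crawl_delay("MyBot")

print(can_public, can_private, delay)  # prints "True False 5"
```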


Allow all bots


User-agent: *
Disallow:


Block every bot except Googlebot


User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
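
A quick way to confirm that these two groups behave as intended is to feed them to Python's urllib.robotparser and ask about two different user agents. This is a sketch only; SomeOtherBot and the example.com URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above: Googlebot may crawl everything, others nothing.
RULES = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

url = "https://example.com/any/page.html"  # placeholder URL
googlebot_ok = parser.can_fetch("Googlebot", url)
other_ok = parser.can_fetch("SomeOtherBot", url)  # made-up bot name

print(googlebot_ok, other_ok)  # prints "True False"
```

An empty Disallow value blocks nothing, which is why the Googlebot group ends up allowing every path.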


Block all bots


User-agent: *
Disallow: /


Examples from real sites


Naver


URL : https://www.naver.com/robots.txt 


User-agent: *
Disallow: /
Allow: /$
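
In Naver's rules, $ anchors the end of the URL path, so Allow: /$ opens only the home page itself while Disallow: / blocks everything else. Python's standard urllib.robotparser appears to compare plain path prefixes and not interpret * or $, so the sketch below hand-rolls a simplified matcher: * matches any run of characters, $ anchors the end, the longest matching pattern wins, and Allow wins a tie. The function names are made up for this example, and this is not a complete robots.txt implementation.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """rules holds ("allow" | "disallow", pattern) pairs from one User-agent group."""
    best_len, best_allow = -1, True  # no matching rule means the path is allowed
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            allow = kind == "allow"
            # Longest pattern wins; on equal length, prefer Allow.
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


NAVER_RULES = [("disallow", "/"), ("allow", "/$")]

print(is_allowed("/", NAVER_RULES))      # prints "True"
print(is_allowed("/news", NAVER_RULES))  # prints "False"
```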


Google


URL : https://www.google.co.kr/robots.txt


User-agent: *

Disallow: /search

Allow: /search/about

Allow: /search/static

Allow: /search/howsearchworks

Disallow: /sdch

Disallow: /groups

Disallow: /index.html?

Disallow: /?

Allow: /?hl=

Disallow: /?hl=*&

Allow: /?hl=*&gws_rd=ssl$

Disallow: /?hl=*&*&gws_rd=ssl

Allow: /?gws_rd=ssl$

Allow: /?pt1=true$

Disallow: /imgres

Disallow: /u/

Disallow: /preferences

Disallow: /setprefs

Disallow: /default

Disallow: /m?

Disallow: /m/

Allow:    /m/finance

Disallow: /wml?

Disallow: /wml/?

Disallow: /wml/search?

Disallow: /xhtml?

Disallow: /xhtml/?

Disallow: /xhtml/search?

Disallow: /xml?

Disallow: /imode?

Disallow: /imode/?

Disallow: /imode/search?

Disallow: /jsky?

Disallow: /jsky/?

Disallow: /jsky/search?

Disallow: /pda?

Disallow: /pda/?

Disallow: /pda/search?

Disallow: /sprint_xhtml

Disallow: /sprint_wml

Disallow: /pqa

Disallow: /palm

Disallow: /gwt/

Disallow: /purchases

Disallow: /local?

Disallow: /local_url

Disallow: /shihui?

Disallow: /shihui/

Disallow: /products?

Disallow: /product_

Disallow: /products_

Disallow: /products;

Disallow: /print

Disallow: /books/

Disallow: /bkshp?*q=*

Disallow: /books?*q=*

Disallow: /books?*output=*

Disallow: /books?*pg=*

Disallow: /books?*jtp=*

Disallow: /books?*jscmd=*

Disallow: /books?*buy=*

Disallow: /books?*zoom=*

Allow: /books?*q=related:*

Allow: /books?*q=editions:*

Allow: /books?*q=subject:*

Allow: /books/about

Allow: /booksrightsholders

Allow: /books?*zoom=1*

Allow: /books?*zoom=5*

Allow: /books/content?*zoom=1*

Allow: /books/content?*zoom=5*

Disallow: /ebooks/

Disallow: /ebooks?*q=*

Disallow: /ebooks?*output=*

Disallow: /ebooks?*pg=*

Disallow: /ebooks?*jscmd=*

Disallow: /ebooks?*buy=*

Disallow: /ebooks?*zoom=*

Allow: /ebooks?*q=related:*

Allow: /ebooks?*q=editions:*

Allow: /ebooks?*q=subject:*

Allow: /ebooks?*zoom=1*

Allow: /ebooks?*zoom=5*

Disallow: /patents?

Disallow: /patents/download/

Disallow: /patents/pdf/

Disallow: /patents/related/

Disallow: /scholar

Disallow: /citations?

Allow: /citations?user=

Disallow: /citations?*cstart=

Allow: /citations?view_op=new_profile

Allow: /citations?view_op=top_venues

Allow: /scholar_share

Disallow: /s?

Allow: /maps?*output=classic*

Allow: /maps?*file=

Allow: /maps/api/js?

Allow: /maps/d/

Disallow: /maps?

Disallow: /mapstt?

Disallow: /mapslt?

Disallow: /maps/stk/

Disallow: /maps/br?

Disallow: /mapabcpoi?

Disallow: /maphp?

Disallow: /mapprint?

Disallow: /maps/api/js/

Disallow: /maps/api/staticmap?

Disallow: /maps/api/streetview

Disallow: /mld?

Disallow: /staticmap?

Disallow: /maps/preview

Disallow: /maps/place

Disallow: /maps/search/

Disallow: /maps/dir/

Disallow: /maps/timeline/

Disallow: /help/maps/streetview/partners/welcome/

Disallow: /help/maps/indoormaps/partners/

Disallow: /lochp?

Disallow: /center

Disallow: /ie?

Disallow: /blogsearch/

Disallow: /blogsearch_feeds

Disallow: /advanced_blog_search

Disallow: /uds/

Disallow: /chart?

Disallow: /transit?

Disallow: /extern_js/

Disallow: /xjs/

Disallow: /calendar/feeds/

Disallow: /calendar/ical/

Disallow: /cl2/feeds/

Disallow: /cl2/ical/

Disallow: /coop/directory

Disallow: /coop/manage

Disallow: /trends?

Disallow: /trends/music?

Disallow: /trends/hottrends?

Disallow: /trends/viz?

Disallow: /trends/embed.js?

Disallow: /trends/fetchComponent?

Disallow: /trends/beta

Disallow: /trends/topics

Disallow: /musica

Disallow: /musicad

Disallow: /musicas

Disallow: /musicl

Disallow: /musics

Disallow: /musicsearch

Disallow: /musicsp

Disallow: /musiclp

Disallow: /urchin_test/

Disallow: /movies?

Disallow: /wapsearch?

Allow: /safebrowsing/diagnostic

Allow: /safebrowsing/report_badware/

Allow: /safebrowsing/report_error/

Allow: /safebrowsing/report_phish/

Disallow: /reviews/search?

Disallow: /orkut/albums

Disallow: /cbk

Allow: /cbk?output=tile&cb_client=maps_sv

Disallow: /maps/api/js/AuthenticationService.Authenticate

Disallow: /maps/api/js/QuotaService.RecordEvent

Disallow: /recharge/dashboard/car

Disallow: /recharge/dashboard/static/

Disallow: /profiles/me

Allow: /profiles

Disallow: /s2/profiles/me

Allow: /s2/profiles

Allow: /s2/oz

Allow: /s2/photos

Allow: /s2/search/social

Allow: /s2/static

Disallow: /s2

Disallow: /transconsole/portal/

Disallow: /gcc/

Disallow: /aclk

Disallow: /cse?

Disallow: /cse/home

Disallow: /cse/panel

Disallow: /cse/manage

Disallow: /tbproxy/

Disallow: /imesync/

Disallow: /shenghuo/search?

Disallow: /support/forum/search?

Disallow: /reviews/polls/

Disallow: /hosted/images/

Disallow: /ppob/?

Disallow: /ppob?

Disallow: /accounts/ClientLogin

Disallow: /accounts/ClientAuth

Disallow: /accounts/o8

Allow: /accounts/o8/id

Disallow: /topicsearch?q=

Disallow: /xfx7/

Disallow: /squared/api

Disallow: /squared/search

Disallow: /squared/table

Disallow: /qnasearch?

Disallow: /app/updates

Disallow: /sidewiki/entry/

Disallow: /quality_form?

Disallow: /labs/popgadget/search

Disallow: /buzz/post

Disallow: /compressiontest/

Disallow: /analytics/feeds/

Disallow: /analytics/partners/comments/

Disallow: /analytics/portal/

Disallow: /analytics/uploads/

Allow: /alerts/manage

Allow: /alerts/remove

Disallow: /alerts/

Allow: /alerts/$

Disallow: /ads/search?

Disallow: /ads/plan/action_plan?

Disallow: /ads/plan/api/

Disallow: /ads/hotels/partners

Disallow: /phone/compare/?

Disallow: /travel/clk

Disallow: /hotelfinder/rpc

Disallow: /hotels/rpc

Disallow: /flights/rpc

Disallow: /async/flights/

Disallow: /commercesearch/services/

Disallow: /evaluation/

Disallow: /chrome/browser/mobile/tour

Disallow: /compare/*/apply*

Disallow: /forms/perks/

Disallow: /shopping/suppliers/search

Disallow: /ct/

Disallow: /edu/cs4hs/

Disallow: /trustedstores/s/

Disallow: /trustedstores/tm2

Disallow: /trustedstores/verify

Disallow: /adwords/proposal

Disallow: /shopping/product/

Disallow: /shopping/seller

Disallow: /shopping/reviewer

Disallow: /about/careers/applications/

Disallow: /landing/signout.html

Disallow: /webmasters/sitemaps/ping?

Disallow: /ping?

Disallow: /gallery/

Disallow: /landing/now/ontap/

Allow: /searchhistory/

Allow: /maps/reserve

Allow: /maps/reserve/partners

Disallow: /maps/reserve/api/

Disallow: /maps/reserve/search

Disallow: /maps/reserve/bookings

Disallow: /maps/reserve/settings

Disallow: /maps/reserve/manage

Disallow: /maps/reserve/payment

Disallow: /maps/reserve/receipt

Disallow: /maps/reserve/sellersignup

Disallow: /maps/reserve/payments

Disallow: /maps/reserve/feedback

Disallow: /maps/reserve/terms

Disallow: /maps/reserve/m/

Disallow: /maps/reserve/b/

Disallow: /maps/reserve/partner-dashboard

Disallow: /about/views/

Disallow: /intl/*/about/views/

Disallow: /local/dining/

Disallow: /local/place/products/

Disallow: /local/place/reviews/

Disallow: /local/place/rap/

Disallow: /local/tab/

Disallow: /travel/hotels/

Allow: /finance

Allow: /js/

Disallow: /finance?*q=*


# Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.

User-agent: Twitterbot

Allow: /imgres


User-agent: facebookexternalhit

Allow: /imgres


Sitemap: http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml

Sitemap: https://www.google.com/sitemap.xml 


What if the convention is not followed?


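The information in robots.txt tells you which crawlers or bots are allowed to crawl, and whether a particular URL may be crawled or not.

As noted above, robots.txt is only a convention, so nothing technically forces a crawler to obey it. However, using data crawled from disallowed URLs for other purposes can lead to legal penalties, so crawl with care.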
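
One way to crawl with care is to filter every URL through robots.txt before fetching it and to pause between requests when a Crawl-delay is given. The sketch below is a minimal illustration under assumptions: the function name crawlable, the bot name MyBot, and the example.com URLs are all made up, and the sleep parameter exists only so the pause can be swapped out.

```python
import time
from urllib.robotparser import RobotFileParser


def crawlable(urls, robots_lines, agent, sleep=time.sleep):
    """Yield only the URLs that robots.txt permits for agent, pausing between them."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    delay = parser.crawl_delay(agent) or 0
    first = True
    for url in urls:
        if not parser.can_fetch(agent, url):
            continue  # skip disallowed URLs instead of fetching them
        if not first and delay:
            sleep(delay)  # honor Crawl-delay between permitted requests
        first = False
        yield url


# Hypothetical rules and URLs for illustration.
robots = ["User-agent: *", "Disallow: /private/", "Crawl-delay: 2"]
urls = [
    "https://example.com/",
    "https://example.com/private/a",
    "https://example.com/about",
]
pauses = []  # record the pauses instead of actually sleeping
allowed = list(crawlable(urls, robots, "MyBot", sleep=pauses.append))
print(allowed)
```

Against a live site, the rules could instead be loaded with the parser's set_url() and read() methods before calling can_fetch().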