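A robots.txt file is a convention that a site uses to tell web crawlers which of its pages they may access.
The file must sit at the top-level root (/) of the site. It uses the Robots Exclusion Standard to control access to the site by section and by type of web crawler (desktop, mobile, Google, and so on).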
How to write it
Location of robots.txt
As explained above, robots.txt must be located at the top-level root. For example, if the top-level URL is ktko.tistory.com, then requesting ktko.tistory.com/robots.txt must return the contents of the robots.txt file.
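The location rule can be sketched in a few lines of Python: whatever page of a site you are looking at, the robots.txt URL is the same scheme and host with the path replaced by /robots.txt. The function name `robots_url`, the `https://` scheme, and the `/entry/example` path are assumptions made for this illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; robots.txt always lives at the root path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# "/entry/example" is a made-up path used only for illustration.
print(robots_url("https://ktko.tistory.com/entry/example?page=2"))
# prints https://ktko.tistory.com/robots.txt
```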
    User-agent: name of the search bot
    Disallow: path the bot may not access
    Crawl-delay: delay before the next visit, in seconds
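To make the "field: value" layout concrete, here is a minimal hand-written parser sketch that groups directives by User-agent. It is a simplification for illustration, not a full implementation of the standard: `parse_robots` and the `/private/` path are hypothetical, and every field that is not User-agent is simply attached to the current group.

```python
def parse_robots(text: str) -> dict:
    """Group directives by user-agent: {agent: {field: [values]}}."""
    groups: dict = {}
    current: list = []   # agents that the following rule lines apply to
    in_rules = False     # True once the current group has seen a rule line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()  # field names are case-insensitive
        if field == "user-agent":
            if in_rules:
                # A User-agent line after rule lines starts a new group.
                current, in_rules = [], False
            current.append(value)
            groups.setdefault(value, {})
        elif current:
            in_rules = True
            for agent in current:
                groups[agent].setdefault(field, []).append(value)
    return groups


EXAMPLE = """\
User-agent: *
Disallow: /private/   # hypothetical path
Crawl-delay: 5
"""

print(parse_robots(EXAMPLE))
# prints {'*': {'disallow': ['/private/'], 'crawl-delay': ['5']}}
```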
Allow all bots
    User-agent: *
    Disallow:
Block every bot except Googlebot
    User-agent: Googlebot
    Disallow:

    User-agent: *
    Disallow: /
Block all bots
    User-agent: *
    Disallow: /
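One way to check how these three policies behave is to feed them to Python's standard `urllib.robotparser` module. The sketch below is only an illustration: the helper `allowed`, the bot name `OtherBot`, and the `example.com` URL are assumptions.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots_txt and report whether agent may fetch url."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)


ALLOW_ALL = "User-agent: *\nDisallow:\n"
ONLY_GOOGLEBOT = "User-agent: Googlebot\nDisallow:\n\nUser-agent: *\nDisallow: /\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

URL = "https://example.com/"  # placeholder URL

print(allowed(ALLOW_ALL, "OtherBot", URL))        # True
print(allowed(ONLY_GOOGLEBOT, "Googlebot", URL))  # True
print(allowed(ONLY_GOOGLEBOT, "OtherBot", URL))   # False
print(allowed(BLOCK_ALL, "Googlebot", URL))       # False
```

An empty Disallow value blocks nothing, which is why the first policy lets every bot through.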
Real-world examples
Naver
    URL: https://www.naver.com/robots.txt
    User-agent: *
    Disallow: /
    Allow : /$
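Naver's file relies on the `$` end-of-path anchor: everything is disallowed, yet the bare root page `/` is allowed. The sketch below shows one way such patterns could be evaluated, assuming the commonly documented precedence (the longest matching pattern wins, and Allow wins a tie). The functions `pattern_to_regex` and `is_allowed` and the `/news` path are hypothetical names for this illustration, and `*` wildcards are handled the same way.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and let each '*' match any run of characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, path: str) -> bool:
    """rules is a list of (allow, pattern) pairs for one user-agent group."""
    best = None
    for allow, pattern in rules:
        # re.match anchors at the start, so patterns act as path prefixes.
        if pattern and pattern_to_regex(pattern).match(path):
            key = (len(pattern), allow)  # longer pattern first, then Allow
            if best is None or key > best:
                best = key
    return True if best is None else best[1]


NAVER_RULES = [(False, "/"), (True, "/$")]

print(is_allowed(NAVER_RULES, "/"))      # True: only the root page matches "/$"
print(is_allowed(NAVER_RULES, "/news"))  # False: falls back to "Disallow: /"
```

Note that Python's built-in `urllib.robotparser` does plain prefix matching, so a hand-rolled matcher like this is one option when `$` or `*` rules matter.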
Google
    URL: https://www.google.co.kr/robots.txt
    User-agent: *
    Disallow: /search
    Allow: /search/about
    Allow: /search/static
    Allow: /search/howsearchworks
    Disallow: /sdch
    Disallow: /groups
    Disallow: /index.html?
    Disallow: /?
    Allow: /?hl=
    Disallow: /?hl=*&
    Allow: /?hl=*&gws_rd=ssl$
    Disallow: /?hl=*&*&gws_rd=ssl
    Allow: /?gws_rd=ssl$
    Allow: /?pt1=true$
    Disallow: /imgres
    Disallow: /u/
    Disallow: /preferences
    Disallow: /setprefs
    Disallow: /default
    Disallow: /m?
    Disallow: /m/
    Allow: /m/finance
    Disallow: /wml?
    Disallow: /wml/?
    Disallow: /wml/search?
    Disallow: /xhtml?
    Disallow: /xhtml/?
    Disallow: /xhtml/search?
    Disallow: /xml?
    Disallow: /imode?
    Disallow: /imode/?
    Disallow: /imode/search?
    Disallow: /jsky?
    Disallow: /jsky/?
    Disallow: /jsky/search?
    Disallow: /pda?
    Disallow: /pda/?
    Disallow: /pda/search?
    Disallow: /sprint_xhtml
    Disallow: /sprint_wml
    Disallow: /pqa
    Disallow: /palm
    Disallow: /gwt/
    Disallow: /purchases
    Disallow: /local?
    Disallow: /local_url
    Disallow: /shihui?
    Disallow: /shihui/
    Disallow: /products?
    Disallow: /product_
    Disallow: /products_
    Disallow: /products;
    Disallow: /print
    Disallow: /books/
    Disallow: /bkshp?*q=*
    Disallow: /books?*q=*
    Disallow: /books?*output=*
    Disallow: /books?*pg=*
    Disallow: /books?*jtp=*
    Disallow: /books?*jscmd=*
    Disallow: /books?*buy=*
    Disallow: /books?*zoom=*
    Allow: /books?*q=related:*
    Allow: /books?*q=editions:*
    Allow: /books?*q=subject:*
    Allow: /books/about
    Allow: /booksrightsholders
    Allow: /books?*zoom=1*
    Allow: /books?*zoom=5*
    Allow: /books/content?*zoom=1*
    Allow: /books/content?*zoom=5*
    Disallow: /ebooks/
    Disallow: /ebooks?*q=*
    Disallow: /ebooks?*output=*
    Disallow: /ebooks?*pg=*
    Disallow: /ebooks?*jscmd=*
    Disallow: /ebooks?*buy=*
    Disallow: /ebooks?*zoom=*
    Allow: /ebooks?*q=related:*
    Allow: /ebooks?*q=editions:*
    Allow: /ebooks?*q=subject:*
    Allow: /ebooks?*zoom=1*
    Allow: /ebooks?*zoom=5*
    Disallow: /patents?
    Disallow: /patents/download/
    Disallow: /patents/pdf/
    Disallow: /patents/related/
    Disallow: /scholar
    Disallow: /citations?
    Allow: /citations?user=
    Disallow: /citations?*cstart=
    Allow: /citations?view_op=new_profile
    Allow: /citations?view_op=top_venues
    Allow: /scholar_share
    Disallow: /s?
    Allow: /maps?*output=classic*
    Allow: /maps?*file=
    Allow: /maps/api/js?
    Allow: /maps/d/
    Disallow: /maps?
    Disallow: /mapstt?
    Disallow: /mapslt?
    Disallow: /maps/stk/
    Disallow: /maps/br?
    Disallow: /mapabcpoi?
    Disallow: /maphp?
    Disallow: /mapprint?
    Disallow: /maps/api/js/
    Disallow: /maps/api/staticmap?
    Disallow: /maps/api/streetview
    Disallow: /mld?
    Disallow: /staticmap?
    Disallow: /maps/preview
    Disallow: /maps/place
    Disallow: /maps/search/
    Disallow: /maps/dir/
    Disallow: /maps/timeline/
    Disallow: /help/maps/streetview/partners/welcome/
    Disallow: /help/maps/indoormaps/partners/
    Disallow: /lochp?
    Disallow: /center
    Disallow: /ie?
    Disallow: /blogsearch/
    Disallow: /blogsearch_feeds
    Disallow: /advanced_blog_search
    Disallow: /uds/
    Disallow: /chart?
    Disallow: /transit?
    Disallow: /extern_js/
    Disallow: /xjs/
    Disallow: /calendar/feeds/
    Disallow: /calendar/ical/
    Disallow: /cl2/feeds/
    Disallow: /cl2/ical/
    Disallow: /coop/directory
    Disallow: /coop/manage
    Disallow: /trends?
    Disallow: /trends/music?
    Disallow: /trends/hottrends?
    Disallow: /trends/viz?
    Disallow: /trends/embed.js?
    Disallow: /trends/fetchComponent?
    Disallow: /trends/beta
    Disallow: /trends/topics
    Disallow: /musica
    Disallow: /musicad
    Disallow: /musicas
    Disallow: /musicl
    Disallow: /musics
    Disallow: /musicsearch
    Disallow: /musicsp
    Disallow: /musiclp
    Disallow: /urchin_test/
    Disallow: /movies?
    Disallow: /wapsearch?
    Allow: /safebrowsing/diagnostic
    Allow: /safebrowsing/report_badware/
    Allow: /safebrowsing/report_error/
    Allow: /safebrowsing/report_phish/
    Disallow: /reviews/search?
    Disallow: /orkut/albums
    Disallow: /cbk
    Allow: /cbk?output=tile&cb_client=maps_sv
    Disallow: /maps/api/js/AuthenticationService.Authenticate
    Disallow: /maps/api/js/QuotaService.RecordEvent
    Disallow: /recharge/dashboard/car
    Disallow: /recharge/dashboard/static/
    Disallow: /profiles/me
    Allow: /profiles
    Disallow: /s2/profiles/me
    Allow: /s2/profiles
    Allow: /s2/oz
    Allow: /s2/photos
    Allow: /s2/search/social
    Allow: /s2/static
    Disallow: /s2
    Disallow: /transconsole/portal/
    Disallow: /gcc/
    Disallow: /aclk
    Disallow: /cse?
    Disallow: /cse/home
    Disallow: /cse/panel
    Disallow: /cse/manage
    Disallow: /tbproxy/
    Disallow: /imesync/
    Disallow: /shenghuo/search?
    Disallow: /support/forum/search?
    Disallow: /reviews/polls/
    Disallow: /hosted/images/
    Disallow: /ppob/?
    Disallow: /ppob?
    Disallow: /accounts/ClientLogin
    Disallow: /accounts/ClientAuth
    Disallow: /accounts/o8
    Allow: /accounts/o8/id
    Disallow: /topicsearch?q=
    Disallow: /xfx7/
    Disallow: /squared/api
    Disallow: /squared/search
    Disallow: /squared/table
    Disallow: /qnasearch?
    Disallow: /app/updates
    Disallow: /sidewiki/entry/
    Disallow: /quality_form?
    Disallow: /labs/popgadget/search
    Disallow: /buzz/post
    Disallow: /compressiontest/
    Disallow: /analytics/feeds/
    Disallow: /analytics/partners/comments/
    Disallow: /analytics/portal/
    Disallow: /analytics/uploads/
    Allow: /alerts/manage
    Allow: /alerts/remove
    Disallow: /alerts/
    Allow: /alerts/$
    Disallow: /ads/search?
    Disallow: /ads/plan/action_plan?
    Disallow: /ads/plan/api/
    Disallow: /ads/hotels/partners
    Disallow: /phone/compare/?
    Disallow: /travel/clk
    Disallow: /hotelfinder/rpc
    Disallow: /hotels/rpc
    Disallow: /flights/rpc
    Disallow: /async/flights/
    Disallow: /commercesearch/services/
    Disallow: /evaluation/
    Disallow: /chrome/browser/mobile/tour
    Disallow: /compare/*/apply*
    Disallow: /forms/perks/
    Disallow: /shopping/suppliers/search
    Disallow: /ct/
    Disallow: /edu/cs4hs/
    Disallow: /trustedstores/s/
    Disallow: /trustedstores/tm2
    Disallow: /trustedstores/verify
    Disallow: /adwords/proposal
    Disallow: /shopping/product/
    Disallow: /shopping/seller
    Disallow: /shopping/reviewer
    Disallow: /about/careers/applications/
    Disallow: /landing/signout.html
    Disallow: /webmasters/sitemaps/ping?
    Disallow: /ping?
    Disallow: /gallery/
    Disallow: /landing/now/ontap/
    Allow: /searchhistory/
    Allow: /maps/reserve
    Allow: /maps/reserve/partners
    Disallow: /maps/reserve/api/
    Disallow: /maps/reserve/search
    Disallow: /maps/reserve/bookings
    Disallow: /maps/reserve/settings
    Disallow: /maps/reserve/manage
    Disallow: /maps/reserve/payment
    Disallow: /maps/reserve/receipt
    Disallow: /maps/reserve/sellersignup
    Disallow: /maps/reserve/payments
    Disallow: /maps/reserve/feedback
    Disallow: /maps/reserve/terms
    Disallow: /maps/reserve/m/
    Disallow: /maps/reserve/b/
    Disallow: /maps/reserve/partner-dashboard
    Disallow: /about/views/
    Disallow: /intl/*/about/views/
    Disallow: /local/dining/
    Disallow: /local/place/products/
    Disallow: /local/place/reviews/
    Disallow: /local/place/rap/
    Disallow: /local/tab/
    Disallow: /travel/hotels/
    Allow: /finance
    Allow: /js/
    Disallow: /finance?*q=*

    # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
    User-agent: Twitterbot
    Allow: /imgres

    User-agent: facebookexternalhit
    Allow: /imgres

    Sitemap: http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml
    Sitemap: https://www.google.com/sitemap.xml
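Google's file also shows two features the earlier examples did not use: separate groups for specific bots (Twitterbot, facebookexternalhit) and Sitemap lines. As a rough sketch, a short excerpt of the listing can be loaded into Python's `urllib.robotparser` to see both. The excerpt keeps only a handful of lines, `SomeBot` is a made-up bot name, and `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# A few lines taken from the listing above, not the whole file.
EXCERPT = """\
User-agent: *
Disallow: /search
Disallow: /imgres

User-agent: Twitterbot
Allow: /imgres

Sitemap: https://www.google.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(EXCERPT.splitlines())

IMGRES = "https://www.google.com/imgres"

# Twitterbot has its own group, so the "*" group does not apply to it.
print(rp.can_fetch("Twitterbot", IMGRES))  # True
print(rp.can_fetch("SomeBot", IMGRES))     # False
print(rp.site_maps())                      # ['https://www.google.com/sitemap.xml']
```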
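What if the convention is not honored?
From the information in robots.txt you can tell which crawlers or bots are allowed to crawl the site, and whether a particular URL may be crawled.
As noted above, robots.txt is only a convention, so nothing technically forces a crawler to follow it. Even so, crawling disallowed URLs and using the collected data for other purposes can lead to legal penalties, so crawl with care.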
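For anyone writing a crawler, one cautious approach is to put a robots.txt check in front of every request. The sketch below assumes a hypothetical helper `polite_fetch` that takes the actual download function as a parameter. The rules, bot name, and URLs are made up, and the demo uses stand-ins for the network call and the sleep so that it runs instantly.

```python
import time
from urllib.robotparser import RobotFileParser


def polite_fetch(rp, agent, url, fetch, sleep=time.sleep):
    """Call fetch(url) only if robots.txt permits it; otherwise return None."""
    if not rp.can_fetch(agent, url):
        return None  # disallowed: skip the request entirely
    delay = rp.crawl_delay(agent)
    if delay:
        sleep(delay)  # wait as requested by Crawl-delay
    return fetch(url)


rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /private/", "Crawl-delay: 1"])

waits = []  # records requested delays instead of really sleeping


def fake_fetch(url):
    # Stand-in for a real HTTP request.
    return f"fetched {url}"


print(polite_fetch(rp, "MyBot", "https://example.com/private/a", fake_fetch, sleep=waits.append))
# prints None
print(polite_fetch(rp, "MyBot", "https://example.com/public/a", fake_fetch, sleep=waits.append))
# prints fetched https://example.com/public/a
print(waits)
# prints [1]
```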