In early November 2012, as friction between the parties intensified, the Internet Society of China took the lead in getting 12 Internet companies, including Qihoo 360, Sina, and Baidu, to sign the "Internet Search Engine Service Self-Discipline Convention". In Article 7 of the Convention, the signatories commit to follow international industry practices and business rules and to abide by the robot protocol (robots protocol).
As you probably know, web pages are connected to one another through hyperlinks, which gives the web its network structure. A crawler works like a spider climbing along the links of that web. Its most basic workflow can be simplified as follows:
Once you understand the process above, you can see that a website is very passive in the face of a crawler: all it can do is sit there and be crawled. Yet site owners do have needs such as these:
1. Feed the crawler a batch of URLs, which we call seeds.
3. The crawler then fetches the pages that these newly discovered links point to.
These steps then repeat in a cycle.
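The crawl loop described in the numbered steps can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular search engine does it. The names `LinkExtractor`, `crawl`, and `FAKE_WEB` are made up for this example, and `fetch` is an assumed callable that stands in for a real HTTP request so that the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl starting from the seed URLs.

    `fetch` is a hypothetical callable (url -> HTML string or None);
    a real crawler would perform an HTTP request here.
    """
    queue = deque(seeds)           # step 1: feed the crawler its seeds
    seen = set(seeds)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)          # step 2: fetch the page ...
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)          # ... parse it and extract its links
        for href in parser.links:
            link = urljoin(url, href)
            if link not in seen:   # step 3: queue newly discovered links
                seen.add(link)
                queue.append(link)
    return visited                 # the loop itself is the "cycle"


# A tiny in-memory "web" (invented for this example).
FAKE_WEB = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a>',
    "http://example.com/b": '<a href="/">home</a>',
}

order = crawl(["http://example.com/"], FAKE_WEB.get)
print(order)
```

The `seen` set is what keeps the cycle from revisiting pages forever, since pages on the real web link back to each other all the time.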
On February 8, 2013, the Beijing No. 1 Intermediate People's Court formally accepted Baidu's lawsuit accusing Qihoo 360 of unfair competition for violating the "Robots Protocol" by crawling and copying the content of its websites. The claim amounts to one hundred million yuan, and the case can be seen as a continuation of the "3B war" of the second half of 2012. In this case, Baidu says it had already set up a Robots text file that disallows 360's crawler, yet 360's crawler still crawled the content of Baidu sites such as Baidu Zhidao ("Baidu Knows") and Baidu Baike ("Baidu Encyclopedia").
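To make the mechanism behind this dispute concrete, here is a small sketch of a robots file that shuts out one specific crawler while admitting everyone else, checked with Python's standard `urllib.robotparser`. The rules and the crawler names `BadSpider` and `GoodSpider` are invented for illustration and are not the actual file or user agents from the case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is banned from the whole
# site, every other crawler is allowed everywhere.
ROBOTS_TXT = """\
User-agent: BadSpider
Disallow: /

User-agent: *
Allow: /
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

page = "http://example.com/question/1"
bad_ok = rules.can_fetch("BadSpider", page)
good_ok = rules.can_fetch("GoodSpider", page)
print(bad_ok)   # False
print(good_ok)  # True
```

Note that the file only states the site's wishes. Nothing in it technically stops a crawler that never performs this check, which is why compliance ends up being a matter of conventions and, as here, lawsuits.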
Some paths contain personal, private content.
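A need like this is what a robots file expresses with a path rule. The sketch below assumes a made-up site layout in which everything under `/private/` should stay out of search engines; the path, the URLs, and the crawler name `ExampleBot` are all hypothetical. It uses Python's standard `urllib.robotparser` to show how a well-behaved crawler would read the rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: all crawlers are asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="ExampleBot"):
    """Return True if the robots rules permit `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


print(allowed("http://example.com/index.html"))          # True
print(allowed("http://example.com/private/diary.html"))  # False
```

A polite crawler would call something like `allowed()` before fetching each URL and simply skip the ones that come back `False`.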
To understand the robots protocol, you first have to understand robots. The robots in question are not Will Smith in "I, Robot", not Wall-E and Eva in "Wall-E", and not Schwarzenegger in the "Terminator" series. What? You don't know these classic movies? Well, maybe I'm overthinking it. Robots here means web robots in the search engine field. Many people may not be familiar with that name, but mention Web Wanderers, Crawlers, and Spiders and a lot of them will suddenly see the light. In Chinese we call them crawlers or web spiders: the programs that search engines use to fetch web pages.
2. The crawler fetches the seed pages, parses their HTML, and extracts the links they contain.
What is robots
Today I'd like to take the opportunity to talk about the robots protocol, which became famous overnight.