Characterizing the Colombian Web means discovering the elements of the global Internet tangle that fall under the “.CO” domain: identifying the web pages, sites, and domains that make up the Colombian national portion of cyberspace, a domain managed since its inception by the University of the Andes. The purpose is to understand Colombia's presence on the Internet in terms of its Web composition. The study took a technological, descriptive, and analytical approach, supported by tools that gather information from different points of the Internet so that the collected data can be used to characterize the elements under study; these tools are known as crawlers or robots. A key element of the research was a sample of Colombian web addresses, called the “seed”, which allowed the crawler to broaden the range of sites and domains under study and yielded a substantial statistical basis from which conclusions and significant features of the Web under study were drawn. As an additional product, a domain plotter was developed; it complements the crawler used and provides a graphical view of the object of study. This document is expected to complement the research undertaken and to serve as a starting point for deeper study of the country's Web structure.
Keywords:
Crawler, Web, Nutch, Domains, Segments, Grids