There are many busi­ness, math­em­at­ical, com­puter sci­ence prob­lems that can be solved with the gen­eral meth­ods of dis­crete math­em­at­ics (dis­cretely struc­tured math­em­at­ical sys­tems). For example, find­ing the largest num­ber among a set of integers or put­ting them all in increas­ing order or find­ing the shortest part between two ver­tices of a graph model. To com­plete these types of tasks, a cer­tain method is needed that will reach the solu­tion in a reas­on­able time. Ideally, a method with a pre­cise descrip­tion of steps that lead to an answer. We call such a pro­ced­ure an algorithm. Here is a more con­crete definition.

Defin­i­tion: An algorithm is a (finite) sequence of pre­cise instruc­tions (steps) to per­form a com­pu­ta­tion (in order to solve a problem).

His­tor­ical Note: The name algorithm comes from the name of the 9th cen­tury math­em­atician “al-Khowar­izmi”. The word “algor­ism” was being used ini­tially for per­form­ing arith­metic oper­a­tions with decimal num­bers, and by 19th cen­tury this word evolved into the word “algorithm”. Today, an Eng­lish female math­em­atician “Ada Lovelace” (1815 – 1857) is con­sidered to be the first com­puter pro­gram­mer. Although there are descrip­tions of numer­ical algorithms from ancient Greece such as (Euc­lidean Algorithm), she has writ­ten the first algorithm in a mod­ern com­puter sci­ence sense.

When algorithms are described, cer­tain prop­er­ties must be kept in mind:

Prop­er­ties of Algorithms:
  • INPUT: A set of val­ues (strings, integers, webpages etc.) from a spe­cific source.
  • OUTPUT: This is the solu­tion to the prob­lem. The algorithm pro­duces out­put val­ues from the input values.
  • DEFINITENESS: Each step of the algorithm must be pre­cisely described.
  • CORRECTNESS: The algorithm must pro­duce the cor­rect and desired outputs.
  • FINITENESS: The num­ber of steps of an algorithm must be finite.
  • EFFECTIVENESS: It must be pos­sible to per­form each step of the algorithm in a pre­cise method within a reas­on­able amount of time.
  • GENERALITY: The whole pro­ced­ure should be applic­able for all the prob­lems of the same type. If we have a pro­ced­ure spe­cific­ally designed just for a par­tic­u­lar set of inputs, we can’t claim the pro­ced­ure as an algorithm.

Example 1: Let’s con­sider a typ­ical num­ber prob­lem: Find­ing the greatest num­ber among given set of integers. Although this prob­lem is rel­at­ively obvi­ous, it will provide an illus­tra­tion for the concept of an algorithm. (Note that, one could encounter this sort of prob­lem every day, for example find­ing the best rated res­taur­ant in the neigh­bour­hood.) We would like to develop a cer­tain method to achieve the task.

A Solu­tion: Note that each step of the pro­ced­ure must be pre­cise enough so that a com­puter could “under­stand” and “fol­low”. Let’s per­form the fol­low­ing steps:

  1. We start with the concept of tem­por­ary max­imum, and we set it to be the first num­ber in the sequence.
  2. We com­pare the next num­ber in the sequence with the tem­por­ary max­imum, if that num­ber is greater than the tem­por­ary max­imum then we update the tem­por­ary max­imum to be that number.
  3. Repeat the pre­vi­ous pro­ced­ure as long as there are num­bers left in the sequence.
  4. Stop when there is no num­ber left in the sequence, at this point the tem­por­ary max­imum is the largest num­ber in the sequence.

Here we use the Eng­lish lan­guage to describe the pro­ced­ure. The actual algorithm that can be inter­preted by a com­puter with this logic depends on the pro­gram­ming language.

Whether this is an excel­lent solu­tion or not is another ques­tion. That could be con­sidered as the effi­ciency of the algorithm. There are cer­tain math­em­at­ical cal­cu­la­tions to meas­ure the effi­ciency, but we will men­tion those later in a dif­fer­ent post.

PageR­ank Algorithm

Here is an example of a more com­plex algorithm, so called PageR­ank© algorithm of Google. This algorithm was first intro­duced by Larry Page (co-founder of Google) in 1998. The name “PageR­ank” is a trade­mark of Google, the pro­cess has been pat­en­ted which actu­ally belonged to Stan­ford Uni­ver­sity. The uni­ver­sity received 1.8 mil­lion Google shares for the use of the pat­ent (all shares have been sold for 336 mil­lion US Dol­lars in 2005). We note that, this is just one of the factors determ­in­ing Google Search res­ults but still con­tin­ues to provide a basis for Google’s web search.

The algorithm is used as a method of meas­ur­ing how “import­ant” a web­site is. It is based on the assump­tion that more import­ant web­sites would be more likely to be linked by other web­sites. It basic­ally meas­ures the num­ber and the qual­ity of links to a cer­tain page to determ­ine an estim­ate how import­ant a web­site is. It assigns a weighted numer­ical value to each web­site to determ­ine the import­ance of the page and nat­ur­ally more import­ant web­sites appears earlier in the search results.

In the fol­low­ing sim­pli­fied dia­gram, there are five web­sites, each arrow rep­res­ents a link to the other page. The web­site in the middle, named “webpage”, is con­sidered to be a “more pop­u­lar” web­site, con­sid­er­ing the fact that other pages are dir­ect­ing the user to that page.

A representation of page 1-4 that have a correlation with webpage in the centre

We note that not all pages have the same value, for instance if a webpage is linked by Wiki­pe­dia, that would be more valu­able than a link from a newly estab­lished website.

We could explain the sim­pli­fied algorithm recurs­ively in the fol­low­ing way:

  • The value of PageR­ank is ini­tial­ized the same for all webpages:
The formula Rank (0, url) = 1

where 0 (zero) is the first step and “url” is the address of the website.

  • Then all the links to this “url” are con­sidered. We iter­ate the pro­ced­ure with the fol­low­ing formula:
Rank formula

Here, we basic­ally iter­ate the pro­ced­ure to assign a numer­ical “import­ance” value to a cer­tain web­site “url”, con­sid­er­ing the num­ber of links dir­ect­ing to this url and also, we need to keep track of the total num­ber of links on a web­site (so that, the more links on a webpage there are, the less its weighted value will be).

Again, this is a sim­pli­fied ver­sion of the actual algorithm, there are other nor­mal­iz­ing factors in con­sid­er­a­tion. More detailed inform­a­tion can be found in ref­er­ences [1], [2], [4].

In words, we could roughly explain the procedure:

  • A web surfer picks a web­site ran­domly, there might be sev­eral links on that page or maybe none at all (in that case pick another ran­dom webpage). Then the web surfer selects a ran­dom link on that page (if exists) and goes to the link.
  • This pro­ced­ure is applied over and over again, so the num­ber of times we reach a cer­tain page gives us an idea about how pop­u­lar the page is (which is the num­ber assigned by the for­mula above)
  • Repeat the pro­ced­ure peri­od­ic­ally to update the import­ance of a website.

With this algorithm a source page (such as Wiki­pe­dia) hav­ing many cita­tions (links dir­ect­ing to Wiki­pe­dia) would be one of the most import­ant pages dir­ectly appear­ing in search results.

It looks a fun­da­mental approach to solve the prob­lem of set­ting a search engine. Surely there might be many other types of algorithms that could be used to surf the entire web. So, if you have a sim­ilar idea (per­haps even faster or more effi­cient) of an algorithm (plus a few bil­lions of dol­lars) you might be some day a com­pet­itor of Google!

Ref­er­ences:

[1] (Art­icle) The Ana­tomy of a Large-Scale Hyper­tex­tual Web Search Engine by Sergey Brin and Lawrence Page. http://infolab.stanford.edu/pub/papers/google.pdf

[2] (Art­icle) The PageR­ank Cita­tion Rank­ing: Bring­ing Order to the Web http://ilpubs.stanford.edu:8090/422/1/1999–66.pdf

[3] (Book) Dis­crete Math­em­at­ics and Its Applic­a­tions by Ken­neth H. Rosen https://faculty.ksu.edu.sa/sites/default/files/rosen_discrete_mathematics_and_its_applications_7th_edition.pdf

[4] (Webpage) PageR­ank https://en.wikipedia.org/wiki/PageRank