Setting the stage

The following piece of imperative-style pseudo-code (taken from [1]) is one way¹ to implement John Conway’s celebrated “Game of Life”. Before inspecting the code, you may want to quickly find out what the “Game of Life” (GoL) actually is, e.g. here: https://academo.org/demos/conways-game-of-life/.

import numpy as np

N, final_time = 100, 10  # board size and number of time-steps (example values)

def life_evaluation(cells):
    # cells is a 3x3 array; count the live neighbours of the centre cell
    count = 0
    for i in [0, 1, 2]:
        for j in [0, 1, 2]:
            if not (i == 1 and j == 1):  # skip the centre cell itself
                count += cells[i, j]
    return life_count_evaluation(cells[1, 1], count)

def life_count_evaluation(cell, count):
    if count < 2:
        return 0  # loneliness
    elif count > 3:
        return 0  # overcrowding
    elif cell == 1 and (count == 2 or count == 3):
        return 1  # live on
    elif cell == 0 and count == 3:
        return 1  # spontaneous generation
    else:
        return 0  # no change in dead cells

def life_generation(board, tmp):
    # copy the board, then update every cell from the copy
    for i in range(N):
        for j in range(N):
            tmp[i, j] = board[i, j]
    for i in range(1, N - 1):  # the board edge is left alone (see footnote 1)
        for j in range(1, N - 1):
            board[i, j] = life_evaluation(tmp[i-1:i+2, j-1:j+2])

# create an initial board; we omit the code that seeds it
life_board = np.zeros((N, N), dtype=int)
temp_board = np.zeros((N, N), dtype=int)

# now iterate a number of steps on the whole board
for t in range(final_time):
    life_generation(life_board, temp_board)

Having absorbed the preceding barrage of information, you may have come to sense that “Life is just one damn thing after another”². Well, that impression would be right – but only insofar as we are talking about the implementation (which takes nothing away from the “alluring, transfixing, infectious complexity” that can arise from the Game of Life’s elegantly minimalistic laws [2]). In [1] it is referred to as the basic “sequential implementation, since it does its computation in a long sequence of steps.” (See Figure 1 for an example of three time-steps in the Game of Life.) Each time-step is computed one after the other (as you can tell from the final for-loop), as is each cell within each time-step.

Figure 1: An example of a sequence of three time-steps in the Game of Life. Source: https://academo.org/demos/conways-game-of-life/

But this is not a logical requirement of the problem. Each cell’s state at t+1 can be computed in parallel (a cell’s state at t+1 does not depend on any other cell’s state at t+1, only on its neighbours’ states at t). Even computing the time-steps in strict sequence is not a necessity. As an example, a patch of 5x5 cells at time t would be sufficient to compute the cell i,j at t+2 (you might be able to intuit that the state of a cell can be computed for two time-steps by considering a range of neighbouring cells at most two cells apart in the i or j direction). So as far as this cell is concerned, there is no need to wait for t+1 in some other part of the board to be computed first.
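To make that concrete, here is a minimal sketch, reusing life_evaluation from above (the helper name cell_two_steps_ahead is made up for this illustration), which obtains a cell’s state two time-steps ahead from nothing but the surrounding 5x5 patch at time t:

import numpy as np  # already imported above

def cell_two_steps_ahead(patch):
    # patch: a 5x5 array holding the cell's neighbourhood at time t.
    # First compute, at t+1, the 3x3 block of cells around the centre...
    block = np.zeros((3, 3), dtype=int)
    for i in range(3):
        for j in range(3):
            block[i, j] = life_evaluation(patch[i:i+3, j:j+3])
    # ...then the centre cell's state at t+2 follows from that block alone.
    return life_evaluation(block)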

Opportunities therefore abound for escaping the tedium of computing GoL sequentially. But what are the means to eliminate sequential execution, in particular where it is “an artefact of the programming language”, as opposed to a logical requirement of the problem [1]?

In [1], the reader is guided through some of the available techniques, with a number of segues into relevant hardware considerations. For instance, there is the idea that multiple cells may be updated in parallel via so-called “vector instructions”, as these are well “matched to the innermost components of a processor core” [1] (at a stretch, we may imagine this facility being applied to the computation of the “count” variable – which tracks the number of neighbouring cells that are alive – in the inner loop of “life_evaluation”).

Figure 2: Instead of operating on a single pair of inputs, you would load two or more pairs of operands, and execute multiple identical operations simultaneously. Applied to our case, instead of the variable “count” being associated with a single cell, we treat it as an array, whereby the elements of this array correspond to an array of cells on the board. The idea is to update multiple elements of this array simultaneously. Source: [1]

Figure 2 illustrates the idea of vector instructions. If, however, this still leaves you feeling unsure as to how exactly you would summon this power of parallelism (hint: [1] mentions the “loop unrolling” technique), rest assured that – in the imperative programming paradigm – “parallelism is not a magic ingredient that you can easily add to existing code, it needs to be [an] integral part of the design of your program” [1]. Like the character Glendower in Shakespeare’s King Henry IV, you can call the spirits from the vasty deep. But having to rely (to some extent) on the compiler’s ability to figure out when vector instructions can be used [1] is reminiscent of Hotspur’s taunting counter:

Why, so can I, or so can any man;

But will they come when you do call for them?
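Still, for a taste of how the “count” computation might be expressed at the level of whole arrays – so that a library or vectorizing compiler can map it to such instructions – here is a sketch that computes the count values for every cell of the board at once, using numpy’s roll (a whole-array shift). Note that roll wraps around at the board’s edges, effectively making the board periodic, which is an assumption this sketch adds:

def neighbour_counts(board):
    # sum the eight one-cell shifts of the board; counts[i, j] then holds
    # the number of live neighbours of cell (i, j)
    counts = np.zeros_like(board)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if (di, dj) != (0, 0):
                counts += np.roll(np.roll(board, di, axis=0), dj, axis=1)
    return counts

The full update would then follow by applying the rules of life_count_evaluation elementwise to board and counts.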


Interlude

So far I have alluded to a form of so-called data parallelism: the same instruction being performed on multiple data elements (i.e., a bunch of cells within a given row or column) at the same time. But what if, in addition, we wanted different regions of the board to be processed in parallel? Before addressing that, let us take a step back from Life.

If our grand goal is to eliminate artificial “sequentialism”, ideally it would be our aim to let data dependencies “locate parallelism” [3]. This is, perhaps, an odd way of saying that we want parallel execution to be the norm, except where data dependencies require a logical order to the computation. Clearly, this is idealistically put, as it assumes that any number of instructions can be executed in parallel – in reality, we are limited to a finite number of processing elements (it also implies unbounded buffering capacities, which is unrealistic) [3].


Entrance of the dataflow approach

Well, the spirit of this approach can be (partially) captured by following our guide [1] in an effort to distribute the program across multiple processors (or nodes), and employing a Message Passing Interface (MPI) library, which “adds functionality to an otherwise normal C or Fortran program for exchanging data with other processors”. We can make use of this approach to parallelize the computation of different regions of the board.

Figure 3: Processor p receives a line of data from p − 1 and p + 1. Source: [1]

As alluded to by Figure 3, we may split the board among a number of concurrent, independent processes. Each can start chugging away, computing its line before any other line has finished processing. To complete the work on each iteration, however, each process must first have received the current states of the adjacent lines on the board (this represents the data dependency).
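To give a flavour of how this might look in practice, here is a minimal sketch of the boundary exchange, written with the mpi4py bindings rather than the C or Fortran interface that [1] has in mind. Each rank owns a contiguous block of rows plus two “ghost” rows for its neighbours’ boundary data; the helper name and the row-block decomposition are assumptions made for this illustration.

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

N = 100                                      # board width; N assumed divisible by size
rows = N // size                             # rows owned by this rank
local = np.zeros((rows + 2, N), dtype=int)   # rows 0 and -1 are ghost rows

def exchange_ghost_rows(local):
    # swap boundary rows with the ranks above and below (where they exist)
    if rank > 0:
        comm.Sendrecv(local[1], dest=rank - 1, recvbuf=local[0], source=rank - 1)
    if rank < size - 1:
        comm.Sendrecv(local[-2], dest=rank + 1, recvbuf=local[-1], source=rank + 1)

After the exchange, each rank can update its own rows exactly as in the sequential life_generation, using the ghost rows as the otherwise-missing neighbours.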

The aforementioned approach is termed dataflow in [1]. Although the term is also defined there, its general use does not pick out a very precise thing. The term can refer to the pure computational model, to hardware architectures, to one of a number of different dataflow language implementations, or, as in the preceding example, to a kind of technique that abstracts away from the underlying language implementation. In what follows, and possibly in a way that gives hand-waving a good name, I will try to pick out a few of the core properties which (I would guess) are shared by most designs that lay claim to the dataflow label. The MPI example from above may represent somewhat of a special case, which does not live up to the billing entirely (although I lack the ambition to state precisely in what ways).

First is the “implicit achievement of concurrency” [4]. That is, dataflow reduces the need for the programmer to explicitly “handle concurrency issues such as semaphores or manually spawning and managing threads” [4]. This is made possible by the fact that dataflow consists of a number of independently acting “processing blocks”, each driven by the presence of input data (as opposed to an instruction which, even while its input data is available, must wait its turn until the program counter reaches it). As a result, we are liberated from the sequential ordering imposed by a classical language’s execution model.

Take Figure 4 as an example. On the left we have a simple imperative program, on the right its dataflow equivalent. Whereas the assignment operation for B must wait in the imperative program until the first operation has completed, the first two operations may proceed concurrently in the dataflow program. We might call this kind of parallelism “task” parallelism, which would be the analogue of the MPI example discussed above. Note also that while the operation for C is in progress, a second wave of computations may already be executing for the first two operations. This is known as pipeline parallelism and comes about naturally in the dataflow execution model. The equivalent of data parallelism, which was mentioned above, would be multiple instances of the same graph (or subgraph), with the input data streams partitioned such that each data item is fed into one or another of the running instances.
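For the curious, the firing behaviour of such a graph can be mimicked in a few lines of ordinary Python: each node is a thread that blocks until every one of its input queues (the arcs) holds a token. The variable names and operations below are guessed from the description of Figure 4, and real dataflow runtimes are of course far more refined than this toy.

import threading, queue

# queues model the arcs: unbounded FIFO streams of tokens
x_in, y_in1, y_in2 = queue.Queue(), queue.Queue(), queue.Queue()
a_arc, b_arc, c_out = queue.Queue(), queue.Queue(), queue.Queue()

def node(inputs, op, output):
    # fire whenever a token is present on every input arc
    while True:
        args = [q.get() for q in inputs]
        output.put(op(*args))

for inputs, op, output in [
    ([x_in, y_in1], lambda x, y: x + y, a_arc),   # A = X + Y
    ([y_in2], lambda y: y / 10, b_arc),           # B = Y / 10
    ([a_arc, b_arc], lambda a, b: a * b, c_out),  # C = A * B
]:
    threading.Thread(target=node, args=(inputs, op, output), daemon=True).start()

# feed two "waves" of tokens; the second wave pipelines behind the first
for x, y in [(1, 2), (3, 4)]:
    x_in.put(x); y_in1.put(y); y_in2.put(y)
print(c_out.get(), c_out.get())  # prints: 0.6 2.8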

Figure 4: A simple program (a) and its dataflow equivalent (b). Source: [3]

Going a bit deeper, the dataflow diagram in Figure 4 can be understood as follows [3]:

“In the dataflow execution model, a program is represented by a directed graph. The nodes of the graph [in the pure dataflow model] are primitive instructions such as arithmetic or comparison operations. Directed arcs between the nodes represent the data dependencies between the instructions. Conceptually, data flows as tokens along the arcs, which behave like unbounded first-in, first-out (FIFO) queues.”

Most expositions of the dataflow model point out this unifying connection between a dataflow program and a directed graph. Rarely, though, is an underlying explanation of this connection made explicit – perhaps because it is too obvious. However, this visual property paves the way to visual programming, a very compelling advantage of the dataflow paradigm. It therefore seems a property worth pondering.

As the name suggests, a directed graph brings with it a sense of directionality. In Figure 4, the flow of data is visually guided by arrows. This representation is compatible with the dataflow model, since, in effect, the model imposes a unique relation between a dataflow variable and an arc. More technically, within each computational “wave”, dataflow variables are single-assignment variables (i.e., immutable). In a traditional, imperative-style programming language, this sense of directionality is not so evident. Here, variables may be thought of as containers in which a value may be stored and updated multiple times [5]. Take a statement such as the following, which implies repeated variable assignment:

  n = n + 1

Trying to model such a statement as a simple graph would imply that n is both an input arc to the addition operation (along with the input arc feeding the constant value of 1) and its output arc. This throws into chaos any attempt to build a graph representation. Hence, the notion of single-assignment variables looks to be rather important, and indeed it is seen as one of the distinguishing properties of the dataflow model. See, for example, how this and other concepts are used in [5] to construct a taxonomy of programming paradigms!
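The standard escape under the single-assignment discipline is to give each “wave” its own fresh variable, so that the graph stays acyclic; a trivial sketch (the variable names are made up for this illustration):

# each update binds a new variable instead of mutating the old one,
# so every arc has a single, unambiguous producer
n0 = 0
n1 = n0 + 1
n2 = n1 + 1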

A related property which makes the dataflow model amenable to the graph view is that it enforces pure data interfaces between the nodes. In other words, the graph’s set of arcs represents the entirety of the information transfer that happens between nodes – no other forms of informational “coupling” between modules are allowed (i.e. any form of control communication [6], such as data which might be interpreted as a flag used to control certain aspects of the execution of another module). The level of coupling in dataflow is in fact the minimal requirement for a functioning system [6]. In other words, “coupling is…minimized when only input data and output data flow across the interface between two modules” [6]. This approach to programming may seem restrictive, but in light of the reduction in complexity, it is in fact suggested as a general programming methodology by the authors of [6]³. “The great advantage of the dataflow approach is that ‘modules’ interact in a very simple way, through the pipelines connecting them” [7].
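In Python, the closest everyday approximation of such pipeline-connected modules is probably a chain of generators: each stage sees nothing of the others except the data items flowing between them. A small sketch (the stage names are made up for this illustration):

def cells(board):
    # stage 1: stream every cell state off the board
    for row in board:
        for cell in row:
            yield cell

def live_only(stream):
    # stage 2: pass along only the live cells
    for cell in stream:
        if cell == 1:
            yield cell

# the stages are coupled solely by the data flowing through them
population = sum(live_only(cells([[0, 1], [1, 1]])))
print(population)  # prints: 3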

Lastly, it is also instructive to notice what the graph view is devoid of, namely any expression controlling the dynamic behaviour of the program. In Figure 4, the order of the statements on the left is an expression of the order of execution. That is, the mere ordering of statements expresses a flow of control. More generally, one can imagine the execution of a traditional program as jumping around the code of the program, whereby a single instruction is executed at a time.

By contrast, the graph on the right in Figure 4 does not convey whether the addition or the division operation should proceed first. It would therefore be inadequate to represent the operational semantics of an imperative program. Not so for the dataflow model. The programmer does not have to specify the dynamic execution behaviour, which is precisely one of dataflow’s advantages. Provided the input arcs have data on them (at which point the node/instruction is said to be “fireable”), the first two operations in Figure 4 may proceed concurrently. More generally, “if several instructions become fireable at the same time, they can be executed in parallel. This simple principle provides the potential for massive parallel execution” [3]. More as an aside, this behaviour of waiting for available input data is, technically speaking, expressed as waiting for a variable to be bound. The fact that [8] identifies dataflow with this very property – “this civilized behaviour is known as dataflow” – suggests its significance.

So that concludes some basic insights into dataflow. Clearly, much more can be said about this topic, which has seen several decades of development. Indeed, the first clear statement of pipeline dataflow dates back as far as 1963 [7] – incidentally, made by a certain Conway (Melvin Conway in this case). Anyway, I hope that this blog has left you with an appetite for more.

References

[1] V. Eijkhout, “Chapter 10 – Parallel programming illustrated through Conway’s Game of Life,” in Topics in Parallel and Distributed Computing, Boston: Morgan Kaufmann, 2015, pp. 299–321.
[2] S. Roberts, Genius at Play: The Curious Mind of John Horton Conway, London: Bloomsbury, 2015.
[3] W. M. Johnston, J. R. P. Hanna, and R. J. Millar, “Advances in Dataflow Programming Languages,” ACM Computing Surveys, vol. 36, pp. 1–34, 2004.
[4] T. B. Sousa, “Dataflow Programming Concept, Languages, and Applications,” Doctoral Symposium on Informatics Engineering, vol. 130, 2012.
[5] P. Van Roy, Programming Paradigms for Dummies: What Every Programmer Should Know, 2012.
[6] E. Yourdon and L. L. Constantine, Structured Design: Fundamentals of a Discipline of Computer Program and Systems Design, USA: Prentice-Hall, 1979.
[7] W. W. Wadge and E. A. Ashcroft, LUCID, the Dataflow Programming Language, USA: Academic Press Professional, 1985.
[8] P. Van Roy and S. Haridi, Concepts, Techniques, and Models of Computer Programming, MIT Press, 2004.

¹ Assuming we have a board of N×N cells and we would like to compute a certain number of time-steps. We also don’t care much about the edge of the board.

² This quote has been attributed to various people, among them Mark Twain.

³ Their method involves two separate stages: in the first (“program analysis”), the analyst produces a (pipeline) dataflow net that performs the necessary computations; in the second stage (“program design”), the net is translated into the language at hand, usually COBOL [6].