## The Rise of the Virtual Cellular Machines

My apologies if you were looking for a post about cell phones. This post is about high density nanodevices. It's a follow up to How will memristors change everything? for those wishing to pursue these revolutionary ideas in more depth. This is one of those areas where if you are in the space then there's a lot of available information and if you are on the outside then it doesn't even seem to exist. Fortunately, Ben Chandler from The SyNAPSE Project, was kind enough to point me to a great set of presentations given at the 12th IEEE CNNA - International Workshop on Cellular Nanoscale Networks and their Applications - Towards Megaprocessor Computing. WARNING: these papers contain extreme technical content. If you are like me and you aren't an electrical engineer, much of it may make a sort of surface sense, but the deep and twisty details will fly over head. For the more software minded there are a couple more accessible presentations:

- Intelligent Machines built with Memristive Nanodevices by Greg Snider of Hewlett-Packard Laboratories.
- Virtual and Physical Cellular Architectures for Kilo-processor Chip Computers by Prof. Tamas Roska of SZTAKI & Pazmany University, Budapest.

Here a few excerpts from the presentations, just things I found particularly interesting. I'm still trying to make sense of it all and I thought you might be interested too. It's clear there's something new here and it will require different algorithms and programming models to work. What will those be and who will invent them?

From Greg Snider's talk we see the programming model in a little more detail than was exposed before:

I get the vibe that the key notion is applying functions to arrays in parallel. Maybe APL will make a comeback? The use of actor model here is interesting as it's usually a threaded model where messages queue up and operate on state sequentially. This seems to have more of the flavor of a discrete even simulation with a clock stepping through state machines that operate on cells in parallel.

An overall picture of their architecture looks like:

Tamas Roska in his paper puts the field on a more general footing, one that as a programmer looks a little more familiar to me. He boldly declares *A new era in computing starts:*

- NOT just parallel
- NOT using the present algorithms
- NOT massaging the nanoscale devices to repeat the functional primitives of CMOS building blocks
- Running time is NOT the only measure
- There are NO efficient tools
- Key: understanding spatial-temporal algorithmic dynamics Cellular Wave Computers: one way of thinking
- Cellular Wave Computers: one way of thinking

## The Technology Trend

- ~ 30 nm : over 10 billion transistors
- Severe limits for clock speed and power dissipation
- 25 k mixed mode processors and sensors on a cellular visual microprocessor at 180 nm
- 45 nm: 70 k logic- and 2k arithmetic- processors about 1 million 8 bit microprocessors could be placed (~5 Billion transistors) ...Why not?
- Wire delay is bigger than gate delay, hence communication speed is limited (synchrony radius)
- Hence: The precedence of Locality ~ i.e. Cellular, mainly locally connected architecture is a must for high computing power

## Virtual and Physical Cellular Machines

- (i) to hide the physical details of the huge number of processing elements (cores, cells, threads, etc.),
- (ii) to provide the framework for designing the algorithms with elementary array instructions, and
- (iii) to serve as the starting phase for the Virtual to Physical Cellular Machine mapping.

## Virtual Cellular Machines

- Its computational building blocks are mainly arrays, composed of operator and memory elements, acting in space and time
- Heterogeneous and homogeneous computational blocks
- Building blocks have no physical constraints - size, bandwidth, speed, power dissipation, except the two classes of memory access local and global)
- Is composed of five types of building blocks:
- (i) cellular processor arrays/layers, CP, with simple (L, or A, or S type) or complex (X) cells and their local memories, these are the protagonist building blocks,
- (ii) classical digital stored program computers, P (microprocessors),
- (iii) multimodal topographic (T) or non-topographic inputs and outputs, I/O (e.g. scalar, vector, or matrix signals),
- (iv) global memories of different data types M, organized in qualitatively different sizes and access times (e.g. cache memories), and
- (v) interconnection pathways B (busses).

- Tasks, the algorithms to be implemented, are defined on the Data/Memory representations of the Virtual Cellular Machine
- Various data representations for Topographic (e.g. picture, image, PDE, molecule dynamics) and Non-topographic (e.g. Markov processes, algebra, number theory) problems

## New Principles

- the role of the geometric address of a single processor in an array is introducing new algorithmic principles:
- the precedence of geometric locality (due to the physical and logical locality), i.e. the cellular character – granular, topographic, etc.
- cellular wave dynamics as an instruction
- streaming activity pattern in logic
- Do NOT parallelize existing single processor algorithms – invent new cellular algorithms
- The physical constraints on communication speed and power dissipation
- The basic constraints are: wire delay and power dissipation. A side constraint is the number of communication pins (contacts).

## New principles of Computational Complexity

- The algorithmic and physical complexity measures of these many core architectures will be eventually different compared to the single processor systems – abandoning the asymptotic framework (what is big? 1 million?).
- With Ω cores and M total memory the finite virtual algorithmic complexity Ca = Ca(Ω , M )
- The finite physical computational complexity of an algorithm is measured by a proper mix of speed, power, area, and accuracy.
- A new era in Computer Science, Computer engineering begins….

## Reader Comments (1)

These seem very applicable to Finite-Difference calculations such as grids used for fluid dynamics or electromagnetics. I think it has been done using FPGA devices. I'm not sure how it would apply to any web type problem.