Advanced Topics in Multi-Core Architecture and Software Systems
Spring 2017

Lecturer: Adam Morrison
When: Wed, 16:00-19:00
Room: Schreiber 008


Writing fast and scalable multi-core programs is hard. Multi-core programming models fail to abstract away the details of the underlying computer architecture, partly because of limitations in the hardware/software interfaces (such as instruction sets and memory models). We must thus understand the multi-core architecture, both to design efficient algorithms and systems, and to find better abstractions and interfaces.

This course covers the state of the art in multi-core programming and architecture. A main objective is to introduce students to open research problems in multi-core research.

NOTE: We will consider hardware only at the microarchitecture level, not at the logic/gate level; for an example of microarchitectural material, see the book A Primer on Memory Consistency and Cache Coherence.


Requirements and grading

The main course requirement is a project; see the project description at the end of this page.

Schedule

Each entry below lists the date, the lecture topics, and the assigned papers.
Mar 15

Introduction (slides on moodle)
Administrative details
Taste of topics

The Implementation of the Cilk-5 Multithreaded Language

Idempotent Work Stealing

Laws of Order: Synchronization in Concurrent Algorithms

Fence-Free Work Stealing on Bounded TSO Processors

Mar 22

Background material
Reasoning about concurrency (linearizability)
Architecture basics (cache coherence, etc.)

Linearizability: A Correctness Condition for Concurrent Objects

A Primer on Memory Consistency and Cache Coherence (Chapters 2, 6-8)

Mar 29

Serializing efficiently
Spin locks and local spinning
Lock-free synchronization and CAS failures

Algorithms for scalable synchronization on shared-memory multiprocessors

Flat Combining and the Synchronization-Parallelism Tradeoff

OpLog: a library for scaling update-heavy data structures

Fast concurrent queues for x86 processors

Apr 19

Optimistic concurrency control

A Lazy Concurrent List-Based Set Algorithm

A Practical Concurrent Binary Search Tree

Transactional Memory: Architectural Support for Lock-Free Data Structures

Apr 26

Safe memory reclamation

High Performance Dynamic Lock-Free Hash Tables and List-Based Sets

Structured Deferral: Synchronization via Procrastination (explains RCU and compares to hazard pointers)

Practical lock-freedom (Epoch-based reclamation, Section 5.2.3)

Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects

Further reading:

Fast non-intrusive memory reclamation for highly-concurrent data structures

Drop the Anchor: Lightweight Memory Management for Non-Blocking Data Structures

Efficient Memory Management for Lock-Free Data Structures with Optimistic Access

StackTrack: An Automated Transactional Approach to Concurrent Memory Reclamation

Reclaiming Memory for Lock-Free Data Structures: There has to be a Better Way

May 3

Memory consistency models (hardware)

A Primer on Memory Consistency and Cache Coherence (Chapters 3-5)

WeeFence: Toward Making Fences Free in TSO

May 10

Memory consistency models (programming language)

Threads Cannot be Implemented as a Library

The Java Memory Model

Further reading:

Foundations of the C++ Concurrency Memory Model

May 17

Cache coherence protocols

DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism

DeNovoSync: Efficient Support for Arbitrary Synchronization without Writer-Initiated Invalidations

Further reading (comparing complexity of protocols):

Revisiting the Complexity of Hardware Cache Coherence and Some Implications

May 24

Concurrency control in databases

Speedy Transactions in Multicore In-Memory Databases

TicToc: Time Traveling Optimistic Concurrency Control

Jun 7

Advanced issues in hardware transactional memory

LogTM: Log-based Transactional Memory

OmniOrder: Directory-Based Conflict Serialization of Transactions

Jun 14

Ordered parallelism and relaxed data structures (1/2)

Skip lists (Section 4.3.3 of the thesis)

The SprayList: A Scalable Relaxed Priority Queue

MultiQueues: Simpler, Faster, and Better Relaxed Concurrent Priority Queues

Further reading:

The Power of Choice in Priority Scheduling

Jun 21

Ordered parallelism and relaxed data structures (2/2)

A Lightweight Infrastructure for Graph Analytics (Section 4.1)

A Scalable Architecture for Ordered Parallelism

Jun 28

Non-volatile memory and its relation to concurrency


Project

The main requirement is that the project make a new (even if small) contribution to our knowledge. You can attack a research problem presented in class, do something based on your own research, or implement prior work and try to reproduce its published results.

The project may be done individually or in a group of 2 to 3.

Most projects will require programming, e.g., for experiments and evaluation. The programming language depends on the project, but projects reproducing prior work will likely be in C/C++.

Proposal: Submit a writeup (PDF, max 2 pages, in English) describing the problem, why it is interesting, the state of the art, and what you plan to do, i.e., your idea for attacking the problem. Each group needs to submit only one copy (include the names and IDs of all members). We may need to iterate on the proposal; do not start working on the project before the proposal is approved.

Final report: Submit a writeup (PDF, in English) describing the problem and what you did. Submit your code as well (preferably as a link to a repository on GitHub or Bitbucket). Unless otherwise agreed, your code must be runnable on the school's Linux machines. Each group needs to submit only one copy.