Create xkcd-Style Narrative Charts

45

8

In one of the more iconic xkcd strips, Randall Munroe visualised the timelines of several films in narrative charts:

enter image description here (Click for larger version.)

Source: xkcd No. 657.

Given a specification of the timeline of a movie (or some other narrative), you are to generate such a chart. This is a popularity contest, so the answer with the most (net) votes will win.

Minimum Requirements

To tighten the spec a bit, here is the minimum set of features every answer must implement:

  • Take as input a list of character names, followed by a list of events. Each event is either a list of dying characters, or a list of groups of characters (signifying which characters are currently together). Here is one example for how the Jurassic Park narrative could be encoded:

    ["T-Rex", "Raptor", "Raptor", "Raptor", "Malcolm", "Grant", "Sattler", "Gennaro",
     "Hammond", "Kids", "Muldoon", "Arnold", "Nedry", "Dilophosaurus"]
    [
      [[0],[1,2,3],[4],[5,6],[7,8,10,11,12],[9],[13]],
      [[0],[1,2,3],[4,7,5,6,8,9,10,11,12],[13]],
      [[0],[1,2,3],[4,7,5,6,8,9,10],[11,12],[13]],
      [[0],[1,2,3],[4,7,5,6,9],[8,10,11,12],[13]],
      [[0,4,7],[1,2,3],[5,9],[6,8,10,11],[12],[13]],
      [7],
      [[5,9],[0],[4,6,10],[1,2,3],[8,11],[12,13]],
      [12],
      [[0, 5, 9], [1, 2, 3], [4, 6, 10, 8, 11], [13]], 
      [[0], [5, 9], [1, 2], [3, 11], [4, 6, 10, 8], [13]], 
      [11], 
      [[0], [5, 9], [1, 2, 10], [3, 6], [4, 8], [13]], 
      [10], 
      [[0], [1, 2, 9], [5, 6], [3], [4, 8], [13]], 
      [[0], [1], [9, 5, 6], [3], [4, 8], [2], [13]], 
      [[0, 1, 9, 5, 6, 3], [4, 8], [2], [13]], 
      [1, 3], 
      [[0], [9, 5, 6, 3, 4, 8], [2], [13]]
    ]
    

    E.g. the first line means that at the beginning of the chart, T-Rex is a lone, the three Raptors are together, Malcolm is alone, Grant and Sattler are together, etc. The second to last event means that two of the Raptors die.

    How exactly you expect the input is up to you, as long as this kind of information can be specified. E.g. you may use any convenient list format. You can also expect the characters in the events to be the full character names again etc.

    You may (but don't have to) assume that each list of groups contains each living character in exactly one group. However, you should not assume that the groups or characters within one event are in particularly convenient order.

  • Render to screen or file (as a vector or raster graphic) a chart which has one line for each character. Each line must be labelled with a character name at the beginning of the line.

  • For each normal event, there must be, in order, some cross-section of the chart in which the groups of characters are clearly resembled by proximity of their respective lines.
  • For each death event, the lines of the relevant characters must terminate in a visible blob.
  • You do not have to reproduce any other features of Randall's plots, nor do you have to reproduce his drawing style. Straight lines with sharp turns, all in black, without further labels and a title is perfectly fine to enter the competition. There's also no need to use the space efficiently - e.g. you could potentially simplify you algorithm by only ever moving lines downwards to meet up with other characters, as long as there is a discernible direction of time.

I've added a reference solution which fulfils exactly these minimum requirements.

Making it Pretty

This is a popularity contest though, so on top of that, you may implement whatever fanciness you want. The most important addition is a decent layouting algorithm which makes the chart more legible - e.g. which makes bends in the lines easy to follow and which reduces the number of necessary line crossings. This is the core algorithmic problem of this challenge! The votes will decide how well your algorithm performs at keeping the chart tidy.

But here are some more ideas, most of them based on Randall's charts:

Decorations:

  • Coloured lines.
  • A title for the plot.
  • Labelling line ends.
  • Automatically relabelling lines which have gone through a busy section.
  • Hand-drawn style (or other? as I said, there's no need to reproduce Randall's style if you have a better idea) for lines and fonts.
  • Customisable orientation of the time axis.

Additional Expressiveness:

  • Named events/groups/deaths.
  • Disappearing and reappearing lines.
  • Characters entering late.
  • Highlights which indicate (transferable?) properties of characters (e.g., see the ringbearer in the LotR chart).
  • Encoding additional information in the grouping axis (e.g. geographic information like in the LotR chart).
  • Time travel?
  • Alternative realities?
  • A character turning into another?
  • Two characters merging? (A character splitting?)
  • 3D? (If you really go that far, please make sure that you're actually using the additional dimension to visualise someting!)
  • Any other relevant features, which could be useful to visualise the narrative of a film (or book etc.).

Of course, many of these will require additional input, and you're free to augment your input format as necessary, but please document how data can be entered.

Please include one or two examples to show off the features you implemented.

Your solution should be able to deal with any valid input, but it's absolutely fine if it's better suited to certain kinds of narratives than others.

Voting Criteria

I have no illusions that I could tell people how they should spend their votes, but here are some suggested guidelines in order of importance:

  • Downvote answers which exploit loopholes, standard ones or others, or hardcode one or more results.
  • Do not upvote answers which don't fulfil the minimum requirements (no matter how fancy the rest might be).
  • First and foremost, upvote nice layouting algorithms. This includes answers which don't use a lot of vertical space while minimising crossing of lines to keep the graph legible, or which manage to encode additional information into the vertical axis. Visualising the groupings without making a huge mess should be the main focus of this challenge, such that this remains a programming contest with an interesting algorithmic problem at heart.
  • Upvote optional features which add expressive power (i.e. are not just pure decoration).
  • Lastly, upvote nice presentation.

Martin Ender

Posted 2014-10-11T23:01:08.770

Reputation: 184 808

Related research. – Martin Ender – 2016-06-01T15:06:07.960

7because code-golf doesn't have enough xkcd – proud haskeller – 2014-10-11T23:21:12.737

8@proudhaskeller PPCG can never have enough xkcd. ;) But I don't think we've tried to make challenges of his supersized info graphics/visualisations yet, so I hope I'm bringing something new to the table with this. And I'm sure some of the others would make very different and interesting challenges as well. – Martin Ender – 2014-10-11T23:23:18.777

Is it OK if my solution only handles 12 angry men, Duel (Spielberg, 1971, regular motorist vs crazed trucker), and Planes, Trains and Automobiles ? ;-) – Level River St – 2014-10-12T09:27:44.117

@steveverrill Let the votes decide. :P But seriously, no your solution should be able to deal with any input, but I'd say it's fine if it performs better on some kinds of narratives than others, as long as that isn't due to you hardcoding three movies and using the reference implementation for any other input. ;) – Martin Ender – 2014-10-12T09:31:08.860

4i wonder how the input for primer would look like... – Joshua – 2014-10-13T13:01:24.127

Does the example input (and the example answer) allow for a difference between all characters being together at once and all characters dying at once? Sorry, I don't know mathematica and amn't enough of a pure computer scientist to appreciate whether there's a difference between [x,y,z] and [[x,y,z]] and wasn't sure whether the only defining feature of the death event is that it contains only one set of characters. – ping – 2014-10-15T19:07:10.240

1@ping Yes, that was the idea. If an event contains further lists, it's a list groupings. so [[x,y,z]] would mean that all characters are currently together. But if the event doesn't contain lists, but only characters directly, it's a death even, so in the same situation [x,y,z] means that those three characters die. Feel free to use another format, with an explicit indication of whether something is a death or grouping event if that helps you. The above format is only a suggestion. As long as your input format is at least as expressive, you can use something else. – Martin Ender – 2014-10-15T19:10:33.623

Answers

18

Python3 with numpy, scipy and matplotlib

Jurassic Park

edit:

  • I tried to keep the groups in the same relative position between events, hence the sorted_event function.
  • New function to calculate the y position of the characters (coords).
  • Every alive event is plotted two times now, so the characters stick together better.
  • Added legend and removed axes label.
import math
import numpy as np
from scipy.interpolate import interp1d
from matplotlib import cm, pyplot as plt


def sorted_event(prev, event):
    """ Returns a new sorted event, where the order of the groups is
    similar to the order in the previous event. """
    similarity = lambda a, b: len(set(a) & set(b)) - len(set(a) ^ set(b))
    most_similar = lambda g: max(prev, key=lambda pg: similarity(g, pg))
    return sorted(event, key=lambda g: prev.index(most_similar(g)))


def parse_data(chars, events):
    """ Turns the input data into 3 "tables":
    - characters: {character_id: character_name}
    - timelines: {character_id: [y0, y1, y2, ...],
    - deaths: {character_id: (x, y)}
    where x and y are the coordinates of a point in the xkcd like plot.
    """
    characters = dict(enumerate(chars))
    deaths = {}
    timelines = {char: [] for char in characters}

    def coords(character, event):
        for gi, group in enumerate(event):
            if character in group:
                ci = group.index(character)
                return (gi + 0.5 * ci / len(group)) / len(event)
        return None

    t = 0
    previous = events[0]
    for event in events:
        if isinstance(event[0], list):
            previous = event = sorted_event(previous, event)
            for character in [c for c in characters if c not in deaths]:
                timelines[character] += [coords(character, event)] * 2
            t += 2
        else:
            for char in set(event) - set(deaths):
                deaths[char] = (t-1, timelines[char][-1])

    return characters, timelines, deaths


def plot_data(chars, timelines, deaths):
    """ Draws a nice xkcd like movie timeline """

    plt.xkcd()  # because python :)

    fig = plt.figure(figsize=(16,8))
    ax = fig.add_subplot(111)
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    ax.set_xlim([0, max(map(len, timelines.values()))])

    color_floats = np.linspace(0, 1, len(chars))
    color_of = lambda char_id: cm.Accent(color_floats[char_id])

    for char_id in sorted(chars):
        y = timelines[char_id]
        f = interp1d(np.linspace(0, len(y)-1, len(y)), y, kind=5)
        x = np.linspace(0, len(y)-1, len(y)*10)
        ax.plot(x, f(x), c=color_of(char_id))

    x, y = zip(*(deaths[char_id] for char_id in sorted(deaths)))
    ax.scatter(x, y, c=np.array(list(map(color_of, sorted(deaths)))), 
               zorder=99, s=40)

    ax.legend(list(map(chars.get, sorted(chars))), loc='best', ncol=4)
    fig.savefig('testplot.png')


if __name__ == '__main__':
    chars = [
        "T-Rex","Raptor","Raptor","Raptor","Malcolm","Grant","Sattler",
        "Gennaro","Hammond","Kids","Muldoon","Arnold","Nedry","Dilophosaurus"
    ]
    events = [
        [[0],[1,2,3],[4],[5,6],[7,8,10,11,12],[9],[13]],
        [[0],[1,2,3],[4,7,5,6,8,9,10,11,12],[13]],
        [[0],[1,2,3],[4,7,5,6,8,9,10],[11,12],[13]],
        [[0],[1,2,3],[4,7,5,6,9],[8,10,11,12],[13]],
        [[0,4,7],[1,2,3],[5,9],[6,8,10,11],[12],[13]],
        [7],
        [[5,9],[0],[4,6,10],[1,2,3],[8,11],[12,13]],
        [12],
        [[0,5,9],[1,2,3],[4,6,10,8,11],[13]],
        [[0],[5,9],[1,2],[3,11],[4,6,10,8],[13]],
        [11],
        [[0],[5,9],[1,2,10],[3,6],[4,8],[13]],
        [10],
        [[0],[1,2,9],[5,6],[3],[4,8],[13]],
        [[0],[1],[9,5,6],[3],[4,8],[2],[13]],
        [[0,1,9,5,6,3],[4,8],[2],[13]],
        [1,3],
        [[0],[9,5,6,3,4,8],[2],[13]]
    ]
    plot_data(*parse_data(chars, events))

pgy

Posted 2014-10-11T23:01:08.770

Reputation: 830

Hah, very nice xkcd look :)... any chance you'll be able to label the lines? – Martin Ender – 2014-12-17T21:06:15.810

Label the lines, have different width of lines (with decreasing/increasing between some points) and finally ... make the lines more horizontal when near to a vertex while interpolating, more like a bezier curve and this would be the best entry IMO :) – Optimizer – 2014-12-17T21:09:31.937

1Thanks, but xkcd style is included in matplotlib, so it was only a function call:) Well, I created a legend, but it occupied almost a third of the image, so I commented it out. – pgy – 2014-12-17T21:10:43.750

I modified my answer, I think it looks better now. – pgy – 2014-12-23T13:55:20.840

6

T-SQL

I'm not happy with this as an entry, but I think this question deserves at least a try at it. I will try to improve this later time permitting, but labeling will always be a problem in SQL. The solution requires SQL 2012+ and is run in SSMS (SQL Server Management Studio). The output is in the spatial results tab.

-- Variables for the input
DECLARE @actors NVARCHAR(MAX) = '["T-Rex", "Raptor", "Raptor", "Raptor", "Malcolm", "Grant", "Sattler", "Gennaro", "Hammond", "Kids", "Muldoon", "Arnold", "Nedry", "Dilophosaurus"]';
DECLARE @timeline NVARCHAR(MAX) = '
[
   [[1], [2, 3, 4], [5], [6, 7], [8, 9, 11, 12, 13], [10], [14]],
   [[1], [2, 3, 4], [5, 8, 6, 7, 9, 10, 11, 12, 13], [14]],
   [[1], [2, 3, 4], [5, 8, 6, 7, 9, 10, 11], [12, 13], [14]],
   [[1], [2, 3, 4], [5, 8, 6, 7, 10], [9, 11, 12, 13], [14]],
   [[1, 5, 8], [2, 3, 4], [6, 10], [7, 9, 11, 12], [13], [14]],
   [8],
   [[6, 10], [1], [5, 7, 11], [2, 3, 4], [9, 12], [13, 14]],
   [13],
   [[1, 6, 10], [2, 3, 4], [5, 7, 11, 9, 12], [14]],
   [[1], [6, 10], [2, 3], [4, 12], [5, 7, 11, 9], [14]],
   [12],
   [[1], [6, 10], [2, 3, 11], [4, 7], [5, 9], [14]],
   [11],
   [[1], [2, 3, 10], [6, 7], [4], [5, 9], [14]],
   [[1], [2], [10, 6, 7], [4], [5, 9], [3], [14]],
   [[1, 2, 10, 6, 7, 4], [5, 9], [3], [14]],
   [2, 4],
   [[1], [10, 6, 7, 5, 9], [3], [14]]
]
';

-- Populate Actor table
WITH actor(A) AS ( SELECT CAST(REPLACE(STUFF(REPLACE(REPLACE(@actors,', ',','),'","','</a><a>'),1,2,'<a>'),'"]','</a>') AS XML))
SELECT ROW_NUMBER() OVER (ORDER BY(SELECT \)) ActorID, a.n.value('.','varchar(50)') Name
INTO Actor
FROM actor CROSS APPLY A.nodes('/a') as a(n);

-- Populate Timeline Table
WITH Seq(L) AS (
    SELECT CAST(REPLACE(REPLACE(REPLACE(REPLACE(@timeline,'[','<e>'),']','</e>'),'</e>,<e>','</e><e>'),'</e>,','</e>') AS XML)
    ),
    TimeLine(N,Exerpt,Elem) AS (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) N
        ,z.query('.')
        ,CAST(REPLACE(CAST(z.query('.') AS VARCHAR(MAX)),',','</e><e>') AS XML)
    FROM Seq 
        CROSS APPLY Seq.L.nodes('/e/e') AS Z(Z)
    ),
    Groups(N,G,Exerpt) AS (
    SELECT N, 
        ROW_NUMBER() OVER (PARTITION BY N ORDER BY CAST(SUBSTRING(node.value('.','varchar(50)'),1,ISNULL(NULLIF(CHARINDEX(',',node.value('.','varchar(50)')),0),99)-1) AS INT)), 
        CAST(REPLACE(CAST(node.query('.') AS VARCHAR(MAX)),',','</e><e>') AS XML) C
    FROM TimeLine 
        CROSS APPLY Exerpt.nodes('/e/e') as Z(node)
    WHERE Exerpt.exist('/e/e') = 1
    )
SELECT * 
INTO TimeLine
FROM (
    SELECT N, null G, null P, node.value('.','int') ActorID, 1 D 
    FROM TimeLine CROSS APPLY TimeLine.Elem.nodes('/e') AS E(node)
    WHERE Exerpt.exist('/e/e') = 0
    UNION ALL
    SELECT N, G, DENSE_RANK() OVER (PARTITION BY N, G ORDER BY node.value('.','int')), node.value('.','int') ActorID, 0
    FROM Groups CROSS APPLY Groups.Exerpt.nodes('/e') AS D(node)
    ) z;

-- Sort the entries again
WITH ReOrder AS (
            SELECT *, 
                ROW_NUMBER() OVER (PARTITION BY N,G ORDER BY PG, ActorID) PP, 
                COUNT(P) OVER (PARTITION BY N,G) CP, 
                MAX(G) OVER (PARTITION BY N) MG, 
                MAX(ActorID) OVER (ORDER BY (SELECT\)) MA
            FROM (
                SELECT *,
                    LAG(G,1) OVER (PARTITION BY ActorID ORDER BY N) PG,
                    LEAD(G,1) OVER (PARTITION BY ActorID ORDER BY N) NG
                FROM timeline
                ) rg
    )
SELECT * INTO Reordered
FROM ReOrder;
ALTER TABLE Reordered ADD PPP INT
GO
ALTER TABLE Reordered ADD LPP INT
GO
WITH U AS (SELECT N, P, LPP, LAG(PP,1) OVER (PARTITION BY ActorID ORDER BY N) X FROM Reordered)
UPDATE U SET LPP = X FROM U;
WITH U AS (SELECT N, ActorID, P, PG, LPP, PPP, DENSE_RANK() OVER (PARTITION BY N,G ORDER BY PG, LPP) X FROM Reordered)
UPDATE U SET PPP = X FROM U;
GO

SELECT Name, 
    Geometry::STGeomFromText(
        STUFF(LS,1,2,'LINESTRING (') + ')'
        ,0)
        .STBuffer(.1)
        .STUnion(
        Geometry::STGeomFromText('POINT (' + REVERSE(SUBSTRING(REVERSE(LS),1,CHARINDEX(',',REVERSE(LS))-1)) + ')',0).STBuffer(D*.4)
        )
FROM Actor a
    CROSS APPLY (
        SELECT CONCAT(', '
            ,((N*5)-1.2)
                ,' ',(G)+P
            ,', '
            ,((N*5)+1.2)
                ,' ',(G)+P 
            ) AS [text()]
        FROM (
            SELECT ActorID, N,
                CASE WHEN d = 1 THEN
                    ((MA+.0) / (LAG(MG,1) OVER (PARTITION BY ActorID ORDER BY N)+.0)) * 
                    PG * 1.2
                ELSE 
                    ((MA+.0) / (MG+.0)) * 
                    G * 1.2
                END G,
                CASE WHEN d = 1 THEN
                (LAG(PPP,1) OVER (PARTITION BY ActorID ORDER BY N) -((LAG(CP,1) OVER (PARTITION BY ActorID ORDER BY N)-1)/2)) * .2 
                ELSE
                (PPP-((CP-1)/2)) * .2 
                END P
                ,PG
                ,NG
            FROM Reordered
            ) t
        WHERE a.actorid = t.actorid
        ORDER BY N, G
        FOR XML PATH('')
        ) x(LS)
    CROSS APPLY (SELECT MAX(D) d FROM TimeLine dt WHERE dt.ActorID = a.ActorID) d
GO

DROP TABLE Actor;
DROP TABLE Timeline;
DROP TABLE Reordered;

The resulting timeline looks like the following enter image description here

MickyT

Posted 2014-10-11T23:01:08.770

Reputation: 11 735

4

Mathematica, Reference Solution

For reference, I provide a Mathematica script which fulfils exactly the minimum requirements, nothing more, nothing less.

It expects the characters to be a list of the format in the question in chars, and the events in events.

n = Length@chars;
m = Max@Map[Length, events, {2}];
deaths = {};
Graphics[
 {
  PointSize@Large,
  (
     linePoints = If[Length@# == 3,
         lastPoint = {#[[1]], #[[2]] + #[[3]]/(m + 2)},
         AppendTo[deaths, Point@lastPoint]; lastPoint
         ] & /@ Position[events, #];
     {
      Line@linePoints,
      Text[chars[[#]], linePoints[[1]] - {.5, 0}]
      }
     ) & /@ Range@n,
  deaths
  }
 ]

As an example, here is the Jurassic Park example using Mathematica's list type:

chars = {"T-Rex", "Raptor", "Raptor", "Raptor", "Malcolm", "Grant", 
   "Sattler", "Gennaro", "Hammond", "Kids", "Muldoon", "Arnold", 
   "Nedry", "Dilophosaurus"};
events = {
   {{1}, {2, 3, 4}, {5}, {6, 7}, {8, 9, 11, 12, 13}, {10}, {14}},
   {{1}, {2, 3, 4}, {5, 8, 6, 7, 9, 10, 11, 12, 13}, {14}},
   {{1}, {2, 3, 4}, {5, 8, 6, 7, 9, 10, 11}, {12, 13}, {14}},
   {{1}, {2, 3, 4}, {5, 8, 6, 7, 10}, {9, 11, 12, 13}, {14}},
   {{1, 5, 8}, {2, 3, 4}, {6, 10}, {7, 9, 11, 12}, {13}, {14}},
   {8},
   {{6, 10}, {1}, {5, 7, 11}, {2, 3, 4}, {9, 12}, {13, 14}},
   {13},
   {{1, 6, 10}, {2, 3, 4}, {5, 7, 11, 9, 12}, {14}},
   {{1}, {6, 10}, {2, 3}, {4, 12}, {5, 7, 11, 9}, {14}},
   {12},
   {{1}, {6, 10}, {2, 3, 11}, {4, 7}, {5, 9}, {14}},
   {11},
   {{1}, {2, 3, 10}, {6, 7}, {4}, {5, 9}, {14}},
   {{1}, {2}, {10, 6, 7}, {4}, {5, 9}, {3}, {14}},
   {{1, 2, 10, 6, 7, 4}, {5, 9}, {3}, {14}},
   {2, 4},
   {{1}, {10, 6, 7, 4, 5, 9}, {3}, {14}}
};

we'll get:

enter image description here

(Click for larger version.)

That doesn't look too bad, but that's mostly because the input data is more or less ordered. If we shuffle the groups and characters in each event (while maintaining the same structure), stuff like this can happen:

enter image description here

Which is a bit of a mess.

So as I said, this fulfils only the minimum requirements. It doesn't try to find a nice layout and it isn't pretty, but that's where you guys come in!

Martin Ender

Posted 2014-10-11T23:01:08.770

Reputation: 184 808

I just thought you can perhaps 'prettify' it by using quadratic or cubic splines in order to remove the sharp corners? (I'd do it that way that the tangent at the given points is always 0) – flawr – 2014-10-12T08:51:22.330

@flawr Sure, or I could apply some of these trick, but that wasn't the purpose of this answer. ;) I really just wanted to provide a reference for the absolute minimum.

– Martin Ender – 2014-10-12T08:55:25.620

3Oh sorry, didn't even notice that this was your own question=P – flawr – 2014-10-12T11:38:24.540