Fast line drawing algorithm

9

The task is to find a way to draw a horizontal line in an array of 16-bit integers.

We're assuming a 256x192 pixel array with 16 pixels per word. A line is a contiguous run of set (1) bits. Lines can start in the middle of any word, overlap any other words, and end in any word; they may also start and end in the same word. They may not wrap on to the next line. Hint: the middle words are easy - just write 0xffff, but the edges will be tricky, as will handling the case for the start and end in the same word. A function/procedure/routine must take an x0 and x1 coordinate indicating the horizontal start and stop points, as well as a y coordinate.

I exclude myself from this because I designed a nearly identical algorithm myself for an embedded processor but I'm curious how others would go about it. Bonus points for using relatively fast operations (for example, a 64 bit multiply or floating point operation wouldn't be fast on an embedded machine but a simple bit shift would be.)

Thomas O

Posted 2011-01-27T22:38:16.347

Reputation: 3 044

@Nakilon: I disagree. Then why is this site named Code Golf? There are thousands of other sites for algorithmic complexity and speed optimisation discussions. – hallvabo – 2011-01-28T09:06:41.080

5@hallvabo: From the FAQ - "Code Golf - Stack Exchange is for code golfers and for those who interested in code golfing (from beginners to experts), and programming puzzles." I consider this a programming puzzle. – Thomas O – 2011-01-28T11:00:20.750

2Codegolf is about short code, not fast code or optimizing for speed. – hallvabo – 2011-01-27T23:32:27.167

@hallvabo My solution is quite short, about 5 lines when bounds checking and additional features (like toggling pixels instead of setting them.) are removed. – Thomas O – 2011-01-27T23:39:36.130

I started a discussion about performance measurements on meta - you're invited.

– user unknown – 2011-08-13T22:59:00.380

how was winner chosen? – oenone – 2011-08-15T13:14:07.380

Bounty of 50 added by me. – Thomas O – 2011-02-05T01:58:27.940

@hallvabo See http://meta.codegolf.stackexchange.com/questions/5/site-title-and-domain

– marcog – 2011-02-05T08:50:19.037

9@hallvabo, this site not only codegolf. Is also about optimizing for speed, but not all kinds of optimizing: not hardware details, but algorithm's complexity. – Nakilon – 2011-01-28T01:22:23.457

I agree with Nakilon and Thomas that the site is not restricted to golf, but I have a little bit of a problem with making "speed" the metric. I mean, how will you measure and define "speed"? If you know the input is large, you can look at O() and or \Theta(), but to resolve ties you need an assumption about how fast various operations are. – dmckee --- ex-moderator kitten – 2011-02-05T15:46:44.407

@marcog this is a different issue; we're not sure that performance things are on topic. Has nothing to do with confusion about codegolf. – Jeff Atwood – 2011-02-05T21:34:18.130

@Jeff Replied to your comment on meta. – marcog – 2011-02-05T21:50:06.540

Answers

3

This code assumes that both x0 and x1 are inclusive endpoints, and that words are little endian (i.e. the (0,0) pixel can be set with array[0][0]|=1).

int line(word *array, int x0, int x1, int y) {
  word *line = array + (y << 4);
  word *start = line + (x0 >> 4);
  word *end = line + (x1 >> 4);
  word start_mask = (word)-1 << (x0 & 15);
  word end_mask = (unsigned word)-1 >> (15 - (x1 & 15));
  if (start == end) {
    *start |= start_mask & end_mask;
  } else {
    *start |= start_mask;
    *end |= end_mask;
    for (word *p = start + 1; p < end; p++) *p = (word)-1;
  }
}

Keith Randall

Posted 2011-01-27T22:38:16.347

Reputation: 19 865

1How fast is it? – user unknown – 2011-08-13T22:21:41.993

1

Scala, 7s / 1M lines 4.1s / 1M lines

// declaration and initialisation of an empty field: 
val field = Array.ofDim[Short] (192, 16) 

first implementation:

// util-method: set a single Bit:
def setBit (x: Int, y: Int) = 
  field (y)(x/16) = (field (y)(x/16) | (1 << (15 - (x % 16)))).toShort 
def line (x0: Int, x1: Int, y: Int) = 
  (x0 to x1) foreach (setBit (_ , y))

After eliminating the inner method call, and replacing the for- with a while-loop, on my 2Ghz Single Core with Scala 2.8 it absolves 1 Mio. Lines in 4.1s sec. instead of initial 7s.

  def line (x0: Int, x1: Int, y: Int) = {
    var x = x0
    while (x < x1) {  
      field (y)(x/16) = (field (y)(x/16) | (1 << (15 - (x % 16)))).toShort
      x += 1
    }
  }

Testcode and invocation:

// sample invocation:
line (12, 39, 3) 
// verification 
def shortprint (s: Short) = s.toBinaryString.length match {          
  case 16 => s.toBinaryString                                          
  case 32 => s.toBinaryString.substring (16)                           
  case x  => ("0000000000000000".substring (x) + s.toBinaryString)}

field (3).take (5).foreach (s=> println (shortprint (s)))            
// result:
0000000000001111
1111111111111111
1111111100000000
0000000000000000
0000000000000000

Performance testing:

  val r = util.Random 

  def testrow () {
    val a = r.nextInt (256)
    val b = r.nextInt (256)
    if (a < b)
      line (a, b, r.nextInt (192)) else
        line (b, a, r.nextInt (192)) 
  }

  def test (count: Int): Unit = {
    for (n <- (0 to count))
      testrow ()
  }

  // 1 mio tests
  test (1000*1000) 

Tested with the unix tool time, comparing the user-time, including startup-time, compiled code, no JVM-start-up phase.

Increasing the number of lines shows, that for every new million, it needs an extra 3.3s.

user unknown

Posted 2011-01-27T22:38:16.347

Reputation: 4 210

1

Python

The main trick here is to use a lookup table to store bitmasks of the pixels. This saves a few operations. A 1kB table is not that large even for an embedded platform these days

If space is really tight, for the price of a couple of &0xf the lookup table can be reduced to just 64B

This code is in Python, but would be simple to port to any language that supports bit operations.

If using C, you could consider unwinding the loop using the switch from Duff's device. Since the line is maximum of 16 words wide, I would extend the switch to 14 lines and dispense with the while altogether.

T=[65535, 32767, 16383, 8191, 4095, 2047, 1023, 511,
   255, 127, 63, 31, 15, 7, 3, 1]*16
U=[32768, 49152, 57344, 61440, 63488, 64512, 65024, 65280,
   65408, 65472, 65504, 65520, 65528, 65532, 65534, 65535]*16

def drawline(x1,x2,y):
    y_=y<<4
    x1_=y_+(x1>>4)
    x2_=y_+(x2>>4)
    if x1_==x2_:
        buf[x1_]|=T[x1]&U[x2]
        return    
    buf[x1_]|=T[x1]
    buf[x2_]|=U[x2]        
    x1_+=+1
    while x1_<x2_:
        buf[x1_] = 0xffff
        x1_+=1


#### testing code ####

def clear():
    global buf
    buf=[0]*192*16

def render():
    for y in range(192):
        print "".join(bin(buf[(y<<4)+x])[2:].zfill(16) for x in range(16))


clear()
for y in range(0,192):
    drawline(y/2,y,y)
for x in range(10,200,6):
    drawline(x,x+2,0)
    drawline(x+3,x+5,1)
for y in range(-49,50):
    drawline(200-int((2500-y*y)**.5), 200+int((2500-y*y)**.5), y+60)
render()

gnibbler

Posted 2011-01-27T22:38:16.347

Reputation: 14 170

1

Here is a C version of my Python answer using the switch statement instead of the while loop and reduced indexing by incrementing a pointer instead of the array index

The size of the lookup table can be substantially reduced by using T[x1 & 0xf] and U[x2 & 0xf] for a couple of extra instructions

#include <stdio.h>
#include <math.h>

unsigned short T[] = {0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
                      0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
                      0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
                      0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
                      0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
                      0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
                      0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
                      0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
                      0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
                      0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
                      0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
                      0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
                      0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
                      0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
                      0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
                      0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
                      0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
                      0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
                      0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
                      0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
                      0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
                      0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
                      0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
                      0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
                      0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
                      0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
                      0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
                      0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
                      0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
                      0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
                      0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
                      0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001};

unsigned short U[] = {0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
                      0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
                      0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
                      0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
                      0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
                      0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
                      0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
                      0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
                      0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
                      0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
                      0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
                      0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
                      0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
                      0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
                      0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
                      0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
                      0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
                      0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
                      0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
                      0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
                      0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
                      0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
                      0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
                      0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
                      0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
                      0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
                      0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
                      0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
                      0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
                      0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
                      0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
                      0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff};

unsigned short buf[192*16];

void clear(){
    int i;
    for (i=0; i<192*16; i++) buf[i]==0;
}

void render(){
    int x,y;
    for (y=0; y<192; y++){
        for (x=0; x<256; x++) printf("%d", (buf[(y<<4)+(x>>4)]>>(15-(x&15)))&1);
        printf("\n");
    }
}

void drawline(int x1, int x2, int y){
    int y_ = y<<4;
    int x1_ = y_+(x1>>4);
    int x2_ = y_+(x2>>4);
    unsigned short *p = buf+x1_;

    if (x1_==x2_){
        *p|=T[x1]&U[x2];
        return;
        }

    *p++|=T[x1];
    switch (x2_-x1_){
    case 14: *p++ = 0xffff;
    case 13: *p++ = 0xffff;
    case 12: *p++ = 0xffff;
    case 11: *p++ = 0xffff;
    case 10: *p++ = 0xffff;
    case 9: *p++ = 0xffff;
    case 8: *p++ = 0xffff;
    case 7: *p++ = 0xffff;
    case 6: *p++ = 0xffff;
    case 5: *p++ = 0xffff;
    case 4: *p++ = 0xffff;
    case 3: *p++ = 0xffff;
    case 2: *p++ = 0xffff;
    case 1: *p++ = U[x2];
    }     
}


int main(){
    int x,y;
    clear();

    for (y=0; y<192; y++){
        drawline(y/2,y,y); 
    }

    for (x=10; x<200; x+=6){
        drawline(x,x+2,0);
        drawline(x+3,x+5,1);
    }

    for (y=-49; y<50; y++){
        x = sqrt(2500-y*y);
        drawline(200-x, 200+x, y+60);
    }
    render();
    return 0;
    }

gnibbler

Posted 2011-01-27T22:38:16.347

Reputation: 14 170

How fast is it? – user unknown – 2011-08-13T21:14:37.173

@user unknown, How long is a piece of string? I think it should be faster than the accepted answer because it uses a lookup table to reduce the amount of work slightly. Why don't you try them out and let us know what you find? – gnibbler – 2011-08-14T08:53:49.157