Friday, April 21, 2017

kilo: I've started my own version of kilo

Based on the comments I've made so far, I've started my own version of kilo, which you can find here if you are interested.

I've chosen to make some different design and stylistic choices from the previous versions. Most notably, I just can't restrict myself to a single source file. Even for a relatively small project like this, it's just too claustrophobic. I've also chosen the Unix style of underscores in identifiers (editor_process_keypress) rather than camel case (editorProcessKeypress), which feels anachronistic to me in the C/Unix environment.

I've also created a simple Perl script to help deal with the string constant issue I've mentioned previously. This allows arcane terminal escape sequences to be named symbolically, with their length computed at compile time. I expect a decent C optimizer will generate very similar code to the original inline versions.

I've also renamed struct abuf to the typedef-ed term_buffer, so the erstwhile editorRefreshScreen looks like this in my implementation:

void editor_refresh_screen()
{
    term_buffer tb;
    tb_init(&tb);
    
    string_const home_cursor = get_home_cursor_str();

    tb_append_str(&tb, get_cursor_off_str());
    tb_append_str(&tb, home_cursor);

    editor_draw_rows(&tb);

    tb_append_str(&tb, home_cursor);
    tb_append_str(&tb, get_cursor_on_str());
    
    tb_write(&tb);
    tb_free(&tb);
}

The functions get_home_cursor_str(), get_cursor_off_str(), and so on are automatically generated by my Perl script linked above. Notice how much nicer this is than the version shown in step 40:

void editorRefreshScreen() {
  struct abuf ab = ABUF_INIT;

  abAppend(&ab, "\x1b[?25l", 6);
  abAppend(&ab, "\x1b[2J", 4);
  abAppend(&ab, "\x1b[H", 3);

  editorDrawRows(&ab);

  abAppend(&ab, "\x1b[H", 3);
  abAppend(&ab, "\x1b[?25h", 6);

  write(STDOUT_FILENO, ab.b, ab.len);
  abFree(&ab);
}

Thursday, April 13, 2017

kilo commentary: Wecome message (steps 41 and 42)

Steps 41 and 42 add a welcome message to our nascent editor implementation. It starts with another #define:

#define KILO_VERSION "0.0.1"

As I mentioned before, I'm not a fan of using the preprocessor when it can be avoided, and there's no reason not to declare this as a global character constant. But this is just awful:

void editorDrawRows(struct abuf *ab) {
  int y;
  for (y = 0; y < E.screenrows; y++) {
    if (y == E.screenrows / 3) {
      char welcome[80];
      int welcomelen = snprintf(welcome, sizeof(welcome),
        "Kilo editor -- version %s", KILO_VERSION);
      if (welcomelen > E.screencols) welcomelen = E.screencols;
      int padding = (E.screencols - welcomelen) / 2;
      if (padding) {
        abAppend(ab, "~", 1);
        padding--;
      }
      while (padding--) abAppend(ab, " ", 1);
      abAppend(ab, welcome, welcomelen);
    } else {
      abAppend(ab, "~", 1);
    }
    abAppend(ab, "\x1b[K", 3);
    if (y < E.screenrows - 1) {
      abAppend(ab, "\r\n", 2);
    }
  }
}

In a function called editorDrawRows fully half of the implementation is taken up with displaying the welcome message. We also see again the "reversed if" control flow organization:

if (condition)
  rare case
else
  usual case

I assume we'll see a lot of changes to this function as we go, but for now the implementation is greatly improved by reversing the ordering of the if statement, and moving the code to write the welcome message to its own function.

void editorDrawRows(struct abuf *ab) {
  int y;
  for (y = 0; y < E.screenrows; y++) {
    if (y != E.screenrows / 3) {
      abAppend(ab, "~", 1);
    } else {
      addWelcomeMessage(ab);
    }

    abAppend(ab, "\x1b[K", 3); /* erase to end of line */
    if (y < E.screenrows - 1) {
      abAppend(ab, "\r\n", 2);
    }
  }
}

Wednesday, April 12, 2017

Windows 7 just crashed on me

Total BSOD, sent to reboot land. Not the first time this has happened, either. I don't remember XP ever crashing on me like this.

This is not the 21st century I was promised.

Update: as noted above, this turns out to be a hardware problem.

Monday, April 10, 2017

kilo commentary: Append Buffer (steps 36, 37, 38)

Starting with step 36, an "append buffer" implementation is presented, starting with these declarations:

struct abuf {
  char *b;
  int len;
};
#define ABUF_INIT {NULL, 0}

First of all, abuf is a exceedingly short and non-specific name to be putting in the global namespace. I know this is meant to be a relatively short program, but it's an old proverb in software development that big programs start their lives as small programs. I'm also not a huge fan of using the preprocessor to abstract away the initialization of a struct abuf value (I really don't like using the preprocessor at all when it can be avoided.) A compiler with a decent optimizer should generate comparable code for a call to something like this:

static inline void abInit(struct abuf* ab) {
  ab->b = NULL;
  ab->len = 0;
}

Another question I have about this implementation is, why use a dynamic buffer at all? As is shown in later steps, this is used to buffer output to the screen, instead of writing lots of short strings one at a time. It remains to be seen (by me, anyway) if this is its only use this data structure has, but it seems to me a relatively small fixed-size buffer would serve as well, without putting load on the dynamic allocation system:

struct abuf {
  char b[1024];
  int len;
};

I'd also point out that I'd really like to see clearing of the struct abuf fields in abFree:

void abFree(struct abuf& ab)
{
  free(ab->b);
  ab->b = NULL;
  ab->len = 0;
}

Leaving these with their old values is another bug waiting to happen. Finally, I'd add these utility abstraction functions to clarify the calling code:

static inline void abAppendStr(struct abuf* ab, const char* string) {
  abAppend(ab, string, strlen(string));
}

static inline void abWrite(struct sbuf* ab) {
  write(STDOUT_FILENO, ab->b, ab->len);
}

Note the use of static inline, so this adds a very helpful abstraction layer with zero cost at runtime.

Sunday, April 9, 2017

kilo commentary: getCursorPosition (step 33)

In step 33, this version of the getCursorPosition function is presented:

int getCursorPosition(int *rows, int *cols) {
  char buf[32];
  unsigned int i = 0;
  if (write(STDOUT_FILENO, "\x1b[6n", 4) != 4) return -1;
  while (i < sizeof(buf) - 1) {
    if (read(STDIN_FILENO, &buf[i], 1) != 1) break;
    if (buf[i] == 'R') break;
    i++;
  }
  buf[i] = '\0';
  if (buf[0] != '\x1b' || buf[1] != '[') return -1;
  if (sscanf(&buf[2], "%d;%d", rows, cols) != 2) return -1;
  return 0;
}

This is more of a minor stylistic issue compared to the previous stuff I've commented on, but I prefer to write loops like this using for:

int getCursorPosition(int *rows, int *cols) {
  if (write(STDOUT_FILENO, "\x1b[6n", 4) != 4) return -1;

  unsigned int i = 0;
  char buf[32];
  for (i = 0; i < sizeof(buf) - 1; i++) {
    if (read(STDIN_FILENO, &buf[i], 1) != 1) break;
    if (buf[i] == 'R') break;
  }
  buf[i] = '\0';

  if (buf[0] != '\x1b' || buf[1] != '[') return -1;
  if (sscanf(&buf[2], "%d;%d", rows, cols) != 2) return -1;
  return 0;
}

In this version, the initialization of i = 0 is contained within the loop, and the loop structure (IMO, anyway) is clearer. Whenever a loop is iterating on an integer variable with a hard upper bound, I just feel like a for loop makes it clearer what's happening. Also, since this project explicitly uses C99, there's no reason declare variables until they're actually needed.

Saturday, April 8, 2017

kilo commentary: getWindowSize (step 30)

In step 30 of implementing the kilo editor, this version of the getWindowSize function is presented:

int getWindowSize(int *rows, int *cols) {
  struct winsize ws;

  /* Note that the "1 ||" in the condition is deliberate, to force execution of the error arm of the if. */
  if (1 || ioctl(STDOUT_FILENO, TIOCGWINSZ, &ws) == -1 || ws.ws_col == 0) {
    if (write(STDOUT_FILENO, "\x1b[999C\x1b[999B", 12) != 12) return -1;
    editorReadKey();
    return -1;
  } else {
    *cols = ws.ws_col;
    *rows = ws.ws_row;
    return 0;
  }
}

This is a really ugly function, particularly considering how short it is. (For what it's worth, this is also where I got in reading through the kilo implementation when I decided I wanted to start blogging my reactions to it.)

First of all, there's another magic string and constant here, "\x1b[999C\x1b[999B" and this is definitely a place where it really feels like a bug waiting to happen. Another problem is that the outer if statement is structured backwards:

if (try something) {
  handle error case
} else {
  handle success case
}

I think it's best to keep the success case code close to the test, so having the error case code in between becomes problematic, especially as and when the error code becomes more complex. A second, related issue is that the error case code is essentially (the start of) an alternate strategy for retrieving the terminal dimensions, nested within the code for the first strategy. Notice how much nicer this reads with just a bit of refactoring:

int getWindowSize(int *rows, int* cols) {
  struct winsize ws;
  /* Note the De Morgan-ized condition to move the success condition to the "then" part of the if. */
  if (0 && ioctl(STDOUT_FILENO, TIOCGWINSZ, &ws) != -1 && ws.ws_col != 0) {
    *cols = ws.ws_col;
    *rows = ws.ws_row;
    return 0;
  }

  if (write(STDOUT_FILENO, "\x1b[999C\x1b[999B", 12) == 12) {
    editorReadKey();
  }

  return -1;
}

Now the two strategies for determining the terminal window size are separated, and the success code is right next to the test in both cases. Also note how much clearer it is here that the second strategy is incomplete, largely because it is no longer hidden inside the logic of the first strategy. Fix up the magic strings and numbers, and this code is vastly improved over the original presentation. Also note we appear to be doing a partial re-implementation of ncurses...

kilo commentary: editorDrawRows (step 29)

In step 25 and step 29 of implementing the kilo editor, a preliminary implementation of the function editorDrawRows is introducted:

void editorDrawRows() {
  int y;
  for (y = 0; y < E.screenrows; y++) {
    write(STDOUT_FILENO, "~\r\n", 3);
  }
}

This is meant to display a column of tildes on the left side of the screen, one on each row. But there's a bug, do you see what it is?

After writing out the final tide on the bottom most line of the terminal, it also outputs \r\n, carriage return line feed sequence, scrolling the screen. Of course, this is a major no-no for apps that want to control the screen as an editor like this does. It's not a huge deal in this very preliminary version of the code, and I assume it will be fixed later as we go, but it's something that bugged me when I noticed it.

The other (IMO more serious) problem with this (and other code in the kilo implementation) is the use of magic strings ("~\r\n") and numbers (3). This isn't that big of a deal in this tiny case, but as we'll see in my next post, the string constants aren't going to stay 3 characters long. A partial fix would be to introduce a simple wrapper function around write:

static inline ssize_t terminalWrite(const char* string)
{
  size_t sl = strlen(string);
  if ((size_t)write(STDOUT_FILENO, string, sl) == sl)
    return sl;
  return -1;
}

This will add the overhead of calling strlen every time a string is written, but particularly in an interactive environment, this should be a very minor issue. The magic string problem could be fixed by creating a list of global constants:

const char* g_tilde_rn = "~\r\n";
const char* g_tilde = "~";

We can then rewrite the function as follows:

void editorDrawRows() {
  int y;
  for (y = 0; y < E.screenrows; y++) {
    terminalWrite(y+1 < E.screenrows ? g_tilde_rn : g_tilde);
  }
}

Update: the extra \r\n bug is addressed in Step 35. This is a classic example of the fencepost version of the off-by-one type of bug.

kilo commentary: editorProcessKeypress (step 21)

In part 3 of Build Your Own Text Editor, the following preliminary implementation is presented for a function called editorProcessKeypress:

void editorProcessKeypress() {
  char c = editorReadKey();
  switch (c) {
    case CTRL_KEY('q'):
      exit(0);
      break;
  }
}

It's a very short function, but it's already got a problem: it's doing two different things: 1) reading input from the user, and 2) processing that input. If you're not seeing the problem, imagine if you wanted to use the editor in a scripting environment to modify a file. As it's written above, reading the input from the terminal is a part of editorProcessKeypress. Fortunately, this is easy enough to fix, just move the call to editorReadKey up one level, and pass the result in as a parameter:

void editorProcessKeypress(char c) {
  switch (c) {
    case CTRL_KEY('q'):
      exit(0);
      break;
  }
}

/* ... */

   editorProcessKeypress(editorReadKey());

Note how this also gives us a couple of functions that can be composed together in a nicely satisfying way. Discussion of Curly's Law.

Build Your Own Text Editor!

I was reading Hacker News a while back, and came across this link on how to code a simple text editor. This seemed pretty interesting, so I clicked over, spun up an Ubuntu VM, and started following along.

Almost immediately I noticed the coding style was somewhat problematic (at least to my eye), and so one of the things I want to do in this blog is document my thoughts about how the could can be made better.

Perhaps the first question I had was why the ncurses library was not used for terminal I/O. The author does mention this briefly in the third section of the article:

If we wanted to support the maximum number of terminals out there, we could use the ncurses library, which uses the terminfo database to figure out the capabilities of a terminal and what escape sequences to use for that particular terminal.

He doesn't offer any justification for avoiding ncurses, and we are instead simply plunged headlong into the rather arcane world of ioctl and terminal escape sequences. antirez, the original author, does say that his goal was to write:

...a text editor in less than 1000 lines of code that does not depend on ncurses...

Avoiding dependencies to third party libraries is a reasonable enough goal, I suppose. But I think it's at least worth mentioning the fact that avoiding ncurses adds extra inertia to the project.

I should also add by way of disclaimer, that this (as well as future posts about kilo or other software) is not meant as criticism on the various authors involved. They had their own goals in mind when they wrote their code, and I'll let them speak for themselves as to how well they achieved them.

My purpose is to work out my own thoughts on how best to write code that is clear, correct, and easy to modify. If you get something out of it too, that's on the bonus side.