Blog In Tucson: 2018

Sunday, August 12, 2018

kilo milestone: finished chapter 5 (text editor functionality) and chapter 6 (search)

One more chapter to go...

Saturday, July 28, 2018

kilo commentary: changing the interface is bad, especially when you don't need to

Step 122 begins a series of modifications that handles the Enter key inserting a new line. The first step is to rename editorAppendRow to editorInsertRow and add a new parameter.

Now this, in and of itself, is fine. Inserting a line at some arbitrary point in the file is a generalization of appending one at the end, so it makes sense to use the guts of editorAppendRow to create editorInsertRow.

Where things go off the rails (IMO) is in step 123, where we are supposed to go through the code, and modify all the old calls to editorAppendRow to the corresponding editorInsertRow. Now there are only two places to change, so this isn't a huge modification to the codebase. However, I see two problems with it:

It harms the readability of the code, editorAppendRow has a clearer meaning when we're specifically adding a line to the end of a file
It's actually unnecessary

We can create a new implementation of editorAppendRow quite simply:

static inline editorAppendRow(char* line, ssize_t linelen)
{ editorInsertRow(E.numrows, line, linelen); }

Much, much better

Monday, July 23, 2018

kilo commentary: In which an array of text lines turns out to be a bad design choice

Way back in step 55, the step was taken to store the file the editor is working on as an array of lines. Steps 119 through 125 are dealing with joining two lines (deleting at the start of a line) or creating a new line with the Enter key. Because of the array-of-lines data structure chosen many steps ago, these are both relatively complicated operations.

In an editor that treats a file simply as a sequence of characters, these operations are as easy (or hard) as inserting or deleting a character anywhere else in the file.

I am reminded of one of Fred Brooks' aphorisms, "plan to throw one away; you will, anyhow." kilo is not something you really fix so much as work through to identify all the mistakes to avoid the second time through.

Wednesday, July 18, 2018

kilo: The horrible, no good, very bad quit confirmation

Step 115 implements a quit confirmation. That is, the user is prevented from quitting kilo while there are unsaved changes, Well, at first:

    case CTRL_KEY('q'):
      if (E.dirty && quit_times > 0) {
        editorSetStatusMessage("WARNING!!! File has unsaved changes. "
          "Press Ctrl-Q %d more times to quit.", quit_times);
        quit_times--;
        return;
      }

Yeah, that's pretty awful UX right there. Ironically, we're just a few steps away from implementing an editorPrompt function (used to prompt for a filename to save to) which could also be used for saying "You have unsaved changes, really quit? (N/y)", which is how virtually every "real" editor does it.

Saturday, July 7, 2018

kilo commentary: Is this malloc Really Necessary?

Step 105 begins the process of saving the file to disk by mallocing a buffer the size of the entire file and then copying the file's text into it. Step 106 implements the code to write the contents of this buffer to a file.

There's really no good reason to do this. I suppose there's some speed gained by making a single call to write, rather than one for each line in the file. On the other hand, it seems unlikely to offset the cost of yet another dynamic allocation, and copying the entire contents of the file. A buffered writing system might be worth implementing, but this is not the way to do it.

kilo commentary: realloc is not magic

I'm now starting the "make this an actual text editor" part of implementing kilo. The first step, naturally, is to write code that inserts a new character. Here's the implementation of editorRowInsertChar given in step 101:

void editorRowInsertChar(erow *row, int at, int c) {
  if (at < 0 || at > row->size) at = row->size;
  row->chars = realloc(row->chars, row->size + 2);
  memmove(&row->chars[at + 1], &row->chars[at], row->size - at + 1);
  row->size++;
  row->chars[at] = c;
  editorUpdateRow(row);
}

So yeah, hence the title of this post. The call of memmove every time a character is inserted is pretty bad, too. But... well, this is actually probably good enough for a toy editor that no one is actually going to use for real editing work. Still, I wish the author had at least mentioned that this is very much a quick-and-dirty way to do it, and that there's better ways to do it.

Sunday, July 1, 2018

kilo milestone: finished chapter 4

Which is the longest chapter of the booklet, and also marks passing the halfway point (at least in number of steps).

One of the cliché interview questions is, "what weakness do you have?" (or some such). Well, one weakness for me is that I have uncounted projects that I have started, worked on for a bit, and then abandoned long before they were finished, or even really more than started. This kilo project was headed down that same path, when it languished untouched for a year.

I've determined to be more diligent about finishing my projects, or at least bringing them to a reasonable "unhooking point". And so I plan to continue on with the kilo project, at least through the final three chapters, if for no other reason than as an exercise in sticking with a project to the finish.

Monday, June 18, 2018

A bug in the kilo code

Step 85 begins dealing with moving the cursor around tab characters, and step 88 gives the implementation of editorRowCxToRx, which calculates where the cursor should be based on how many tabs it's passed over in the line:

int editorRowCxToRx(erow *row, int cx) {
  int rx = 0;
  int j;
  for (j = 0; j < cx; j++) {
    if (row->chars[j] == '\t')
      rx += (KILO_TAB_STOP - 1) - (rx % KILO_TAB_STOP);
    rx++;
  }
  return rx;
}

Here, the parameter cx is the index into the array of characters of the line text, and the return value row offset of where the cursor should be placed on the screen

The problem comes when moving between two lines. Let's say you've got two lines, something like this:

\t
123456789

Now suppose that the cursor is on the first line, to the right of the tab character. Normally, the cursor will be displayed directly above the '9' character on the second line. Now suppose you press the down arrow to move to the next line, where should the cursor be? Virtually every editor I've ever used will place the cursor on the '9' of the second line.

However, the cx value will still be 1, and so for kilo, the cursor will be placed on the 2.

Maybe this will be fixed later in the project. I'm not going to try to deal with it now (honestly, this whole method of dealing with tabs feels like a botch anyway). Perhaps once I finish working through my first pass over the code.

Tuesday, June 12, 2018

Neal Ford "The Productive Programmer" (2008)

Neal Ford "The Productive Programmer" talk (2008)

An interesting talk that still seems pretty relevant (although do people still use Subversion anymore, or is Git now the undisputed king of version control?). Anyway, something I feel like would be worth watching again at some point, so I'm enshrining a link here. Note that it's divided up into 7 different videos, with transitions at fairly random points which can be a bit distracting.

Monday, June 4, 2018

kilo commentary:

Steps 80-84 of the kilo construction process have to do with rendering tab characters as spaces on the screen. Part of this is converting tab characters to the equivalent number of spaces, which is done by this bit of code:

      row->render[idx++] = ' ';
      while (idx % KILO_TAB_STOP != 0) row->render[idx++] = ' ';

Which... this is equivalent to this, right?

      do
          row->render[idx++] = ' ';
      while (idx % KILO_TAB_STOP != 0);

I mean, I realize that the do-while construct is relatively uncommon to use in C (at least in my experience), but this is a perfect example of where to use it.

Monday, May 28, 2018

kilo: revisiting the PAGE_UP and PAGE_DOWN keys

Back when I was implementing The PAGE_UP and PAGE_DOWN keys, I offered my own refactored implementation.

I've now just completed the steps to implement vertical scrolling in the text viewer phase of the project, I discover that my implementation no longer has the same effect once there is a file to be scrolled through. Fortunately, there is an easy fix:

    case PAGE_UP: 
        editor_move_cursor(0, -get_screen_height());
        break;
 
    case PAGE_DOWN:
        editor_move_cursor(0, get_screen_height());
        break;

As a side note, I found getting the scrolling code correct was surprisingly fiddly, due at least partially to my slightly different structure and very different naming conventions. I feel like there's some major structural refactoring that needs to be done, but I feel I'm already close to a place where many more changes will get me to the place where I can't map between the original version and my refactored version anymore. So I'm going to hold off on making any big changes at least until I work through the rest of the code.

Sunday, May 20, 2018

kilo: refactoring access to g_editor_state

I've been continuing working through the steps to implement kilo, but I've been increasingly unhappy with the accesses to the global g_editor_state variable all over the code; it's ugly and makes too much of the code dependent on the precise declaration of that struct.

So I decided to refactor so that all accesses to the editor state are through aptly named functions. I'm a lot happier with the result. As an example, here's the current implementation of editor_draw_rows:

void editor_draw_rows(term_buffer* tb)
{
    for (int y = 0; y < get_screen_height(); y++)
    {
        if (y < get_file_lines())
        {
            int len = get_line_size();
            if (len > get_screen_width())
              len = get_screen_width();
            tb_append(tb, get_line_chars(), len);
        }
        else if (y != get_screen_height()/3)
            tb_append_str(tb, get_tilde_str());
        else if (get_file_lines() == 0)
            append_welcome_message(tb);
        
        tb_append_str(tb, get_clear_row_str());
        if (y + 1 < get_screen_height())
            tb_append_str(tb, get_rn_str());
    }
}

Monday, May 14, 2018

kilo commentary: Page Up and Page Down buttons

Oh, dear. Way back when I started this commentary, I said I didn't want to be critical, but how the Page Up and Page Down buttons are implemented is pretty ugly:

    case PAGE_UP:
    case PAGE_DOWN:
      {
        int times = E.screenrows;
        while (times--)
          editorMoveCursor(c == PAGE_UP ? ARROW_UP : ARROW_DOWN);
      }
      break;

Particularly when the Home and End keys are implemented more like I would expect:

    case HOME_KEY:
      E.cx = 0;
      break;
    case END_KEY:
      E.cx = E.screencols - 1;
      break;

My code:

    case PAGE_UP: 
        editor_set_cursor_row(0);
        break;
 
    case PAGE_DOWN:
        editor_set_cursor_row(g_editor_state.screenrows-1);
        break;

The code for editorReadKey (the final version, at least for this chapter, is here) has also become pretty ugly. Even in this relatively short implementation, all these nested ifs and elses make it very hard to see how the control logic works.

That said, I don't want to take time right now to work out a cleaner solution than the one I have right now (I'm more interested in the other parts of this code) so I'm going to leave this the way I have it. Probably the best answer is to break down and use a third-party library that has already solved this problem.

Sunday, May 13, 2018

kilo commentary: Moving the cursor (steps 43-45)

In spite of a hiatus that's now more than a year long, I haven't forgotten about my kilo project. Steps 43-45 involve code to keep track of where the cursor is on the screen, and allow the user to move it around. Here's the original code that does that:

void editorMoveCursor(char key) {
  switch (key) {
    case 'a':
      E.cx--;
      break;
    case 'd':
      E.cx++;
      break;
    case 'w':
      E.cy--;
      break;
    case 's':
      E.cy++;
      break;
  }
}
void editorProcessKeypress() {
  char c = editorReadKey();
  switch (c) {
    case CTRL_KEY('q'):
      write(STDOUT_FILENO, "\x1b[2J", 4);
      write(STDOUT_FILENO, "\x1b[H", 3);
      exit(0);
      break;
    case 'w':
    case 's':
    case 'a':
    case 'd':
      editorMoveCursor(c);
      break;
  }
}

The code is basically decoding the w, s, a, and d characters twice: once to determine that the editorMoveCursor function should be called, and another to determine how the cursor should be moved. This is (IMO) another bug waiting to happen, in a large project these two points in the code could easily get out of sync. There's no reason not to do something like this (which is also my current solution):

void editor_move_cursor(int dcx, int dcy)
{
    g_editor_state.cx += dcx;
    g_editor_state.cy += dcy;
}


void editor_process_keypress(char c)
{
    string_const clear_screen = get_clear_screen_str();
    string_const home_cursor  = get_home_cursor_str();

    switch (c)
    {
    case CTRL_KEY('q'):
        write_str(clear_screen);
        write_str(home_cursor);
        exit(0);
        break;

    case 'w': editor_move_cursor( 0, -1); break;
    case 's': editor_move_cursor( 0,  1); break;
    case 'a': editor_move_cursor(-1,  0); break;
    case 'd': editor_move_cursor( 1,  0); break;
    }
}

Note that now only editor_process_keypress needs to know what keys move the cursor, and editor_move_cursor gets parameters that are directly interpretable.

Thursday, February 22, 2018

The interface to JavaScript's Date object is kinda broken

Here's an amusing thing to type into your favorite JavaScript console:

new Date('2018-01-01').getFullYear()

The response I get is 2017, which is surprising, at best.

What's happening?

The first part of the problem is that the Date object is actually a date and time object, and so new Date('2018-01-01') creates an object that has the date and time of January 1st, 2018 00:00 GMT. Which in and of itself is a reasonable enough choice.

The part that leads to the surprising result is that .getFullYear() returns a localized result, and since I am in a GMT+7 time zone, January 1st, 2018 00:00 GMT is actually December 31st, 2017 17:00 local time, and so .getFullYear() tells me it's still 2017.

Of course, the Date interface is broken in other ways too (.getYear() is actually non-Y2K compliant, in spite of the fact that it was presumably first designed and implemented less than a decade prior to the year 2000).

Can anything be done?

It turns out that dealing with time and time zones is actually surprisingly complex, and given JavaScript's "colorful" history, it's not surprising that the Date interface has its share of quirks. Really, I just wish there as an object for dealing with dates alone, without an accompanying time. Date would be a fine name for that object, if it weren't already taken.

I'm in a situation where I want to be able to deal with dates in JavaScript, and it seems like my choices are either to use the Date object with its rather non-intuitive interface, or roll my own date-only object. Neither choice seems optimal, but I think rolling my own date-only object would add more inertia to my projects than I'm willing to take on right now.