1) Keep a rectangle updated per-screen rather than regenerating it each time
2) Strip palette info when putting pixels into rectangles rather than
   during scaling
3) Tighten up the screen locks a bit
4) Don't require a full resend of both screens on an update request
5) Only force a redraw for cursor movement when the cursor is visible
   (and force it whenever the cursor changes)
6) Avoid doubles in interpolation
7) Heavily optimize interpolate_height()
   interpolate_width() likely doesn't need it because it's generally not
   used, and it also reads from the next pixel in memory, which makes the
   prefetcher's job easier.
8) Fix some memory-leak-on-error issues
9) For ARGB8 XImages, manipulate the data directly rather than through
   XPutPixel()

At this point, the scaling and X11 output time is heavily dominated by
cache misses. The only really effective way to reduce this hit is to
spread the work across all the L3 caches in the system or move it into
the GPU.

With the latest updates, at the SyncTERM menu, over 90% of the time is
spent in the rendering pipeline, and over 90% of that time is spent
thrashing the caches. The only real easy win left is vectorizing, but
that's highly compiler-specific. To that end, I've switched to -O3 for
release builds.

There was a comment that -finline-functions broke Baja "badly", but
that's clearly false since -finline-functions has been part of -O2 for
quite a while now, and Baja doesn't seem any more broken than it ever was.
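
For illustration only, here is a minimal sketch of what "avoid doubles in
interpolation" can look like with integer fixed-point math. This is not the
code from this change: blend_channel(), blend_argb8(), and the 0..256 weight
range are hypothetical names and conventions chosen for the example.

```c
#include <stdint.h>

/*
 * Hypothetical helper: blend one 8-bit channel using integer math only.
 * weight is in the range 0..256; 0 yields a, 256 yields b.
 * The largest intermediate value is 255 * 256, which fits easily in 32 bits.
 */
static uint32_t
blend_channel(uint32_t a, uint32_t b, uint32_t weight)
{
	return (a * (256 - weight) + b * weight) >> 8;
}

/*
 * Hypothetical helper: blend two ARGB8 pixels one channel at a time.
 * No floating point is involved, so there are no int/double conversions
 * in the inner loop.
 */
static uint32_t
blend_argb8(uint32_t p1, uint32_t p2, uint32_t weight)
{
	uint32_t out = 0;
	int shift;

	for (shift = 0; shift < 32; shift += 8) {
		uint32_t c1 = (p1 >> shift) & 0xff;
		uint32_t c2 = (p2 >> shift) & 0xff;

		out |= blend_channel(c1, c2, weight) << shift;
	}
	return out;
}
```

With this convention, a weight of 128 is an even mix of the two pixels. The
shift by 8 truncates rather than rounds, so results can be one step lower
than a floating-point blend would give.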