The majority of online content is written in languages other than English, and is most commonly encoded in UTF-8, the world’s dominant Unicode character encoding. Traditional compression algorithms typically operate on individual bytes. While this approach works well for the single-byte ASCII encoding, it works poorly for UTF-8, where characters often span multiple bytes. This paper introduces a technique for modifying byte-by-byte compressors to operate directly on Unicode characters. We demonstrate the technique on LZW and PPM, and find that the modified variants substantially outperform the original compressors.
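To illustrate the difference between the two views of the same text, the following minimal sketch runs a textbook LZW coder over a sequence of UTF-8 bytes and over the corresponding sequence of Unicode code points. It is not the construction used in this paper: the function names `lzw_encode` and `lzw_decode` are hypothetical, and the initial dictionary is assumed to be the set of symbols occurring in the input, which a real codec would have to transmit or replace with an escape mechanism.

```python
def lzw_encode(symbols, alphabet):
    """Textbook LZW over an arbitrary symbol sequence; returns dictionary indices."""
    # Assumption: every input symbol already appears in `alphabet`.
    table = {(s,): i for i, s in enumerate(alphabet)}
    out, current = [], ()
    for s in symbols:
        candidate = current + (s,)
        if candidate in table:
            current = candidate            # keep extending the current phrase
        else:
            out.append(table[current])     # emit the longest known phrase
            table[candidate] = len(table)  # learn the new, longer phrase
            current = (s,)
    if current:
        out.append(table[current])
    return out


def lzw_decode(codes, alphabet):
    """Inverse of lzw_encode, given the same initial alphabet."""
    table = [(s,) for s in alphabet]
    if not codes:
        return []
    prev = table[codes[0]]
    out = list(prev)
    for code in codes[1:]:
        # A code equal to len(table) refers to the phrase being defined right now.
        entry = table[code] if code < len(table) else prev + (prev[0],)
        out.extend(entry)
        table.append(prev + (entry[0],))
        prev = entry
    return out


text = "Привет, мир! Привет, мир!"
byte_syms = list(text.encode("utf-8"))   # one symbol per byte
char_syms = [ord(c) for c in text]       # one symbol per code point
byte_alpha = sorted(set(byte_syms))
char_alpha = sorted(set(char_syms))

byte_codes = lzw_encode(byte_syms, byte_alpha)
char_codes = lzw_encode(char_syms, char_alpha)

print(len(byte_syms), "byte symbols ->", len(byte_codes), "LZW codes")
print(len(char_syms), "code-point symbols ->", len(char_codes), "LZW codes")
```

In the byte view, each Cyrillic letter occupies two symbols, so the dictionary must spend entries learning byte pairs that the code-point view treats as single symbols from the start. The number of emitted codes is only a rough proxy for compressed size, since the two dictionaries grow at different rates and their codes need different bit widths.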