TVS Part 2: Why? Tiptoe into Unicode

Jeff Marrison

In this video I continue exploring the question "Why would anyone write Assembly Language?" by analysing UTF-8 in the multistream Japanese Wikipedia XML dump. (If that link stops working, search for jawiki on the backup index page.)

Uncompressed, this file is around 12 GB of UTF-8 text. In this video, I compare Assembly Language, C (clang), C (gcc), Node.js, Ruby and Python 3.
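To give a feel for what analysing UTF-8 at the byte level can look like, here is a minimal Python sketch. It is not the code from the video: the function name `count_utf8_lengths`, the chunk size, and the choice to tally code points by encoded length are all assumptions made for illustration. It also assumes the input is valid UTF-8 and does no error checking.

```python
import io


def count_utf8_lengths(stream, chunk_size=1 << 20):
    """Tally code points by encoded length (1 to 4 bytes).

    Hypothetical helper for illustration; assumes valid UTF-8 input.
    Only lead bytes are counted, so chunk boundaries that split a
    multi-byte sequence do not affect the result.
    """
    counts = {1: 0, 2: 0, 3: 0, 4: 0}
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for b in chunk:
            if b < 0x80:
                counts[1] += 1      # 0xxxxxxx: single-byte (ASCII)
            elif b < 0xC0:
                continue            # 10xxxxxx: continuation byte, skip
            elif b < 0xE0:
                counts[2] += 1      # 110xxxxx: lead byte of a 2-byte sequence
            elif b < 0xF0:
                counts[3] += 1      # 1110xxxx: lead byte of a 3-byte sequence
            else:
                counts[4] += 1      # 11110xxx: lead byte of a 4-byte sequence
    return counts


# Small in-memory sample: one character of each encoded length.
sample = io.BytesIO("aé日😀".encode("utf-8"))
print(count_utf8_lengths(sample))
```

For a real run, the same function could be given a file opened in binary mode, for example `open(path, "rb")`, where `path` points at the uncompressed dump. A byte-by-byte loop like this is slow in an interpreted language, which is the sort of gap a cross-language comparison over a 12 GB file tends to expose.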

Download working directory

An archive of everything I produced during this video: part2.tar.gz. Note: I included a download script so you can also grab the Japanese Wikipedia XML (not included in my part2.tar.gz).

Learn more about the basics of assembly language

Tomasz Grysztar, author of flat assembler, has created an excellent series on Assembly Language for x86 on his YouTube channel.