Speed of UTF-8 versus C#'s pseudo-UTF-16?
I used to love UTF-8. I still do to an extent because it's such an elegant hack, and is perfect for storage and the web due to its compact encoding (most of the time). However, something made me think twice the other day when I made a program to find/replace text using Regex as a middleman.
To my astonishment, I found that find/replace in my program was nearly 10x faster than Notepad++ when I wanted to change semicolons to commas in a CSV file (text file here - 4M uncompressed, 200K compressed). I then proceeded to test other text editors such as UltraEdit, Sublime, TextPad, and EditPlus 2. My simple program (which uses the Regex class in C#) beats them all. Here are the stats. I ran each test a few times on a Windows 7 system, but the figures are approximate and may vary from system to system. Still:
- UltraEdit - 8s
- EditPlus - 3-20s (fast at first, then progressively slower for each test afterwards)
- Notepad++ - 15s
- TextPad - 21s
- Notepad - 3s (wow fast for Notepad!)
- Sublime Text 2 - 30 minutes (!)
- My C# program - 2s
I was wondering what might have caused the discrepancy in speed. Then I read that Notepad++ uses UTF-8 internally, which means its characters don't have a fixed width. C#/.NET, on the other hand, uses UTF-16, which it treats as fixed-width 16-bit code units (characters outside the Basic Multilingual Plane still take two units), and presumably its Regex implementation works on those units too. Could this account for the massive variation in find/replace speed?
Or maybe Microsoft's Regex implementation is just super fast compared to the rest? In any case, it REALLY helps with find/replace on files over, say, 10 MB in size, let alone gigabyte-sized files.
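To make the encoding difference concrete, here is a small Python sketch (not the C# program from the tests above, and not a benchmark). It shows how many bytes a few characters take in UTF-8 versus how many 16-bit code units they take in UTF-16, and that swapping semicolons for commas can be done directly on the UTF-8 bytes, since ASCII byte values never occur inside a multi-byte UTF-8 sequence. The sample string and the helper names `utf8_width` and `utf16_units` are made up for illustration.

```python
# Illustrative sketch only: the sample text and helper names are assumptions,
# not taken from any of the editors or the C# program discussed above.

sample = "a;é;€;😀"


def utf8_width(ch: str) -> int:
    """Number of bytes the character occupies in UTF-8 (1 to 4)."""
    return len(ch.encode("utf-8"))


def utf16_units(ch: str) -> int:
    """Number of 16-bit code units the character occupies in UTF-16 (1 or 2)."""
    return len(ch.encode("utf-16-le")) // 2


# (character, UTF-8 bytes, UTF-16 code units) for a few representative characters.
widths = [(ch, utf8_width(ch), utf16_units(ch)) for ch in "aé€😀"]
for ch, n8, n16 in widths:
    print(f"{ch!r}: {n8} UTF-8 byte(s), {n16} UTF-16 code unit(s)")

# Replace ';' with ',' on the raw UTF-8 bytes, without decoding first.
# This is safe because every byte of a multi-byte UTF-8 sequence is >= 0x80,
# so it can never be mistaken for the ASCII byte for ';'.
utf8_bytes = sample.encode("utf-8")
replaced = utf8_bytes.replace(b";", b",").decode("utf-8")
print(replaced)
```

If this holds for a given editor, the variable width of UTF-8 would not by itself force a slow single-character replace, so the timing gap may have more to do with how each program buffers, redraws, or records undo history. That part is a guess and is not measured here.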
submitted by twinbee
Will UltraEdit ever consider making a version for Chrome OS?
I have used the Linux version for years, but have moved to Chromebooks since they can do anything except Photoshop/Illustrator. That makes me wish the greatest text editor EVER would consider a version for Chrome OS.
submitted by zonk3