benchmarking XML processing

Sat, Jan 17, 2026 · 2-minute read

I do a lot of XML processing at work, and the performance of the various native XML libraries kept bothering me - some were way too slow for what should be a simple task.

So I started playing around with different languages, libraries, and parsing models (DOM vs. SAX) to see what actually makes a difference.
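To make the DOM vs. SAX difference concrete, here's a minimal Python sketch (not code from the repo) that counts elements both ways using only the standard library. `xml.dom.minidom` builds the whole document as an in-memory tree before you can look at anything. `xml.sax` calls a handler for each element as it streams past and never keeps a tree. The sample document and function names are made up for illustration.

```python
import xml.sax
from collections import Counter
from xml.dom import minidom

# Hypothetical sample document, just for illustration.
SAMPLE = b"<root><item id='1'/><item id='2'><name>x</name></item></root>"


def count_dom(data: bytes) -> Counter:
    """DOM: parse the whole document into an in-memory tree, then walk it."""
    doc = minidom.parseString(data)
    counts = Counter(el.tagName for el in doc.getElementsByTagName("*"))
    doc.unlink()  # break minidom's parent/child reference cycles
    return counts


class CountingHandler(xml.sax.ContentHandler):
    """SAX: the parser pushes events at us; no tree is ever built."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()

    def startElement(self, name, attrs) -> None:
        self.counts[name] += 1


def count_sax(data: bytes) -> Counter:
    handler = CountingHandler()
    xml.sax.parseString(data, handler)
    return handler.counts


if __name__ == "__main__":
    # Both models produce the same counts; they differ in what they keep in memory.
    print(count_dom(SAMPLE))
    print(count_sax(SAMPLE))
```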

Meet xml-i (pronounced “XML eye”) - a CLI tool that takes an XML file and counts how often each node appears, optionally filtered by node name.
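The core of a tool like this fits in a few lines of Python. This is a hypothetical reimplementation, not xml-i's actual code or command-line interface: `count_nodes`, `count_nodes_in_bytes`, `main`, and the `--name` flag are names assumed here for illustration. It streams with `xml.etree.ElementTree.iterparse` and clears finished elements, so memory use doesn't grow with the size of a flat file.

```python
import argparse
import io
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Optional


def count_nodes(source, name: Optional[str] = None) -> Counter:
    """Stream `source` (a path or binary file object) and count elements by tag.

    If `name` is given, only elements with exactly that tag are counted.
    Note: namespaced tags show up as "{uri}local" in ElementTree.
    """
    counts: Counter = Counter()
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)  # the first event is the root element's "start"
    for event, elem in events:
        if event != "end":
            continue
        if name is None or elem.tag == name:
            counts[elem.tag] += 1
        root.clear()  # drop finished subtrees so the tree never accumulates
    return counts


def count_nodes_in_bytes(data: bytes, name: Optional[str] = None) -> Counter:
    """Convenience wrapper for in-memory input."""
    return count_nodes(io.BytesIO(data), name)


def main(argv=None) -> int:
    """Hypothetical CLI: prints one tab-separated "count<TAB>tag" line per tag."""
    parser = argparse.ArgumentParser(description="Count XML element occurrences.")
    parser.add_argument("file", help="XML file to scan")
    parser.add_argument("--name", help="only count elements with this tag")
    args = parser.parse_args(argv)
    for tag, n in sorted(count_nodes(args.file, args.name).items()):
        print(f"{n}\t{tag}")
    return 0
```

For example, `main(["data.xml", "--name", "item"])` would print only the count for `<item>` elements. Sorting by tag keeps the output stable, which makes it easy to diff results between implementations.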

The baseline is written in Rust using quick-xml, and it’s consistently the fastest of the bunch. The alien/ directory is where it gets interesting, though: C++, Java, Scala, Julia, .NET, PowerShell, Python, …

I also threw in “(noxml)” tests - text-only parsers that ignore the XML structure, never check that the input is well-formed, and just count raw text. They’re significantly faster, but without that check they’re useless in real-world applications. They prove that cutting corners makes you fast, but that doesn’t help when you actually need to process XML correctly 😉
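Here's roughly what cutting that corner looks like. This is a hypothetical Python sketch, not the repo's noxml implementation: a regex counts anything that looks like an opening tag. It happily reports counts for a document with an unclosed element and for a tag that only exists inside a comment, while a real parser rejects the first document outright and treats the commented-out tag as comment text, not as an element.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical "noxml"-style counter: scan raw bytes for anything that looks like
# an opening tag. Closing tags (</x>), <?...?> and <!...> are skipped because the
# character after "<" must start a name. Nothing checks that the input is well-formed.
OPEN_TAG = re.compile(rb"<([A-Za-z_][\w.:-]*)")


def count_noxml(data: bytes) -> Counter:
    return Counter(m.group(1).decode("ascii") for m in OPEN_TAG.finditer(data))


def is_well_formed(data: bytes) -> bool:
    """A real parser rejects malformed input instead of guessing."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


BROKEN = b"<root><item><name>x</item></root>"  # <name> is never closed
COMMENTED = b"<root><!-- <item/> --></root>"  # <item/> only appears inside a comment

if __name__ == "__main__":
    for doc in (BROKEN, COMMENTED):
        print(count_noxml(doc), is_well_formed(doc))
```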

The benchmark results are the best part. On a 3.2 GB file:

  • Rust (quick-xml) finishes in 2.8 seconds using ~2 MB of memory
  • C++ (pugixml) does it in 4.5 seconds but chews through 7.6 GB of RAM
  • Python takes 52 seconds and peaks at 12.6 GB
  • PowerShell Core sits at 9.3 seconds with 130 MB - not bad for a scripting language
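The memory gap above is what you'd expect from building a full tree (pugixml is a DOM-style parser) versus streaming (quick-xml is a pull parser). Here's a small Python harness that reproduces that kind of gap on synthetic input. It's a sketch under assumptions: `tracemalloc` only sees allocations made through Python's own allocators, so it is not the same as process-level peak memory, and it isn't necessarily how the figures above were collected. The function names are hypothetical, and timing runs separately because tracing itself slows execution down.

```python
import time
import tracemalloc
import xml.sax
from xml.dom import minidom


def dom_count(data: bytes) -> int:
    """Build the full DOM, then count every element in it."""
    doc = minidom.parseString(data)
    n = len(doc.getElementsByTagName("*"))
    doc.unlink()
    return n


class _StartCounter(xml.sax.ContentHandler):
    def __init__(self) -> None:
        super().__init__()
        self.n = 0

    def startElement(self, name, attrs) -> None:
        self.n += 1


def sax_count(data: bytes) -> int:
    """Stream the document; the handler keeps only a counter."""
    handler = _StartCounter()
    xml.sax.parseString(data, handler)
    return handler.n


def peak_traced_bytes(fn, data: bytes) -> int:
    """Peak memory allocated through Python's allocators while fn(data) runs."""
    tracemalloc.start()
    try:
        fn(data)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def wall_seconds(fn, data: bytes) -> float:
    """Wall-clock time of fn(data), measured without tracemalloc overhead."""
    start = time.perf_counter()
    fn(data)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Synthetic input: one root with 50,000 empty children (~350 KB).
    data = b"<root>" + b"<item/>" * 50_000 + b"</root>"
    for fn in (dom_count, sax_count):
        secs = wall_seconds(fn, data)
        peak_mb = peak_traced_bytes(fn, data) / 1e6
        print(f"{fn.__name__}: {secs:.2f} s, peak {peak_mb:.2f} MB traced")
```

You should see the DOM run's traced peak grow with the input size, while the SAX run stays small regardless of how many elements the file has.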

Takeaway: our computers are incredibly fast; if it’s slow, you’re most likely doing it wrong.

Check it out at github.com/mwallner/xml-i - PRs with new languages are always welcome 😉

~ till next time