
You might know that I'm running a Twitter bot called @randomMDN. Every few hours, the bot fetches the sitemap of MDN and tweets a random page.
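To make the "tweet a random page" step concrete, here's a rough sketch of how a random URL could be pulled out of a sitemap string. This isn't the bot's actual code: `extractUrls`, `pickRandom`, and the sample sitemap are made up for illustration, and the sketch assumes the sitemap lists each page in a `<loc>` element.

```javascript
// Hypothetical helper: collects every <loc> entry from a sitemap XML string
function extractUrls(sitemapXml) {
  return [...sitemapXml.matchAll(/<loc>([^<]+)<\/loc>/g)].map((match) =>
    match[1].trim()
  );
}

// Hypothetical helper: returns one random entry (undefined for an empty list)
function pickRandom(urls) {
  return urls[Math.floor(Math.random() * urls.length)];
}

// Made-up sitemap fragment, not real MDN data
const sampleSitemap = `
<urlset>
  <url><loc>https://example.com/docs/a</loc></url>
  <url><loc>https://example.com/docs/b</loc></url>
</urlset>`;

const urls = extractUrls(sampleSitemap);
console.log(pickRandom(urls));
```

A regular expression is good enough for a flat list of `<loc>` entries; for anything more involved, a proper XML parser is probably the safer choice.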

It was running without a problem for two years, but recently it broke. The reason was that MDN changed the sitemap from https://developer.mozilla.org/sitemaps/en-US/sitemap.xml to https://developer.mozilla.org/sitemaps/en-US/sitemap.xml.gz. It's now a gzipped file.

It took me a while to figure out how to handle this new file format. For future reference, here's a snippet that shows the unzipping in Node.js.

The snippet uses got to fetch the gzipped sitemap as a buffer and node-gzip to decompress it and transform it to a string.

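// note: require('got') works with got 11 or older; newer versions are ESM-only
const got = require('got');
const { ungzip } = require('node-gzip');

const SITEMAP_URL =
  'https://developer.mozilla.org/sitemaps/en-US/sitemap.xml.gz';

// `await` isn't available at the top level of a CommonJS module,
// so the work happens inside an async function
async function getSitemap() {
  // fetch the file as a raw buffer instead of a decoded string
  const { body } = await got(SITEMAP_URL, {
    responseType: 'buffer',
  });

  // unzip the buffered gzipped sitemap and convert it to a string
  return (await ungzip(body)).toString();
}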
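If you'd rather avoid the extra dependency, Node's built-in `zlib` module can do the unzipping step as well. Here's a minimal sketch of that alternative, not what the bot uses: `gunzipToString` is a hypothetical helper name, and the sample data is a made-up fragment that gets gzipped locally so the example runs without a network request.

```javascript
const zlib = require('zlib');

// Hypothetical helper: turns a gzipped buffer into a UTF-8 string
function gunzipToString(buffer) {
  return zlib.gunzipSync(buffer).toString('utf8');
}

// Made-up sitemap fragment, gzipped locally to stand in for the downloaded file
const sampleXml =
  '<urlset><url><loc>https://example.com/docs/a</loc></url></urlset>';
const gzipped = zlib.gzipSync(sampleXml);

// Logs the original XML string again
console.log(gunzipToString(gzipped));
```

`gunzipSync` blocks while it decompresses, which should be fine for a script that runs every few hours. In a server, the callback-based `zlib.gunzip` is likely the better fit.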

Maybe that helps someone in the future. 🙈 Have fun!

About Stefan Judis

Frontend nerd with over ten years of experience, freelance dev, "Today I Learned" blogger, conference speaker, and Open Source maintainer.
