
You might know that I'm running a Twitter bot called @randomMDN. Every few hours, the bot fetches the sitemap of MDN and tweets a random page.

It was running without a problem for two years, but recently it broke. The reason was that MDN changed the sitemap from https://developer.mozilla.org/sitemaps/en-US/sitemap.xml to https://developer.mozilla.org/sitemaps/en-US/sitemap.xml.gz. It's now a gzipped file.

It took me a while to figure out how to handle this new file format. For future reference, here's a snippet that shows how to unzip the sitemap in Node.js.

The snippet uses got to fetch the gzipped sitemap as a buffer and node-gzip to decompress it and transform it to a string.

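// require('got') works with got v11 and earlier; newer versions are ESM-only
const got = require('got');
const { ungzip } = require('node-gzip');

const SITEMAP_URL =
  'https://developer.mozilla.org/sitemaps/en-US/sitemap.xml.gz';

// await needs an async function in a CommonJS module
async function fetchSitemap() {
  // fetch the file as a raw buffer instead of a decoded string
  const { body } = await got(SITEMAP_URL, {
    responseType: 'buffer',
  });

  // unzip the buffered gzipped sitemap and convert it to a string
  return (await ungzip(body)).toString();
}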
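
If you'd rather skip the extra dependency, Node's built-in zlib module should be able to handle the decompression step, too. Here's a minimal sketch of that idea. Note that `gunzipToString` is a hypothetical helper name that's not part of the bot, and the sample XML is made up so that the example runs without a network request.

```javascript
const zlib = require('zlib');

// Hypothetical helper (not from the original bot): decompress a gzipped
// buffer with Node's built-in zlib module and decode it as UTF-8 text.
function gunzipToString(buffer) {
  return zlib.gunzipSync(buffer).toString('utf8');
}

// Made-up sample data: compress a tiny sitemap locally so that the
// round trip can be checked without hitting the network.
const sample =
  '<urlset><url><loc>https://example.com/</loc></url></urlset>';
const compressed = zlib.gzipSync(Buffer.from(sample, 'utf8'));

// Logs whether the decompressed text matches the original sample.
console.log(gunzipToString(compressed) === sample);
```

In the real bot, the buffer passed to the helper would be the response body fetched with `responseType: 'buffer'`. The synchronous call blocks the event loop while it decompresses, which is probably fine for a small sitemap in a scheduled script.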

Maybe that helps someone in the future. 🙈 Have fun!
