Beijing Air Pollution Statistics

Welcome to Luftstat!
The purpose of this website is to present analyzed historical Beijing air pollution data. "Luftstat" is a made-up word for the purpose of identifying this website. "Luft" means 'air' in German, Swedish, Danish, and Norwegian. "Stat" refers to statistics.
If you have questions or suggestions, or believe that you have found a mistake, don't hesitate to contact me at the email address further down on the page. It's much appreciated.

Instructions for the main page

When visiting from a computer, you can tick the desired checkboxes to control how many diagrams are available to choose from in the list further down on the main page. (On a mobile phone, only a horizontal white field is shown instead of the list. Tap the white field first, and the list appears. For reasons that are still unclear, you must tap the white field once more after making your selections to see the selected diagrams.)

The nature of the data, its access, and mathematics

The air pollution data itself comes from measurements conducted by the American embassy in Beijing. The data is published continuously on the Internet by the United States Department of State, every hour around the clock. (It is also available as compilations that can be downloaded.) This data can of course be statistically analyzed and presented in various ways. The developer of Luftstat has done so for PM2.5 and AQI. "PM2.5" is the mass concentration of particles with a diameter of up to 2.5 micrometers, and "AQI" is an Air Quality Index, which is a weighted compilation of the pollutants sulfur dioxide, nitrogen dioxide, carbon monoxide, ozone, and particles.
The most advanced part of the analysis was to develop, for each suitable segmentation of the data over time, the mathematical function for the distribution of the PM2.5 values; that is, an equation that describes how often a certain level of PM2.5 occurs in relation to other levels of PM2.5. This was accomplished by developing an optimization program in the programming language Java. Through millions of comparisons, the program finds the curve that minimizes the sum of the squared deviations between the data and the curve; in other words, it performs a least-squares regression analysis for a non-linear curve.
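To make the idea of a least-squares search concrete, here is a minimal sketch in Java. It is not the program used for Luftstat, whose details are not published here. It assumes, purely for illustration, that the curve is a single log-normal density and that the search is a plain grid search over the two parameters; the class name, method names, and parameter ranges are all hypothetical.

```java
/** Hypothetical sketch: brute-force least-squares fit of a log-normal curve. */
final class LogNormalGridFit {

    /** Log-normal probability density for x > 0 with parameters mu and sigma. */
    static double logNormalPdf(double x, double mu, double sigma) {
        double z = (Math.log(x) - mu) / sigma;
        return Math.exp(-0.5 * z * z) / (x * sigma * Math.sqrt(2.0 * Math.PI));
    }

    /** Sum of squared deviations between the observed densities and the curve. */
    static double sumOfSquares(double[] x, double[] observed, double mu, double sigma) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double deviation = observed[i] - logNormalPdf(x[i], mu, sigma);
            sum += deviation * deviation;
        }
        return sum;
    }

    /**
     * Compares every (mu, sigma) pair on a regular grid and returns the pair
     * with the smallest sum of squares as {mu, sigma}. The ranges and the
     * number of steps are assumptions chosen by the caller.
     */
    static double[] fit(double[] x, double[] observed,
                        double muMin, double muMax,
                        double sigmaMin, double sigmaMax, int steps) {
        double bestMu = muMin;
        double bestSigma = sigmaMin;
        double bestError = Double.POSITIVE_INFINITY;
        for (int i = 0; i <= steps; i++) {
            double mu = muMin + (muMax - muMin) * i / steps;
            for (int j = 0; j <= steps; j++) {
                double sigma = sigmaMin + (sigmaMax - sigmaMin) * j / steps;
                double error = sumOfSquares(x, observed, mu, sigma);
                if (error < bestError) {
                    bestError = error;
                    bestMu = mu;
                    bestSigma = sigma;
                }
            }
        }
        return new double[] {bestMu, bestSigma};
    }
}
```

With steps set to 100, the search makes 101 x 101 = 10,201 comparisons. A finer grid, or more than two parameters, quickly brings the count into the millions.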
The resulting functions aren't published on this website. However, several of their curves are, as well as the graphical result of splitting such a function into two log-normal distributions.
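As a general illustration of what a split into two log-normal distributions means, a curve of this kind can be written as a weighted sum of two log-normal densities. The symbols below (the weight w and the parameters mu and sigma of each component) are assumptions made for this illustration; the actual functions and parameter values are not published here.

```latex
f(x) = w \,\frac{1}{x\,\sigma_1\sqrt{2\pi}}
         \exp\!\left(-\frac{(\ln x - \mu_1)^2}{2\sigma_1^2}\right)
     + (1 - w)\,\frac{1}{x\,\sigma_2\sqrt{2\pi}}
         \exp\!\left(-\frac{(\ln x - \mu_2)^2}{2\sigma_2^2}\right),
\qquad x > 0,\quad 0 \le w \le 1
```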

Valid values

Among the data in the spreadsheets from the United States Department of State, the column "Raw Conc." has been selected for PM2.5 (and thereby not "NowCast Conc." or "Value"). For every hour, the department has labeled the data as either "valid", "invalid", or "missing". Where the data is labeled "invalid", Luftstat has followed that assessment and excluded it from the analyses. Furthermore, 0.1 percent of the PM2.5 values marked as valid are negative. They are all just below zero. The lowest value is -5 µg/m³, which appears only once among the 78041 valid data points from 2013 through 2021.
One might think that negative values are impossible; they most likely reflect the measurement deviations of sensitive instruments. Negative values could have been excluded before the statistical analyses. However, an instrument should show somewhat too high values about as often as it shows somewhat too low values. Removing only the readings that are too low would skew the results upward, so readings that are somewhat too low, including negative ones, should be kept. This has been done (again in line with the assessment by the United States Department of State, since it only applies to values marked as valid).
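The argument for keeping negative values can be illustrated with a small numerical sketch in Java. The readings in the usage note are made up for this illustration and are not taken from the embassy data; the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: effect of discarding negative readings on the mean. */
final class NegativeReadingBias {

    /** Mean of all readings, negative ones included. */
    static double mean(double[] readings) {
        return Arrays.stream(readings).average().orElse(Double.NaN);
    }

    /** Mean after discarding every reading below zero. */
    static double meanWithoutNegatives(double[] readings) {
        return Arrays.stream(readings)
                .filter(r -> r >= 0.0)
                .average()
                .orElse(Double.NaN);
    }
}
```

Suppose the true level is 1 µg/m³ and the instrument errs symmetrically by -3, -1, +1, and +3, which gives the readings -2, 0, 2, and 4. The mean of all four readings is 1, the true level. Discarding the negative reading gives a mean of 2, which is too high.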
Data from years before 2013 could have been included. However, from 2013 through 2021 the share of valid data is never less than 97.8 percent, whereas the share of valid data for 2012 is substantially lower at 94.4 percent. This is why only data from 2013 onward has been analyzed and presented.
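The selection of valid values and the share of valid data per year can be sketched in Java as follows. This is a minimal sketch, not the code used for Luftstat. The Reading class and its field names are hypothetical, and it assumes that each hourly row has already been read from a spreadsheet into such an object, with the label stored as the text "valid", "invalid", or "missing".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: keep valid rows and compute the valid share per year. */
final class ValidShare {

    /** One hourly row; the field names are assumptions for this sketch. */
    static final class Reading {
        final int year;
        final double rawConc; // value from the "Raw Conc." column, in µg/m³
        final String label;   // "valid", "invalid", or "missing"

        Reading(int year, double rawConc, String label) {
            this.year = year;
            this.rawConc = rawConc;
            this.label = label;
        }
    }

    /** Keeps every row labeled valid, including rows with negative values. */
    static List<Reading> validOnly(List<Reading> rows) {
        List<Reading> kept = new ArrayList<>();
        for (Reading r : rows) {
            if ("valid".equalsIgnoreCase(r.label)) {
                kept.add(r);
            }
        }
        return kept;
    }

    /** Returns, for each year, the percentage of rows labeled valid. */
    static Map<Integer, Double> validSharePercent(List<Reading> rows) {
        Map<Integer, int[]> counts = new TreeMap<>(); // year -> {valid, total}
        for (Reading r : rows) {
            int[] c = counts.computeIfAbsent(r.year, y -> new int[2]);
            if ("valid".equalsIgnoreCase(r.label)) {
                c[0]++;
            }
            c[1]++;
        }
        Map<Integer, Double> share = new TreeMap<>();
        for (Map.Entry<Integer, int[]> e : counts.entrySet()) {
            share.put(e.getKey(), 100.0 * e.getValue()[0] / e.getValue()[1]);
        }
        return share;
    }
}
```

Comparing the resulting percentages year by year is one way to arrive at a cutoff such as the one described above.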