Design of predictive soft sensors
Manufacturing environments often need a way to estimate the value of a hard-to-measure property from data sources that are more easily accessible. Direct measurement of the desired property may be difficult or impossible for a number of reasons, for example:
- A chemical concentration measurement may require grab samples, with an undesirable delay while the lab analysis is carried out.
- A measuring instrument may be inherently unreliable and subject to operational problems such as drift or failure.
Under these circumstances, an inferential model is attractive because it can provide an instantaneous and more reliable result. In addition, it serves as an independent check on the direct measurement system.
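To make the idea of an independent check concrete, here is a minimal Python sketch that compares direct measurements with model predictions and flags the point where the two drift apart. The function name `drift_alarm`, the window length, the limit and the sample values are all illustrative assumptions, not details from this case.

```python
from statistics import mean

def drift_alarm(measured, predicted, window=5, limit=0.12):
    """Return the index of the first sample at which the rolling mean of
    (measured - predicted) exceeds `limit` in magnitude, or None."""
    residuals = [m - p for m, p in zip(measured, predicted)]
    for end in range(window, len(residuals) + 1):
        if abs(mean(residuals[end - window:end])) > limit:
            return end - 1
    return None

# Invented data: the model output is steady, while the instrument
# starts drifting upward by 0.05 per sample from sample 10 onward.
predicted = [1.0] * 20
measured = [1.0] * 10 + [1.0 + 0.05 * k for k in range(1, 11)]

print(drift_alarm(measured, predicted))   # prints 14
print(drift_alarm(predicted, predicted))  # prints None (no disagreement)
```

A rolling mean is used here so that a single noisy reading does not trigger the alarm. In practice the window and limit would have to be tuned to the instrument and the model error.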
Our client, a company in the chemical industry, was struggling with an important but hard-to-measure quality metric.
The solution: building a soft sensor
We examined a set of measurements from a distillation column with the goal of estimating the value of the metric. In an ideal world, we would run a designed experiment covering the parameter space to collect a concise and meaningful data set for our modeling process.
In the real world, we are often faced with an operating, revenue-generating plant and are forced to build the model from whatever data is available.
A distillation column is one of the means of separating the mixture coming from the reactor into various purified components. The hot gaseous input stream is fed into the bottom and, on its way to the top, passes through a series of trays at successively cooler temperatures; the top of the column is the coolest. Along the way, different components condense at different temperatures and are isolated (with some statistical distribution of the actual components in each fraction). With vapors rising and liquids falling through the column, purified fractions (different chemical compounds) can be retrieved from the various trays. The distillation column is very important to the chemical industry because it is relatively efficient and allows continuous operation, as opposed to a batch process.
The data for this case contained a mixture of flows, pressures and temperatures, in addition to the quality metric and a (calculated) material balance: 24 properties in total, with 7000 measurements.
We iteratively modeled the target quality variable, narrowing the search for driving properties to smaller and smaller sets. In only a few rounds of modeling we robustly identified a handful of properties that affect the quality of the distillation column output. The robust ensembles containing only these driver variables showed satisfactory predictive capability on the test data, with the prediction error falling within the required range of 0.05-0.07 ppm (4 to 6 percent of the standard deviation over all test data samples).
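The narrowing loop and the error measure can be sketched in a few lines of Python. This is not the modeling technique used in the project, which is not described here. As a stand-in, the sketch uses ordinary least squares with backward elimination: fit, rank the inputs by scaled coefficient size, drop the weakest half, and repeat. The data is synthetic, and every name in it (`fit_linear`, `narrow_drivers`, `relative_rmse`, `x0` to `x7`) is hypothetical.

```python
import random
import statistics

def fit_linear(X, y):
    """Least-squares fit via the normal equations; returns [intercept, w1, ...]."""
    rows = [[1.0] + list(r) for r in X]
    n = len(rows[0])
    # Augmented matrix [A^T A | A^T y], solved by Gauss-Jordan elimination.
    M = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(M[k][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for k in range(n):
            if k != c:
                f = M[k][c] / M[c][c]
                M[k] = [a - f * b for a, b in zip(M[k], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def predict(coef, row):
    return coef[0] + sum(w * x for w, x in zip(coef[1:], row))

def narrow_drivers(X, y, names, keep=3):
    """Refit and drop the weakest half of the inputs until `keep` remain."""
    active = list(range(len(names)))
    while len(active) > keep:
        coef = fit_linear([[row[j] for j in active] for row in X], y)[1:]
        # Scale each coefficient by its input's spread so sizes are comparable.
        strength = {j: abs(c) * statistics.pstdev([row[j] for row in X])
                    for j, c in zip(active, coef)}
        n_next = max(keep, len(active) // 2)
        active = sorted(sorted(active, key=strength.get, reverse=True)[:n_next])
    return [names[j] for j in active]

def relative_rmse(actual, predicted):
    """RMSE expressed as a fraction of the standard deviation of `actual`."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return mse ** 0.5 / statistics.pstdev(actual)

# Synthetic stand-in data: 8 candidate inputs, of which only 3 drive the target.
random.seed(0)
names = [f"x{j}" for j in range(8)]
X = [[random.gauss(0, 1) for _ in names] for _ in range(300)]
y = [2.0 * r[0] - 3.0 * r[3] + 0.5 * r[5] + random.gauss(0, 0.1) for r in X]
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]

drivers = narrow_drivers(X_train, y_train, names, keep=3)
idx = [names.index(d) for d in drivers]
coef = fit_linear([[r[j] for j in idx] for r in X_train], y_train)
y_hat = [predict(coef, [r[j] for j in idx]) for r in X_test]
print(drivers, round(relative_rmse(y_test, y_hat), 3))
```

On this synthetic data the loop should recover `x0`, `x3` and `x5` in two rounds. The ratio returned by `relative_rmse` is the same kind of figure as the "percent of standard deviation" quoted above, although the actual evaluation procedure used in the project may have differed.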
The impact we created
These ensembles are explicit mathematical formulae that can be deployed directly for on-line process monitoring in a plant without requiring any additional infrastructure. The driver variables in the ensembles give process engineers additional focus in identifying what affects quality and how, and they offer further opportunities to optimize the process by improving yield and reducing quality deviations.
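Because each ensemble member is an explicit formula, deployment can be as small as a few arithmetic expressions. The Python sketch below shows the general shape under stated assumptions: the three formulas, their coefficients and the input names (`tray_temp`, `reflux_flow`, `top_pressure`) are invented for illustration and are not the client's models. Combining members by their median is one common choice, not necessarily the one used in this project.

```python
from statistics import median

# Hypothetical member formulas; coefficients and input names are invented.
ENSEMBLE = [
    lambda s: 0.8 + 0.02 * s["tray_temp"] - 0.10 * s["reflux_flow"],
    lambda s: (0.5 + 0.025 * s["tray_temp"] - 0.12 * s["reflux_flow"]
               + 0.3 / s["top_pressure"]),
    lambda s: 1.1 + 0.018 * s["tray_temp"] - 0.09 * s["reflux_flow"],
]

def soft_sensor(sample):
    """Evaluate every formula on one set of process readings.

    Returns (prediction, spread): the median of the member outputs and the
    gap between the largest and smallest member output.
    """
    values = [formula(sample) for formula in ENSEMBLE]
    return median(values), max(values) - min(values)

# One invented set of readings.
reading = {"tray_temp": 100.0, "reflux_flow": 5.0, "top_pressure": 1.5}
prediction, spread = soft_sensor(reading)
print(round(prediction, 3), round(spread, 3))
```

The spread between members is returned alongside the prediction because a large disagreement can indicate that the plant is operating outside the region the models were built on, which is useful to know in on-line monitoring.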