Checked Number

Purpose

To extract a value from the value dictionary and clamp it to a specified range. If the value is missing or not numeric, the function either throws an error or returns a specified default value, depending on the ‘throw_error’ flag.

Type Name

‘checked_number’

Definition Syntax

‘[value_key] : [min_value] : [max_value] : [default] : [throw_error]’

where:

  • ‘value_key’ denotes the key mapping to the value in the value dictionary.
  • ‘min_value’ is the minimum allowed value. If the value mapped to ‘value_key’ in the value dictionary is less than this minimum, the minimum is returned instead.
  • ‘max_value’ is the maximum allowed value. If the value mapped to ‘value_key’ in the value dictionary is greater than this maximum, the maximum is returned instead.
  • ‘default’ is a numeric value to return when the value mapped to ‘value_key’ in the value dictionary is missing or not a number (e.g. ‘NA’ or ‘N/A’ or ‘No Data’). This value is only returned if ‘throw_error’ is set to ‘false’ (see below).
  • ‘throw_error’ is a ‘true’ or ‘false’ flag denoting whether to throw an error when the value mapped to ‘value_key’ in the value dictionary is missing or not a number (e.g. ‘NA’ or ‘N/A’ or ‘No Data’).

Example

‘par_rut : 0 : 30 : 4.5 : false’

This will take the value mapped to key ‘par_rut’ in the value dictionary and, if the value is numeric, check whether it lies in the range 0 to 30. If the value is below 0, the minimum (0) is returned; if it is above 30, the maximum (30) is returned. If the value is within 0 to 30, it is returned ‘as is’.

The ‘false’ flag at the end of the setup string indicates that an error should not be thrown if the value mapped to key ‘par_rut’ is missing or non-numeric. In that case, the default value of 4.5 will be returned.
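
The behaviour described above can be sketched in Python. This is only an illustration of the documented rules, not the actual implementation: the function name ‘checked_number’ as a Python function, the use of a plain dict for the value dictionary, the treatment of NaN as ‘not a number’, and the choice of ‘ValueError’ as the error type are all assumptions made for the example.

```python
import math


def checked_number(setup: str, values: dict) -> float:
    """Illustrative sketch of the documented 'checked_number' rules."""
    # Split 'key : min : max : default : throw_error' into its five fields.
    parts = [part.strip() for part in setup.split(":")]
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(parts)}")
    key, min_text, max_text, default_text, flag_text = parts
    min_value = float(min_text)
    max_value = float(max_text)
    default = float(default_text)
    throw_error = flag_text.lower() == "true"

    # A missing key yields None, which float() rejects with TypeError;
    # text such as 'NA' or 'No Data' is rejected with ValueError.
    try:
        number = float(values.get(key))
    except (TypeError, ValueError):
        number = math.nan

    # Assumption: NaN is treated the same as a missing or invalid value.
    if math.isnan(number):
        if throw_error:
            raise ValueError(f"value for '{key}' is missing or not a number")
        return default  # assumption: the default is returned without clamping

    # Clamp the numeric value to the range [min_value, max_value].
    return max(min_value, min(max_value, number))


setup = "par_rut : 0 : 30 : 4.5 : false"
print(checked_number(setup, {"par_rut": 12.3}))   # 12.3 (within range)
print(checked_number(setup, {"par_rut": 45}))     # 30.0 (clamped to maximum)
print(checked_number(setup, {"par_rut": "N/A"}))  # 4.5 (default)
```

With the flag changed to ‘true’, the last call in this sketch would raise an error instead of returning 4.5.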

Important

It is generally recommended that you set ‘throw_error’ to ‘true’ unless you expect your raw data to contain many missing or invalid values. This ensures that you are explicitly made aware when values are missing or invalid (e.g. ‘n/a’ or ‘no data’) instead of having them silently replaced with defaults.

This decision, however, depends on how you design the pre-processing pipeline that prepares your model inputs. It could be argued that it is better to do pre-processing in such a way that the model never has to handle missing or inappropriate values. You can, for example, use a machine learning technique to impute missing values in a pre-processing script. That approach can also give you a detailed report of exactly how many values are missing, and which columns have missing values.
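
As a rough illustration of the reporting side of such a pre-processing script, the following Python sketch counts missing or non-numeric values per column. It is a hypothetical helper, not part of this software: the names ‘is_missing’ and ‘missing_report’, the row-as-dict input format, and the column ‘par_iri’ are assumptions made for the example. Imputation itself is not shown.

```python
import math
from collections import Counter


def is_missing(value) -> bool:
    """Return True if the value is absent, non-numeric, or NaN."""
    try:
        return math.isnan(float(value))
    except (TypeError, ValueError):
        # None (absent) raises TypeError; text like 'N/A' raises ValueError.
        return True


def missing_report(rows, columns):
    """Count missing or non-numeric values for each named column."""
    counts = Counter()
    for row in rows:
        for column in columns:
            if is_missing(row.get(column)):
                counts[column] += 1
    # Only columns with at least one missing value appear in the report.
    return {column: counts[column] for column in columns if counts[column]}


rows = [
    {"par_rut": "3.2", "par_iri": "N/A"},
    {"par_rut": "No Data", "par_iri": "2.1"},
    {"par_rut": "5.0"},  # 'par_iri' is absent in this row
]
print(missing_report(rows, ["par_rut", "par_iri"]))
# {'par_rut': 1, 'par_iri': 2}
```

A report like this, produced before the model runs, shows which columns would otherwise trigger errors or default substitutions.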

However, in some cases you may need this JFunction to take care of exceptional cases where you cannot streamline pre-processing to impute missing values. In such cases, this JFunction can be useful to ensure values are checked and trimmed during the model Initialisation stage.