Input Data

Overview

This page outlines the required structure for the model input data. The model input set is a CSV file that contains a specific set of named columns, with data for every network element. Preparing the model input file is a vital task in the modelling pipeline, so take all reasonable care to ensure that the input data is up to date and accurate.

Typically, the Model Input data can be prepared through a combination of Data Joins within your Asset Management System. JunoAMS, for example, has a sophisticated data join feature that allows you to create Join Parameters that tie certain columns in your database to an output set. For more information on JunoAMS’s data join feature, see this link.

Input Set Preparation

Regardless of which tools you use to create the basis for your input data, you may need to make project-specific adjustments to certain flags and parameters in the input set, based on the policies of the network you are working on.

For example, the input data contains several TRUE/FALSE flags, such as “file_can_rehab_flag”, which indicates whether an element can be considered for rehabilitation. You can set such flags to a default value of TRUE for all elements, but you should then update them using a script or spreadsheet formula to apply client policies. For example, a client may have a policy not to consider rehabilitation on roads with an Average Daily Traffic (ADT) volume of less than 120 vehicles per day. In such a case, you should apply a script or Excel formula to update the “file_can_rehab_flag” column values based on the values in “file_adt”.
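The ADT policy described above could be scripted along the following lines. This is a minimal sketch using only the Python standard library, not part of the Juno tooling: the function name `apply_rehab_policy`, the constant `MIN_ADT_FOR_REHAB`, and the in-memory sample data are all assumptions made for illustration. Only the column names and the 120 vehicles per day threshold come from the example above.

```python
import csv
import io

# Example policy threshold from the text: no rehabilitation below 120 vehicles/day.
MIN_ADT_FOR_REHAB = 120


def apply_rehab_policy(rows, min_adt=MIN_ADT_FOR_REHAB):
    """Return copies of the rows with file_can_rehab_flag derived from file_adt."""
    updated = []
    for row in rows:
        new_row = dict(row)
        allowed = float(row["file_adt"]) >= min_adt
        new_row["file_can_rehab_flag"] = "TRUE" if allowed else "FALSE"
        updated.append(new_row)
    return updated


# Tiny in-memory sample standing in for the real input file (hypothetical data).
sample_csv = """file_seg_name,file_adt,file_can_rehab_flag
Segment32,2463,TRUE
Segment33,85,TRUE
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
rows = apply_rehab_policy(rows)
```

In practice you would read the rows from your prepared CSV file and write them back with `csv.DictWriter`. Keeping the threshold as a parameter makes it easy to reuse the same script for clients with different policies.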

Another aspect of data preparation is the handling of missing data. The Juno Cassandra model will throw an error if any column contains missing data, or if a numeric column contains non-numeric data.

You should therefore include logic in your pre-processing steps to assign reasonable defaults to cells that have missing data. You can use imputation algorithms, such as those available in the R language, to make educated guesses about missing data, or you can keep it simple and assign averages (or modes in the case of text data) based on, for example, ONRC category.
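The simple option (averages for numeric columns, modes for text columns, grouped by ONRC category) might look like the sketch below. The helper `fill_missing` and the sample rows are hypothetical; the approach assumes rows are held as dictionaries of strings, as returned by Python's `csv.DictReader`, and that an empty string marks a missing cell.

```python
from collections import defaultdict
from statistics import mean, mode


def fill_missing(rows, column, group_col="file_onrc", numeric=True):
    """Fill empty cells in `column` with the group mean (numeric) or mode (text)."""
    # Collect the observed (non-empty) values per group.
    observed = defaultdict(list)
    for row in rows:
        value = row[column].strip()
        if value:
            observed[row[group_col]].append(float(value) if numeric else value)

    all_values = [v for values in observed.values() for v in values]
    summarise = mean if numeric else mode

    filled = []
    for row in rows:
        new_row = dict(row)
        if not row[column].strip():
            # Fall back to the network-wide value when the group has no data.
            # Raises statistics.StatisticsError if the column is entirely empty.
            values = observed.get(row[group_col]) or all_values
            new_row[column] = str(summarise(values))
        filled.append(new_row)
    return filled


# Hypothetical sample: the third segment is missing both values.
rows = [
    {"file_onrc": "secondary collector", "file_adt": "2000", "file_surf_material": "RACK"},
    {"file_onrc": "secondary collector", "file_adt": "3000", "file_surf_material": "RACK"},
    {"file_onrc": "secondary collector", "file_adt": "", "file_surf_material": ""},
]
rows = fill_missing(rows, "file_adt")
rows = fill_missing(rows, "file_surf_material", numeric=False)
```

Group averages are a blunt instrument, so it is worth reviewing how many cells were filled per column before you trust the resulting input set.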

Note

While it is possible to encapsulate some client policies (as well as rules for handling missing values) in the Domain Model itself, we have deliberately chosen to put client-specific, policy-related logic and the rules for assigning default values in the input preparation stage rather than in the model itself.

This not only simplifies the model logic significantly, but also gives the modeller the freedom to use their own tools and skills to prepare an input set that fully encapsulates client policies and preferences related to default values regardless of how complex these rules may be.

Input Data Required Columns

The table below lists all of the required columns in the model input set. The model input logic treats column names as case-sensitive, so ensure that your input data contains the exact column names given below.

| category | column_name | data_type | example | comment |
|---|---|---|---|---|
| identification | file_seg_name | text | Segment32 | Segment identifier |
| identification | file_section_id | number | 724 | Section ID |
| identification | file_section_name | text | BIRDNAME ROAD | Name of section |
| identification | file_loc_from | number | 430 | Start metre |
| identification | file_loc_to | number | 1445 | End metre |
| identification | file_lane_name | text | All | Lane code |
| quantity | file_length | number | 1015 | Length of segment in metres |
| quantity | file_area_m2 | number | 10332 | Area in square metres |
| trigger | file_can_treat_flag | text | TRUE | Can this segment be considered for treatment (change as needed based on client policy) |
| trigger | file_can_rehab_flag | text | FALSE | Can this segment be considered for Rehab (change as needed based on client policy) |
| trigger | file_ac_ok_flag | text | TRUE | Is the pavement suitable for asphalt resurfacing (based on current deflection/remaining pavement life/condition, subject to client policy/thresholds) |
| surfacing | file_surf_class | text | cs | Surface Class (must be one of: 'cs', 'ac', 'blocks', 'concrete', 'other') |
| trigger | file_next_surf | text | ac | Replacement surfacing type (one of: 'cs', 'ac', 'blocks', 'concrete', 'other'; change as needed based on client policy) |
| trigger | file_earliest_treat_period | number | 1 | Earliest modelling period in which the first treatment may be triggered (flag for fine control of treatment selection on certain elements) |
| road | file_urban_rural | text | U | Urban/Rural tag |
| road | file_onrc | text | secondary collector | ONRC category |
| traffic | file_adt | number | 2463 | Average Daily Traffic |
| traffic | file_heavy_perc | number | 5 | Heavy vehicle percentage |
| traffic | file_no_of_bus_routes | number | 1 | Number of bus routes. Can be used in the MCDA Model to lend greater weight to roads with more bus routes |
| traffic | file_traff_growth_perc | number | 2 | Traffic growth percent |
| surfacing | file_surf_date | text | 13/02/2003 | Surfacing date, dd/mm/yyyy (used to determine Surfacing Age using the 'base_date' value in the 'General' Lookup set) |
| surfacing | file_surf_function | text | 2 | Surface function |
| surfacing | file_surf_material | text | RACK | Surfacing material |
| surfacing | file_surf_life_expected | number | 10 | Expected surfacing life from RAMM. Important factor that determines surface remaining life and plays a role in S-Curve factors for distresses |
| surfacing | file_surf_layer_no | number | 1 | Surfacing layer number |
| surfacing | file_surf_thick | number | 9 | Surfacing thickness, in mm |
| pavement | file_pave_date | text | 8/02/1976 | Pavement construction date, dd/mm/yyyy (used to determine Pavement Age using the 'base_date' value in the 'General' Lookup set) |
| pavement | file_pave_remlife | number | 20 | Age-based pavement remaining life |
| maint_fault | file_su_fault_qty | number | 0 | Surfacing faults in square metres (open dispatches). Plays a role in the calculation of the Surfacing Distress Index |
| maint_fault | file_pa_fault_qty | number | 0 | Pavement faults in square metres (open dispatches). Plays a role in the calculation of the Pavement Distress Index |
| hsd | file_rough_survey_date | text | 27/03/2023 | Roughness survey date, dd/mm/yyyy (used to determine Survey Age using the 'base_date' value in the 'General' Lookup set, which in turn determines whether the survey is outdated) |
| hsd | file_naasra_85 | number | 110 | NAASRA 85th percentile |
| hsd | file_rut_survey_date | text | 27/03/2023 | Rut survey date, dd/mm/yyyy (used to determine Survey Age using the 'base_date' value in the 'General' Lookup set, which in turn determines whether the survey is outdated) |
| hsd | file_rut_lwpmean_85 | number | 4 | LWP mean rut, 85th percentile |
| hsd | file_rut_rwpmean_85 | number | 4 | RWP mean rut, 85th percentile |
| distress | file_cond_survey_date | text | 18/03/2023 | Condition survey date, dd/mm/yyyy (used to determine Survey Age using the 'base_date' value in the 'General' Lookup set, which in turn determines whether the survey is outdated) |
| distress | file_pct_allig | number | 0 | Alligator/mesh cracks, percent of segment length |
| distress | file_pct_lt_crax | number | 1.06 | L&T cracks, percent of segment length |
| distress | file_pct_poth | number | 4.3E-3 | Potholes, percent of segment length |
| distress | file_pct_scabb | number | 0.15 | Scabbing, percent of segment length |
| distress | file_pct_flush | number | 0 | Flushing, percent of segment length |
| distress | file_pct_shove | number | 0 | Shoving, percent of segment length |
| distress | file_pct_edgebreak | number | 0 | Edge breaks, percent of segment length |
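Because column names are case-sensitive, it can help to check the header of your CSV file against the required names before running the model. The sketch below is one way to do this in Python. The function `check_columns` is hypothetical and not part of the Juno tooling; the column names themselves are copied from the table above.

```python
# Required column names, copied from the table above.
REQUIRED_COLUMNS = """
file_seg_name file_section_id file_section_name file_loc_from file_loc_to file_lane_name
file_length file_area_m2 file_can_treat_flag file_can_rehab_flag file_ac_ok_flag
file_surf_class file_next_surf file_earliest_treat_period file_urban_rural file_onrc
file_adt file_heavy_perc file_no_of_bus_routes file_traff_growth_perc
file_surf_date file_surf_function file_surf_material file_surf_life_expected
file_surf_layer_no file_surf_thick file_pave_date file_pave_remlife
file_su_fault_qty file_pa_fault_qty file_rough_survey_date file_naasra_85
file_rut_survey_date file_rut_lwpmean_85 file_rut_rwpmean_85 file_cond_survey_date
file_pct_allig file_pct_lt_crax file_pct_poth file_pct_scabb file_pct_flush
file_pct_shove file_pct_edgebreak
""".split()


def check_columns(header):
    """Return (missing, case_mismatches) for a list of CSV header names."""
    present = set(header)
    by_lowercase = {name.lower(): name for name in header}
    missing, mismatched = [], []
    for name in REQUIRED_COLUMNS:
        if name in present:
            continue
        if name.lower() in by_lowercase:
            # Present, but with different letter case: (found, expected).
            mismatched.append((by_lowercase[name.lower()], name))
        else:
            missing.append(name)
    return missing, mismatched


# Hypothetical header with one column dropped and one wrongly capitalised.
header = [name for name in REQUIRED_COLUMNS if name != "file_adt"]
header[0] = "File_Seg_Name"
missing, mismatched = check_columns(header)
```

With `csv.DictReader`, the header list is available as `reader.fieldnames`, so the same check can be run directly on your prepared file.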
Important

The Cassandra Framework Model will throw an error if any cell in your input set is empty or does not contain the correct data type. Specifically, numeric columns must not contain any empty values or text values that cannot be converted to numbers.
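A pre-flight check for these two conditions can be sketched as follows. This is an illustration only: `find_bad_cells` is a hypothetical helper, and Python's `float()` is used as a stand-in for whatever conversion the model applies internally, which may be stricter or more lenient. Treating values such as "nan" and "inf" as invalid is likewise an assumption.

```python
import math


def find_bad_cells(rows, numeric_columns):
    """Return (row_number, column, value) for empty cells and bad numeric cells."""
    problems = []
    # Row 1 of the file is the header, so data rows start at 2.
    for row_number, row in enumerate(rows, start=2):
        for column, value in row.items():
            text = (value or "").strip()
            if not text:
                problems.append((row_number, column, value))
            elif column in numeric_columns:
                try:
                    number = float(text)
                except ValueError:
                    number = None
                # Assumption: NaN and infinity are also treated as invalid.
                if number is None or not math.isfinite(number):
                    problems.append((row_number, column, value))
    return problems


# Hypothetical sample rows, as dictionaries of strings.
rows = [
    {"file_seg_name": "Segment32", "file_adt": "2463", "file_pct_poth": "4.3E-3"},
    {"file_seg_name": "Segment33", "file_adt": "n/a", "file_pct_poth": ""},
]
problems = find_bad_cells(rows, numeric_columns={"file_adt", "file_pct_poth"})
```

The set of numeric columns should be built from the data_type column of the required columns table. An empty `problems` list means the file passes this particular check, not that the model will necessarily accept it.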