Input Data

Overview

This page outlines the required structure for the model input data. The model input set is a CSV file with a fixed set of named columns that hold the data for every network element. Preparing the model input file is a vital task in the modelling pipeline. You should take all reasonable care to ensure that the input data is up to date and accurate.

Typically, the Model Input data can be prepared through a combination of Data Joins within your Asset Management System. JunoAMS, for example, has a sophisticated data join feature that allows you to create Join Parameters that tie certain columns in your database to an output set. For more information on JunoAMS’s data join feature see this link.

Input Set Preparation

Regardless of which tools you use to create the basis for your input data, you may need to make project-specific adjustments to certain flags and parameters in the input set, based on the policies of the network you are working on.

For example, the input data contains several TRUE/FALSE flags, such as “file_can_rehab_flag”, which indicates whether an element can be considered for rehabilitation. You can set such flags to a default value of TRUE for all elements, but you should then update them using a script or spreadsheet formula to apply client policies. For example, a client may have a policy not to consider rehabilitation on roads with an Average Daily Traffic (ADT) volume of less than 120 vehicles per day. In such a case, you should apply a script or Excel formula to update the “file_can_rehab_flag” column values based on the values in “file_adt”.
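The ADT policy described above could be applied with a short script. The following is a minimal sketch in Python, assuming the input rows have been read with csv.DictReader. The function name apply_rehab_policy and the constant MIN_ADT_FOR_REHAB are hypothetical and are not part of the Juno tooling. The threshold of 120 comes from the example policy.

```python
import csv
import io

# Hypothetical constant: the ADT threshold from the example policy above.
MIN_ADT_FOR_REHAB = 120


def apply_rehab_policy(rows, min_adt=MIN_ADT_FOR_REHAB):
    """Return copies of rows with file_can_rehab_flag set to FALSE
    wherever file_adt is below min_adt. Other rows keep their flag."""
    updated = []
    for row in rows:
        new_row = dict(row)
        if float(new_row["file_adt"]) < min_adt:
            new_row["file_can_rehab_flag"] = "FALSE"
        updated.append(new_row)
    return updated


# Small in-memory stand-in for a real input file (illustrative values only).
sample_csv = io.StringIO(
    "file_seg_name,file_adt,file_can_rehab_flag\n"
    "Segment1,2463,TRUE\n"
    "Segment2,85,TRUE\n"
)
rows = apply_rehab_policy(list(csv.DictReader(sample_csv)))
for row in rows:
    print(row["file_seg_name"], row["file_can_rehab_flag"])
```

The sketch only ever switches a flag to FALSE, so flags already set to FALSE by another policy are left alone.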

Another aspect of preparing your input data set is deciding how to handle missing data. The Juno Cassandra model will throw an error if any columns contain missing data, or if numeric columns contain non-numeric data.

You should therefore include logic in your pre-processing steps to assign reasonable defaults to cells that have missing data. You can use imputation algorithms, such as those available in the R language, to make educated guesses about missing data, or you can keep it simple and assign averages (or modes in the case of text data) grouped by a suitable attribute such as ONRC category.
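The simple approach, group averages for numeric columns and group modes for text columns, might look like the sketch below. It is an illustration under stated assumptions, not the method used by the model: the function name fill_missing_by_group is hypothetical, rows are assumed to be dictionaries of strings (as produced by csv.DictReader), and an empty or whitespace-only cell is treated as missing.

```python
from collections import Counter, defaultdict
from statistics import mean


def fill_missing_by_group(rows, column, group_column="file_onrc", numeric=True):
    """Fill empty cells in `column` with the mean (numeric) or mode (text)
    of the non-empty values that share the same `group_column` value."""
    observed = defaultdict(list)
    for row in rows:
        value = row[column].strip()
        if value:
            observed[row[group_column]].append(float(value) if numeric else value)

    filled = []
    for row in rows:
        new_row = dict(row)
        if not new_row[column].strip():
            values = observed.get(new_row[group_column])
            if not values:
                # No data to impute from: fail loudly rather than guess.
                raise ValueError(
                    f"no observed {column} values for group {new_row[group_column]!r}"
                )
            if numeric:
                new_row[column] = str(mean(values))
            else:
                new_row[column] = Counter(values).most_common(1)[0][0]
        filled.append(new_row)
    return filled


# Illustrative rows only; the third row has two missing cells.
sample = [
    {"file_onrc": "secondary collector", "file_adt": "100", "file_urban_rural": "U"},
    {"file_onrc": "secondary collector", "file_adt": "300", "file_urban_rural": "U"},
    {"file_onrc": "secondary collector", "file_adt": "", "file_urban_rural": ""},
]
filled = fill_missing_by_group(sample, "file_adt", numeric=True)
filled = fill_missing_by_group(filled, "file_urban_rural", numeric=False)
print(filled[2])
```

The function raises an error when a group has no observed values at all, because in that case a default has to be chosen by the modeller.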

Note

While it is possible to encapsulate some client policies (as well as rules for handling missing values) in the Domain Model itself, we have deliberately chosen to put client-specific policy logic, and rules for assigning default values, in the input preparation stage rather than in the model itself.

This not only simplifies the model logic significantly, but also gives the modeller the freedom to use their own tools and skills to prepare an input set that fully encapsulates client policies and preferences related to default values regardless of how complex these rules may be.

Input Data Required Columns

The table below lists all of the required columns in the model input set. Note that the model input logic considers column names to be case-sensitive. Thus you should ensure that your input data contains the exact column names as given below.

category	column_name	data_type	example	comment
identification	file_seg_name	text	Segment32	Segment identifier
identification	file_section_id	number	724	Section ID
identification	file_section_name	text	BIRDNAME ROAD	Name of Section
identification	file_loc_from	number	430	Start metre
identification	file_loc_to	number	1445	End metre
identification	file_lane_name	text	All	Lane code
quantity	file_length	number	1015	Length of segment in metres
quantity	file_area_m2	number	10332	Square metre area
trigger	file_can_treat_flag	text	TRUE	Can this segment be considered for treatment (change as needed based on client policy)
trigger	file_can_rehab_flag	text	FALSE	Can this segment be considered for Rehab (change as needed based on client policy)
trigger	file_ac_ok_flag	text	TRUE	Is the pavement suitable for asphalt resurfacing (based on current deflection/remaining pavement life/condition - subject to client policy/thresholds)
surfacing	file_surf_class	text	cs	Surface Class (must be one of: 'cs', 'ac', 'blocks', 'concrete', 'other')
trigger	file_next_surf	text	ac	Replacement surfacing type (one of: 'cs', 'ac', 'blocks', 'concrete', 'other'; change as needed based on client policy)
trigger	file_earliest_treat_period	number	1	Specify the earliest possible modelling period in which the first treatment may be triggered (flag for fine control of treatment selection on certain elements)
road	file_urban_rural	text	U	Urban/Rural Tag
road	file_onrc	text	secondary collector	ONRC Category
traffic	file_adt	number	2463	Average Daily Traffic
traffic	file_heavy_perc	number	5	Heavy Vehicle Percentage
traffic	file_no_of_bus_routes	number	1	Number of Bus Routes - Can be used in MCDA Model to lend greater weight to roads with more bus routes
traffic	file_traff_growth_perc	number	2	Traffic Growth Percent
surfacing	file_surf_date	text	13/02/2003	Surfacing Date dd/mm/yyyy (will be used to determine Surfacing Age using 'base_date' value in 'General' Lookup set)
surfacing	file_surf_function	text	2	Surface Function
surfacing	file_surf_material	text	RACK	Surfacing Material
surfacing	file_surf_life_expected	number	10	Surfacing Expected life from RAMM. Important factor that determines surface remaining life and plays a role in S-Curve factors for distresses
surfacing	file_surf_layer_no	number	1	Surfacing layer number
surfacing	file_surf_thick	number	9	Surfacing thickness, in mm
pavement	file_pave_date	text	8/02/1976	Pavement Construction Date dd/mm/yyyy (will be used to determine Pavement Age using 'base_date' value in 'General' Lookup set)
pavement	file_pave_remlife	number	20	Age based pavement remaining life
maint_fault	file_su_fault_qty	number	0	Surfacing Faults in Square Metres (open dispatches) -plays a role in calculation of Surfacing Distress Index
maint_fault	file_pa_fault_qty	number	0	Pavement Faults in Square Metres (open dispatches) - plays a role in calculation of Pavement Distress Index
hsd	file_rough_survey_date	text	27/03/2023	Roughness Survey Date dd/mm/yyyy (used to determine Survey Age using 'base_date' value in 'General' Lookup set). Used in turn to determine if survey is outdated.
hsd	file_naasra_85	number	110	Naasra 85th Percentile
hsd	file_rut_survey_date	text	27/03/2023	Rut Survey Date dd/mm/yyyy (used to determine Survey Age using 'base_date' value in 'General' Lookup set). Used in turn to determine if survey is outdated.
hsd	file_rut_lwpmean_85	number	4	LWP Mean Rut 85th percentile
hsd	file_rut_rwpmean_85	number	4	RWP Mean Rut 85th percentile
distress	file_cond_survey_date	text	18/03/2023	Condition Survey Date dd/mm/yyyy (used to determine Survey Age using 'base_date' value in 'General' Lookup set). Used in turn to determine if survey is outdated.
distress	file_pct_allig	number	0	Alligator/Mesh Cracks percent of segment length
distress	file_pct_lt_crax	number	1.06	L&T Cracks percent of segment length
distress	file_pct_poth	number	4.3E-3	Potholes percent of segment length
distress	file_pct_scabb	number	0.15	Scabbing percent of segment length
distress	file_pct_flush	number	0	Flushing percent of segment length
distress	file_pct_shove	number	0	Shoving percent of segment length
distress	file_pct_edgebreak	number	0	Edge Breaks percent of segment length
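
Because column names are case-sensitive, it can help to check the header row of your file against the table above before running the model. The following Python sketch shows one way to do that. The list of names is copied from the table. The function name check_columns is hypothetical, and the check itself is a pre-processing aid, not part of the model.

```python
# Required column names, copied from the table above.
REQUIRED_COLUMNS = """
file_seg_name file_section_id file_section_name file_loc_from file_loc_to
file_lane_name file_length file_area_m2 file_can_treat_flag
file_can_rehab_flag file_ac_ok_flag file_surf_class file_next_surf
file_earliest_treat_period file_urban_rural file_onrc file_adt
file_heavy_perc file_no_of_bus_routes file_traff_growth_perc file_surf_date
file_surf_function file_surf_material file_surf_life_expected
file_surf_layer_no file_surf_thick file_pave_date file_pave_remlife
file_su_fault_qty file_pa_fault_qty file_rough_survey_date file_naasra_85
file_rut_survey_date file_rut_lwpmean_85 file_rut_rwpmean_85
file_cond_survey_date file_pct_allig file_pct_lt_crax file_pct_poth
file_pct_scabb file_pct_flush file_pct_shove file_pct_edgebreak
""".split()


def check_columns(header):
    """Compare a CSV header row with REQUIRED_COLUMNS.

    Returns (missing, wrong_case): `missing` lists required names that have
    no exact match in the header, and `wrong_case` maps each of those names
    to a header entry that differs only by letter case, if one exists.
    """
    present = set(header)
    missing = [name for name in REQUIRED_COLUMNS if name not in present]
    by_lower = {name.lower(): name for name in header}
    wrong_case = {
        name: by_lower[name.lower()] for name in missing if name.lower() in by_lower
    }
    return missing, wrong_case


# Illustrative header with one column name in the wrong case.
header = ["File_ADT" if name == "file_adt" else name for name in REQUIRED_COLUMNS]
missing, wrong_case = check_columns(header)
print("missing:", missing)
print("wrong case:", wrong_case)
```

In practice the header would come from the first row of the input file, for example the fieldnames attribute of a csv.DictReader.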

Important

The Cassandra Framework Model will throw an error if any cell in your input set is empty or does not contain the correct data type. Specifically, numeric columns should not contain any empty values or text values that cannot be converted to numbers.
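
A check along these lines can be run before handing the file to the model. The sketch below is a hedged illustration: the function name find_bad_cells is hypothetical, and it assumes that Python's float() is a reasonable stand-in for the model's own number parsing, which may differ in detail. The set of numeric columns is passed in by the caller and should match the "number" rows of the table above.

```python
import math


def find_bad_cells(rows, numeric_columns):
    """Return a list of (row_number, column, problem) tuples.

    A cell is reported as "empty" if it is missing or blank, and as
    "not numeric" if it sits in a numeric column but cannot be converted
    to a finite number. Row numbers start at 1 for the first data row.
    """
    problems = []
    for row_number, row in enumerate(rows, start=1):
        for column, value in row.items():
            text = (value or "").strip()
            if not text:
                problems.append((row_number, column, "empty"))
            elif column in numeric_columns:
                try:
                    number = float(text)
                except ValueError:
                    problems.append((row_number, column, "not numeric"))
                else:
                    if not math.isfinite(number):
                        problems.append((row_number, column, "not numeric"))
    return problems


# Illustrative rows only; the second row has two problems.
sample_rows = [
    {"file_seg_name": "Segment1", "file_adt": "2463", "file_pct_poth": "4.3E-3"},
    {"file_seg_name": "Segment2", "file_adt": "n/a", "file_pct_poth": ""},
]
problems = find_bad_cells(sample_rows, {"file_adt", "file_pct_poth"})
for problem in problems:
    print(problem)
```

An empty result means the rows passed this particular check; it does not guarantee that the model will accept the file.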