by acls us

Parsing TFL Tube Line Status with Pentaho

TFL Pentaho In a previous article I wrong about using TRL Tube Line Status via the old API (rather than the new Unified API), formatting and storing the data in Zabbix to be displayed on a Grafana Dashboard. In this article, I'm using the new TFL Unified API to grab TFL Tube Line Status in a JSON structure and parsing that data using Pentaho to produce a simple CSV file with a Date/Time, a Tube Line Name and a Status of the Tube Line.

Having signed up for access to the new TFL Unified API and having had a look at the example RESTful web calls for data, I fired up an example query to get the TFL Tube Line Statuses.

The first element of the JSON Array returned looks like this;

"$type": "Tfl.Api.Presentation.Entities.Line, Tfl.Api.Presentation.Entities",
"id": "bakerloo",
"name": "Bakerloo",
"modeName": "tube",
"disruptions": [],
"created": "2018-06-12T17:24:37.747Z",
"modified": "2018-06-12T17:24:37.747Z",
"$type": "Tfl.Api.Presentation.Entities.LineStatus, Tfl.Api.Presentation.Entities",
"id": 0,
"statusSeverity": 10,
"statusSeverityDescription": "Good Service",
"created": "0001-01-01T00:00:00",
"validityPeriods": []
"routeSections": [],
"$type": "Tfl.Api.Presentation.Entities.LineServiceTypeInfo, Tfl.Api.Presentation.Entities",
"name": "Regular",
"uri": "/Line/Route?ids=Bakerloo&serviceTypes=Regular"
"$type": "Tfl.Api.Presentation.Entities.Crowding, Tfl.Api.Presentation.Entities"

However, all I wanted was the TFL Tube Line Name "name": "Bakerloo" and the Status "statusSeverityDescription": "Good Service" (marked in yellow above). The following Pentaho Transformation does the work required


The step "Generate Rows" is just used to set the API URL to be used by the "HTTP Client" step which send the RESTful API request to the TRL Servers. The returned JSON data is passed to the "JSON Input" step which parses out the Line Name and Line Status data. The "Get System Info" step just added the current Date/Time that is sent with the parsed data to the "Text file output" step that write the CSV file.

The only "tricky" bit is parsing the returned JSON structure. This uses JSONPath definitions to tell the JSON parser step where to find the data in the JSON structure.

For the Line Name the JSONPath was "$.[*].name" which instructs the JSON parser to get the "name" from each top level array element. For the Line Status the JSONPath was "$.[*].lineStatuses[0].statusSeverityDescription" which instructs the JSON parser to get the "statusSeverityDescription" from the first element of the "lineStatuses" array from each top level array element.

The results being...

2018/06/22 15:12:41,"Bakerloo","Good Service"
2018/06/22 15:12:41,"Central","Good Service"
2018/06/22 15:12:41,"Circle","Good Service"
2018/06/22 15:12:41,"District","Good Service"
2018/06/22 15:12:41,"Hammersmith & City","Good Service"
2018/06/22 15:12:41,"Jubilee","Good Service"
2018/06/22 15:12:41,"Metropolitan","Good Service"
2018/06/22 15:12:41,"Northern","Good Service"
2018/06/22 15:12:41,"Piccadilly","Good Service"
2018/06/22 15:12:41,"Victoria","Good Service"
2018/06/22 15:12:41,"Waterloo & City","Good Service"