The visualization protocol consists of the following:

1. Timespan:

The data used was collected during 2018. Keep in mind that, because of the recent COVID-19 pandemic, this data may no longer reflect the current situation.

2. Data source:

The data was collected from the United States Department of Transportation, more precisely from its Bureau of Transportation Statistics. It can be found at this link: Bureau of Transportation Statistics

3. Link to the dataset used:

The specific dataset was gathered from a Kaggle page containing a collection of air traffic data spanning 2009 to 2018. It can be found here: Airline Delay and Cancellation Data, 2009 - 2018

4. Metadata of the main dataset (every route related data derives from this):

FL_DATE (date of the flight),
OP_CARRIER (name of the carrier operator),
OP_CARRIER_FL_NUM (flight number of the carrier operator),
ORIGIN (origin airport IATA code),
DEST (destination airport IATA code),
DEP_TIME (time of departure),
DEP_DELAY (departure delay),
TAXI_OUT (taxi time from leaving the gate to wheels off),
WHEELS_OFF (time when the wheels leave the ground),
WHEELS_ON (time when the wheels touch the ground),
TAXI_IN (taxi time from wheels on to arrival at the gate),
ARR_TIME (time of arrival at the gate),
ARR_DELAY (delay on arrival),
CANCELLED (true or false),
CANCELLATION_CODE (determines the cancellation reason),
DIVERTED (true or false),
ACTUAL_ELAPSED_TIME (total time elapsed),
AIR_TIME (time in the air),
DISTANCE (distance between the two airports, in miles),
DELAY_REASON (carrier_delay, weather_delay, nas_delay, security_delay, late_aircraft_delay).

5. Short abstract of the data visualization process:

The dataset was processed using Python and did not require any pre-processing. It was combined with the Airports Dataset (IATA codes and coordinates), available on the Bureau of Transportation Statistics site, and for each visualization we saved a new file containing only the columns needed for that specific use case.
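As a minimal sketch of the "save only the necessary columns" step, assuming the raw data is a CSV file with the column names listed in the metadata: the helper `keep_columns` and the sample rows are hypothetical, and Python's standard `csv` module stands in for whatever tooling was actually used.

```python
import csv
import io


def keep_columns(src, dst, columns):
    """Copy only the named columns from the CSV stream src to the stream dst."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(
        dst, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # columns not listed in `columns` are dropped


# Made-up sample rows; a real run would open the downloaded CSV file instead.
sample = io.StringIO(
    "FL_DATE,OP_CARRIER,ORIGIN,DEST,ARR_DELAY\n"
    "2018-01-01,UA,EWR,DEN,-23.0\n"
    "2018-01-01,UA,LAS,SFO,-24.0\n"
)
out = io.StringIO()
keep_columns(sample, out, ["ORIGIN", "DEST", "ARR_DELAY"])
print(out.getvalue())
```

Working on streams rather than file names keeps the sketch easy to try; passing two objects returned by `open(...)` would write the reduced file to disk.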

6. Actions performed to obtain each visualization:

An ordered list of all the actions performed, together with the parameters and scripts used to transform the raw data into the final visualization, is available in the MAP PROTOCOL section of each map/plot.

7. Data used to obtain this visualization:

This visualization uses only the fields describing the position of the airports, the number of flights on each route and the number of flights at each airport.
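The two counts mentioned here could be derived along the following lines. This is only a sketch: the flight list is made up, a route is assumed to be a directed (ORIGIN, DEST) pair, and an airport's total is assumed to include both departures and arrivals.

```python
from collections import Counter

# Made-up (ORIGIN, DEST) pairs standing in for the rows of the main dataset.
flights = [("ATL", "LAX"), ("ATL", "LAX"), ("LAX", "ATL"), ("ATL", "ORD")]

# Flights per directed route: ATL -> LAX is counted separately from LAX -> ATL.
route_counts = Counter(flights)

# Flights per airport, counting each flight once at its origin and once at
# its destination (an assumption about what "number of flights" means).
airport_counts = Counter()
for origin, dest in flights:
    airport_counts[origin] += 1
    airport_counts[dest] += 1

print(route_counts)
print(airport_counts)
```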

The visualization shown in the Flight Routes Map represents the volume of airline traffic on each air route in the USA.


Its features and visual variables are:

Yellow dots (shape and colour): depict the airport locations. The bigger the dot, the higher the airport's relevance (total number of flights).

Thickness of segments (size, orientation): the thicker the segment, the greater the number of flights in that direction.

Colour of segment (colour): has no precise meaning. It is just a colour palette that helps the user identify the departure and arrival airports.

The goal of this visualization is to show the distribution of the flights covered by our dataset.
A single airport can be selected, by clicking on the map or typing the airport code/name in the designated field, to show its specific connections with other airports.

Zoom in on the map to discover smaller airports and connections.

Flight Routes

The visualization protocol consists of the following:

1. Timespan:

The data used was collected during 2018. Keep in mind that, because of the recent COVID-19 pandemic, this data may no longer reflect the current situation.

2. Data source:

The data was collected from the United States Department of Transportation, more precisely from its Bureau of Transportation Statistics. It can be found at this link: Bureau of Transportation Statistics

3. Link to the dataset used:

The specific dataset was gathered from a Kaggle page containing a collection of air traffic data spanning 2009 to 2018. It can be found here: Airline Delay and Cancellation Data, 2009 - 2018

4. Metadata of the main dataset (every route related data derives from this):

FL_DATE (date of the flight),
OP_CARRIER (name of the carrier operator),
OP_CARRIER_FL_NUM (flight number of the carrier operator),
ORIGIN (origin airport IATA code),
DEST (destination airport IATA code),
DEP_TIME (time of departure),
DEP_DELAY (departure delay),
TAXI_OUT (taxi time from leaving the gate to wheels off),
WHEELS_OFF (time when the wheels leave the ground),
WHEELS_ON (time when the wheels touch the ground),
TAXI_IN (taxi time from wheels on to arrival at the gate),
ARR_TIME (time of arrival at the gate),
ARR_DELAY (delay on arrival),
CANCELLED (true or false),
CANCELLATION_CODE (determines the cancellation reason),
DIVERTED (true or false),
ACTUAL_ELAPSED_TIME (total time elapsed),
AIR_TIME (time in the air),
DISTANCE (distance between the two airports, in miles),
DELAY_REASON (carrier_delay, weather_delay, nas_delay, security_delay, late_aircraft_delay).

5. Short abstract of the data visualization process:

The dataset was processed using Python and did not require any pre-processing. It was combined with the Airports Dataset (IATA codes and coordinates), available on the Bureau of Transportation Statistics site, and for each visualization we saved a new file containing only the columns needed for that specific use case.

6. Actions performed to obtain each visualization:

An ordered list of all the actions performed, together with the parameters and scripts used to transform the raw data into the final visualization, is available in the MAP PROTOCOL section of each map/plot.

7. Data used to obtain this visualization:

This visualization uses only the fields describing the position of the airports, the number of flights on each route, the number of flights at each airport and the sum of the delays on each route.
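A per-route delay sum could be accumulated as sketched below. The rows are made up, ARR_DELAY is assumed to be the delay column used (the text does not say which one), and rows with an empty delay, as for cancelled flights, are assumed to be skipped.

```python
from collections import defaultdict

# Made-up rows; values are strings, as they would be when read from a CSV file.
rows = [
    {"ORIGIN": "ATL", "DEST": "LAX", "ARR_DELAY": "10.0"},
    {"ORIGIN": "ATL", "DEST": "LAX", "ARR_DELAY": "-4.0"},
    {"ORIGIN": "ATL", "DEST": "LAX", "ARR_DELAY": ""},  # e.g. a cancelled flight
    {"ORIGIN": "LAX", "DEST": "ATL", "ARR_DELAY": "30.0"},
]

# Sum of the delays, keyed by the directed route (ORIGIN, DEST).
delay_sum = defaultdict(float)
for row in rows:
    if row["ARR_DELAY"] == "":
        continue  # assumption: flights without a recorded delay are ignored
    delay_sum[(row["ORIGIN"], row["DEST"])] += float(row["ARR_DELAY"])

print(dict(delay_sum))
```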

The visualization shown in the Routes Delay Map represents the airline routes grouped by airport, as in the first map, together with their respective delays.


Its features and visual variables are:

Colour: the palette goes from green to red, where green means a low accrued delay on that route and red means a high one.

Orientation: clearly shows the route direction.
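One way such a green-to-red palette can be produced is a linear interpolation between the two colours. The sketch below is an illustration only: the function `delay_to_colour`, the linear scale and the clamping to the range `low`..`high` are assumptions, not the mapping actually used by the map.

```python
def delay_to_colour(delay, low, high):
    """Map a delay to a hex colour between green (low) and red (high)."""
    if high <= low:
        raise ValueError("high must be greater than low")
    # Position of the delay inside the range, clamped to [0, 1].
    t = min(max((delay - low) / (high - low), 0.0), 1.0)
    red = round(255 * t)
    green = round(255 * (1 - t))
    return f"#{red:02x}{green:02x}00"


print(delay_to_colour(0, 0, 100))    # lowest delay: pure green
print(delay_to_colour(100, 0, 100))  # highest delay: pure red
```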

This visualization serves as a means of understanding which routes are most often late and which are most often on time.

Zoom in on the map to discover smaller airports and connections.

The next map presents the same topic with a closer look at the average delay of each route.

Routes Delay

The visualization protocol consists of the following:

1. Timespan:

The data used was collected during 2018. Keep in mind that, because of the recent COVID-19 pandemic, this data may no longer reflect the current situation.

2. Data source:

The data was collected from the United States Department of Transportation, more precisely from its Bureau of Transportation Statistics. It can be found at this link: Bureau of Transportation Statistics

3. Link to the dataset used:

The specific dataset was gathered from a Kaggle page containing a collection of air traffic data spanning 2009 to 2018. It can be found here: Airline Delay and Cancellation Data, 2009 - 2018

4. Metadata of the main dataset (every route related data derives from this):

FL_DATE (date of the flight),
OP_CARRIER (name of the carrier operator),
OP_CARRIER_FL_NUM (flight number of the carrier operator),
ORIGIN (origin airport IATA code),
DEST (destination airport IATA code),
DEP_TIME (time of departure),
DEP_DELAY (departure delay),
TAXI_OUT (taxi time from leaving the gate to wheels off),
WHEELS_OFF (time when the wheels leave the ground),
WHEELS_ON (time when the wheels touch the ground),
TAXI_IN (taxi time from wheels on to arrival at the gate),
ARR_TIME (time of arrival at the gate),
ARR_DELAY (delay on arrival),
CANCELLED (true or false),
CANCELLATION_CODE (determines the cancellation reason),
DIVERTED (true or false),
ACTUAL_ELAPSED_TIME (total time elapsed),
AIR_TIME (time in the air),
DISTANCE (distance between the two airports, in miles),
DELAY_REASON (carrier_delay, weather_delay, nas_delay, security_delay, late_aircraft_delay).

5. Short abstract of the data visualization process:

The dataset was processed using Python and did not require any pre-processing. It was combined with the Airports Dataset (IATA codes and coordinates), available on the Bureau of Transportation Statistics site, and for each visualization we saved a new file containing only the columns needed for that specific use case.

6. Actions performed to obtain each visualization:

An ordered list of all the actions performed, together with the parameters and scripts used to transform the raw data into the final visualization, is available in the MAP PROTOCOL section of each map/plot.

7. Data used to obtain this visualization:

This visualization uses only the fields describing the position of the airports and the average delay on each route (obtained by dividing the sum of the delays on that route over the entire year by the number of flights that took that route).
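The division described above can be sketched as follows. The function `average_delay` and the numbers are hypothetical; the two inputs are assumed to be dictionaries keyed by the directed route (ORIGIN, DEST), with delays in minutes.

```python
def average_delay(delay_sum, flight_count):
    """Average delay per route: yearly delay sum divided by number of flights."""
    return {
        route: delay_sum[route] / flight_count[route]
        for route in delay_sum
        if flight_count.get(route, 0) > 0  # avoid dividing by zero
    }


# Made-up yearly totals (minutes) and flight counts for two routes.
delay_sum = {("ATL", "LAX"): 600.0, ("LAX", "ATL"): 90.0}
flight_count = {("ATL", "LAX"): 40, ("LAX", "ATL"): 30}

avg = average_delay(delay_sum, flight_count)
print(avg)
```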

The visualization shown in the Delay Insight Map represents the airline routes filtered by average delay.


Its features and visual variables are:

Trace Dimension: bigger traces represent a higher delay on a specific route.

Orientation: clearly shows the route direction.

Tier List: displays the 10 routes with the worst average delay.
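A tier list like the one described could be obtained by picking the routes with the largest average delay. In this sketch the function `worst_routes` and the sample values are hypothetical, and "worst" is assumed to mean the highest average delay.

```python
import heapq


def worst_routes(avg_delay, n=10):
    """Return the n (route, average delay) pairs with the highest delay."""
    return heapq.nlargest(n, avg_delay.items(), key=lambda item: item[1])


# Made-up average delays in minutes, keyed by (ORIGIN, DEST).
avg_delay = {("ATL", "LAX"): 5.0, ("ORD", "JFK"): 20.0, ("DEN", "SFO"): 12.5}

top_two = worst_routes(avg_delay, n=2)
print(top_two)
```

With the default `n=10` the same call returns the ten worst routes, or all of them if there are fewer than ten.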

This visualization serves as a means of understanding which routes are most often late and which are most often on time.

Zoom in on the map to discover smaller airports and connections.
Scroll the filter bar to select different delay ranges.

This map lets us selectively compare poorly performing airports and routes against the user ratings shown in the next map.

Delay Insight

The visualization protocol consists of the following:

1. Timespan:

The data used was collected during 2018. Keep in mind that, because of the recent COVID-19 pandemic, this data may no longer reflect the current situation.

2. Data source:

The data was collected from the United States Department of Transportation, more precisely from its Bureau of Transportation Statistics. It can be found at this link: Bureau of Transportation Statistics

3. Link to the dataset used:

The specific dataset was gathered from a Kaggle page containing a collection of air traffic data spanning 2009 to 2018. It can be found here: Airline Delay and Cancellation Data, 2009 - 2018

4. Metadata of the main dataset (every route related data derives from this):

FL_DATE (date of the flight),
OP_CARRIER (name of the carrier operator),
OP_CARRIER_FL_NUM (flight number of the carrier operator),
ORIGIN (origin airport IATA code),
DEST (destination airport IATA code),
DEP_TIME (time of departure),
DEP_DELAY (departure delay),
TAXI_OUT (taxi time from leaving the gate to wheels off),
WHEELS_OFF (time when the wheels leave the ground),
WHEELS_ON (time when the wheels touch the ground),
TAXI_IN (taxi time from wheels on to arrival at the gate),
ARR_TIME (time of arrival at the gate),
ARR_DELAY (delay on arrival),
CANCELLED (true or false),
CANCELLATION_CODE (determines the cancellation reason),
DIVERTED (true or false),
ACTUAL_ELAPSED_TIME (total time elapsed),
AIR_TIME (time in the air),
DISTANCE (distance between the two airports, in miles),
DELAY_REASON (carrier_delay, weather_delay, nas_delay, security_delay, late_aircraft_delay).

5. Short abstract of the data visualization process:

The dataset was processed using Python and did not require any pre-processing. It was combined with the Airports Dataset (IATA codes and coordinates), available on the Bureau of Transportation Statistics site, and for each visualization we saved a new file containing only the columns needed for that specific use case.

6. Actions performed to obtain each visualization:

An ordered list of all the actions performed, together with the parameters and scripts used to transform the raw data into the final visualization, is available in the MAP PROTOCOL section of each map/plot.

7. Data used to obtain this visualization:

This visualization uses only the fields describing the position of the airports and the user ratings for each airport (this data was obtained using a custom-made web scraping tool).
All the user rating data was found on SkyTraxRatings.com, AirHelp.com and AirlineQuality.com.

The visualization shown in the User Ratings Map represents the ratings given by users to each airport.


Its features and visual variables are:

Dot Dimension: bigger dots denote a bad rating (in n/10 stars), smaller ones a high rating.

Colour: red dots denote a bad rating (in n/10 stars), green ones a high rating.

This visualization serves as a means of understanding how users perceive the services offered by a given airport.

Zoom in on the map to discover smaller airports.
Hover with the mouse to read the airport name and its rating.

This map lets us selectively compare poorly performing airports and routes against the delay insight shown in the previous map.

User Ratings

The visualization protocol consists of the following:

1. Timespan:

The data used was collected during 2018. Keep in mind that, because of the recent COVID-19 pandemic, this data may no longer reflect the current situation.

2. Data source:

The data was collected from the United States Department of Transportation, more precisely from its Bureau of Transportation Statistics. It can be found at this link: Bureau of Transportation Statistics

3. Link to the dataset used:

The specific dataset was gathered from a Kaggle page containing a collection of air traffic data spanning 2009 to 2018. It can be found here: Airline Delay and Cancellation Data, 2009 - 2018

4. Metadata of the main dataset (every route related data derives from this):

FL_DATE (date of the flight),
OP_CARRIER (name of the carrier operator),
OP_CARRIER_FL_NUM (flight number of the carrier operator),
ORIGIN (origin airport IATA code),
DEST (destination airport IATA code),
DEP_TIME (time of departure),
DEP_DELAY (departure delay),
TAXI_OUT (taxi time from leaving the gate to wheels off),
WHEELS_OFF (time when the wheels leave the ground),
WHEELS_ON (time when the wheels touch the ground),
TAXI_IN (taxi time from wheels on to arrival at the gate),
ARR_TIME (time of arrival at the gate),
ARR_DELAY (delay on arrival),
CANCELLED (true or false),
CANCELLATION_CODE (determines the cancellation reason),
DIVERTED (true or false),
ACTUAL_ELAPSED_TIME (total time elapsed),
AIR_TIME (time in the air),
DISTANCE (distance between the two airports, in miles),
DELAY_REASON (carrier_delay, weather_delay, nas_delay, security_delay, late_aircraft_delay).

5. Short abstract of the data visualization process:

The dataset was processed using Python and did not require any pre-processing. It was combined with the Airports Dataset (IATA codes and coordinates), available on the Bureau of Transportation Statistics site, and for each visualization we saved a new file containing only the columns needed for that specific use case.

6. Actions performed to obtain each visualization:

An ordered list of all the actions performed, together with the parameters and scripts used to transform the raw data into the final visualization, is available in the MAP PROTOCOL section of each map/plot.

7. Data used to obtain this visualization:

This visualization uses only the average delay of each airport, combined with the user ratings for each airport.
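Combining the two sources amounts to a join on the airport code. The sketch below assumes both are dictionaries keyed by IATA code and keeps only airports present in both (an inner join); the values are made up.

```python
# Made-up average delays (minutes) and user ratings (out of 10) per airport.
avg_delay = {"ATL": 12.0, "LAX": 9.5, "ORD": 15.0}
ratings = {"ATL": 6.0, "LAX": 7.0, "JFK": 5.0}

# Keep only the airports that have both a delay and a rating.
common = sorted(avg_delay.keys() & ratings.keys())
points = [(code, avg_delay[code], ratings[code]) for code in common]

print(points)
```

Each tuple is one point of the scatter plot: the airport code, its average delay and its rating.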

The visualization shown in this Correlation Scatter Plot represents the ratings given by users to each airport against the respective average airport delay.


Its features and visual variables are:

Trend Line: helps to understand the correlation between average delay and user ratings.
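A trend line of this kind is commonly an ordinary least-squares fit, and the strength of the linear relationship can be summarized with the Pearson correlation coefficient. The sketch below assumes that approach (the source does not state how the line was computed) and uses made-up points; `trend_line` is a hypothetical helper.

```python
from math import sqrt


def trend_line(xs, ys):
    """Least-squares slope and intercept of y on x, plus Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Made-up data: average delay in minutes (x) and user rating out of 10 (y).
delays = [5.0, 10.0, 15.0, 20.0]
scores = [8.0, 7.0, 6.0, 5.0]

slope, intercept, r = trend_line(delays, scores)
print(slope, intercept, r)
```

A negative slope with `r` close to -1 would correspond to ratings falling almost linearly as the average delay grows.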

Hover with the mouse to read the airport name, the respective rating, and the average delay.

This plot clearly shows an almost linear correlation between user ratings and average delay in minutes.
It also makes clear that, even with all the services each airport offers, delays are still one of the major factors a customer takes into account when rating an airport.

Correlation plot