Source: NUFORC/Kaggle https://www.kaggle.com/NUFORC/ufo-sightings/data

ufo_data = read.csv("e:/Data Science projects/ufo-sightings-around-the-world/scrubbed.csv")

The shape of UFOs?

shape_table = sort(table(ufo_data$shape))
names(shape_table)
##  [1] "changed"   "dome"      "flare"     "hexagon"   "pyramid"  
##  [6] "crescent"  "round"     "delta"     "cross"     "cone"     
## [11] "teardrop"  "egg"       "chevron"   "diamond"   "cylinder" 
## [16] "rectangle" "flash"     ""          "changing"  "cigar"    
## [21] "formation" "oval"      "disk"      "sphere"    "unknown"  
## [26] "other"     "fireball"  "circle"    "triangle"  "light"
sort(shape_table)
## 
##   changed      dome     flare   hexagon   pyramid  crescent     round 
##         1         1         1         1         1         2         2 
##     delta     cross      cone  teardrop       egg   chevron   diamond 
##         7       233       316       750       759       952      1178 
##  cylinder rectangle     flash            changing     cigar formation 
##      1283      1297      1328      1932      1962      2057      2457 
##      oval      disk    sphere   unknown     other  fireball    circle 
##      3733      5213      5387      5584      5649      6208      7608 
##  triangle     light 
##      7865     16565

According to the dataset, a UFO can take 27 recognisable forms, such as a dome, a hexagon, a pyramid, an egg or a cigar. The most commonly observed form by far, with over 16,000 reports, is a light: this is about twice as common as the second most observed form, a triangle.
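The "about twice as common" figure can be checked with a little arithmetic. The sketch below is only illustrative: the counts are copied by hand from the output above rather than read from `scrubbed.csv`, the empty shape label is renamed `blank`, and the names `shape_counts`, `total_reports`, `light_vs_triangle` and `light_share` are introduced here and are not part of the original analysis.

```r
# Counts copied by hand from the sorted table above (not recomputed from the CSV).
# The empty shape label "" is written as `blank` so it can be used as a name.
shape_counts <- c(
  changed = 1, dome = 1, flare = 1, hexagon = 1, pyramid = 1,
  crescent = 2, round = 2, delta = 7, cross = 233, cone = 316,
  teardrop = 750, egg = 759, chevron = 952, diamond = 1178, cylinder = 1283,
  rectangle = 1297, flash = 1328, blank = 1932, changing = 1962, cigar = 2057,
  formation = 2457, oval = 3733, disk = 5213, sphere = 5387, unknown = 5584,
  other = 5649, fireball = 6208, circle = 7608, triangle = 7865, light = 16565
)

# Total number of reports covered by the table.
total_reports <- sum(shape_counts)

# How many times more often "light" is reported than the runner-up, "triangle".
light_vs_triangle <- shape_counts[["light"]] / shape_counts[["triangle"]]

# Share of all reports that are described as a light.
light_share <- shape_counts[["light"]] / total_reports

round(light_vs_triangle, 2)
round(100 * light_share, 1)
```

On the real table, `prop.table(shape_table)` should give the share of every shape in one call.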

Of course, if my knowledge of science fiction TV has taught me anything, seeing only a light could simply mean that the observation was made at night, when the craft's lights were on but it was too dark to make out its shape. It is like driving at night with a car heading toward you from the opposite direction: the headlights are always the first thing you see, long before you see the actual chassis.

shape.canttell = sum(ufo_data$shape == "unknown" | ufo_data$shape == "other" | ufo_data$shape == "")
shape.canttell
## [1] 13165

However, the shape of 13,165 observations (roughly 16% of the 80,332 reports) is recorded as “unknown”, “other” or not specified at all (""). In other words, it is not possible to tell the shape of a significant number of the UFO observations recorded by NUFORC, so it is quite possible that the numbers are subject to bias. For this reason, the values for “unknown”, “other” and "" have been added together into an umbrella category, “cannot tell”.

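# Drop the three unclear entries by name rather than by position in the sorted table
shape_table.df <- data.frame(shape_table[!(names(shape_table) %in% c("", "unknown", "other"))])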
colnames(shape_table.df) <- c("UFO_shape", "Frequency")
cant.tell.df <- data.frame("UFO_shape" = "cannot tell", "Frequency" = shape.canttell)
shape_table.df <- rbind(shape_table.df, cant.tell.df)
shape_table.df <- shape_table.df[order(shape_table.df$Frequency),]
shape_table.df
##      UFO_shape Frequency
## 1      changed         1
## 2         dome         1
## 3        flare         1
## 4      hexagon         1
## 5      pyramid         1
## 6     crescent         2
## 7        round         2
## 8        delta         7
## 9        cross       233
## 10        cone       316
## 11    teardrop       750
## 12         egg       759
## 13     chevron       952
## 14     diamond      1178
## 15    cylinder      1283
## 16   rectangle      1297
## 17       flash      1328
## 18    changing      1962
## 19       cigar      2057
## 20   formation      2457
## 21        oval      3733
## 22        disk      5213
## 23      sphere      5387
## 24    fireball      6208
## 25      circle      7608
## 26    triangle      7865
## 28 cannot tell     13165
## 27       light     16565
barplot(as.matrix(shape_table.df$Frequency), beside = TRUE, names.arg = shape_table.df$UFO_shape, xlab = "UFO Shape", ylab = "Frequency")
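An alternative to removing entries from the finished table is to recode the unclear labels in the shape column itself and tabulate afterwards. This is a minimal sketch of that idea, not the method used above: `shapes` is a small made-up vector standing in for `ufo_data$shape`, and `unclear_labels` and `shapes_recoded` are names introduced here for illustration.

```r
# Toy stand-in for ufo_data$shape (made-up values, for illustration only).
shapes <- c("light", "unknown", "", "triangle", "other", "light")

# Labels that carry no usable shape information.
unclear_labels <- c("unknown", "other", "")

# as.character() guards against the column having been read in as a factor,
# in which case ifelse() would return integer codes instead of labels.
shapes <- as.character(shapes)
shapes_recoded <- ifelse(shapes %in% unclear_labels, "cannot tell", shapes)

# Frequency table with the umbrella category already in place, smallest first.
sort(table(shapes_recoded))
```

Recoding first means the umbrella category follows the data into any later table, including a cross-tabulation by country.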

How does the shape of observed UFOs differ across countries?

ufo_shape.country = table(ufo_data$shape, ufo_data$country)
ufo_shape.country
##            
##                      au    ca    de    gb    us
##               271    11    45     2    50  1553
##   changed       0     0     0     0     0     1
##   changing    252     9    69     2    46  1584
##   chevron      89     3    36     1     8   815
##   cigar       262    15    74     3    60  1643
##   circle      891    62   284    10   243  6118
##   cone         40     6    10     0    13   247
##   crescent      1     0     0     0     0     1
##   cross        25     1     9     0    10   188
##   cylinder    161     9    53     3    30  1027
##   delta         0     0     1     0     0     6
##   diamond     154    10    40     3    43   928
##   disk        736    50   198     6   102  4121
##   dome          1     0     0     0     0     0
##   egg         105    12    28     1    32   581
##   fireball    682    34   218     9   117  5148
##   flare         0     0     0     0     0     1
##   flash       174     4    62     1    25  1062
##   formation   286    20    98     3    60  1990
##   hexagon       0     0     0     0     0     1
##   light      1937   119   655    20   361 13473
##   other       760    40   241     9   133  4466
##   oval        448    30   130     7    86  3032
##   pyramid       0     0     0     0     0     1
##   rectangle   140    10    47     1    29  1070
##   round         0     0     0     0     0     2
##   sphere      655    15   205     7   158  4347
##   teardrop     88    10    22     0    38   592
##   triangle    827    43   268     9   169  6549
##   unknown     685    25   207     8    92  4567

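In the table above, the unlabelled first column holds reports with no country recorded, and the unlabelled first row holds reports with no shape recorded. Generally speaking, “light”, “triangle”, “circle” and “fireball”, together with the unclear categories (“unknown”, “other” and blank), are the most commonly observed forms of UFO across countries. However, the “sphere” shape has been observed particularly often in Britain, and the “disk” shape particularly often in Australia (relative to other areas).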
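Because the US accounts for most of the reports, raw counts make it hard to compare countries, and shares within each country are easier to read. The sketch below is an illustration under stated assumptions: the counts for three shapes and three countries are copied by hand from the table above, the country totals are the column sums of that full table added up by hand, and the names `counts`, `country_totals` and `shape_share` are introduced here.

```r
# Counts for three shapes in three countries, copied by hand from the table above.
counts <- matrix(
  c(119, 361, 13473,   # light
     50, 102,  4121,   # disk
     15, 158,  4347),  # sphere
  nrow = 3, byrow = TRUE,
  dimnames = list(shape = c("light", "disk", "sphere"),
                  country = c("au", "gb", "us"))
)

# Total reports per country: column sums of the full 30-row table, added by hand.
country_totals <- c(au = 538, gb = 1905, us = 65114)

# Divide every column by its country total to get within-country shares.
shape_share <- sweep(counts, 2, country_totals, "/")

# Shares as percentages, one column per country.
round(100 * shape_share, 1)
```

With the full data loaded, `prop.table(ufo_shape.country, margin = 2)` should produce the same kind of within-country shares for every shape at once.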

Distribution of most common shapes across the US

ufoshape.us = table(ufo_data$shape, ufo_data$country == "us")
barplot(tail(sort(ufoshape.us[,2])))

Distribution of most common shapes across Great Britain

ufoshape.gb = table(ufo_data$shape, ufo_data$country == "gb")
barplot(tail(sort(ufoshape.gb[,2])))

Distribution of most common shapes across Germany

ufoshape.de = table(ufo_data$shape, ufo_data$country == "de")
barplot(tail(sort(ufoshape.de[,2])))

Distribution of most common shapes across Canada

ufoshape.ca = table(ufo_data$shape, ufo_data$country == "ca")
barplot(tail(sort(ufoshape.ca[,2])))

Distribution of most common shapes across Australia

ufoshape.au = table(ufo_data$shape, ufo_data$country == "au")
barplot(tail(sort(ufoshape.au[,2])))
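The five per-country blocks above repeat the same steps, so they could be folded into one helper applied to each country column. This is a rough sketch rather than the code used for the plots: `shape_counts` holds eight of the most frequent shapes copied by hand from the shape-by-country table, and `top_shapes` and `top_by_country` are hypothetical names introduced here.

```r
# Eight of the most frequent shapes per country, copied by hand from the table above.
shape_counts <- matrix(
  c(119, 655, 20, 361, 13473,   # light
     43, 268,  9, 169,  6549,   # triangle
     62, 284, 10, 243,  6118,   # circle
     34, 218,  9, 117,  5148,   # fireball
     40, 241,  9, 133,  4466,   # other
     25, 207,  8,  92,  4567,   # unknown
     15, 205,  7, 158,  4347,   # sphere
     50, 198,  6, 102,  4121),  # disk
  ncol = 5, byrow = TRUE,
  dimnames = list(
    shape = c("light", "triangle", "circle", "fireball",
              "other", "unknown", "sphere", "disk"),
    country = c("au", "ca", "de", "gb", "us")
  )
)

# Return the n most frequent shapes for one country, in ascending order.
top_shapes <- function(counts, country, n = 3) {
  tail(sort(counts[, country]), n)
}

# Apply the helper to every country column and keep the results by country code.
top_by_country <- lapply(colnames(shape_counts),
                         function(cc) top_shapes(shape_counts, cc))
names(top_by_country) <- colnames(shape_counts)

top_by_country[["gb"]]
```

Each element can then be passed to `barplot()`, for example `barplot(top_by_country[["gb"]], main = "gb")`. Passing `ufo_shape.country` in place of `shape_counts` would cover all shapes, although its unlabelled first column would need a name or would have to be dropped first.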