Source: NUFORC/Kaggle https://www.kaggle.com/NUFORC/ufo-sightings/data
ufo_data = read.csv("e:/Data Science projects/ufo-sightings-around-the-world/scrubbed.csv")
shape_table = sort(table(ufo_data$shape))
names(shape_table)
## [1] "changed" "dome" "flare" "hexagon" "pyramid"
## [6] "crescent" "round" "delta" "cross" "cone"
## [11] "teardrop" "egg" "chevron" "diamond" "cylinder"
## [16] "rectangle" "flash" "" "changing" "cigar"
## [21] "formation" "oval" "disk" "sphere" "unknown"
## [26] "other" "fireball" "circle" "triangle" "light"
sort(shape_table)
##
##   changed      dome     flare   hexagon   pyramid  crescent     round
##         1         1         1         1         1         2         2
##     delta     cross      cone  teardrop       egg   chevron   diamond
##         7       233       316       750       759       952      1178
##  cylinder rectangle     flash            changing     cigar formation
##      1283      1297      1328      1932      1962      2057      2457
##      oval      disk    sphere   unknown     other  fireball    circle
##      3733      5213      5387      5584      5649      6208      7608
##  triangle     light
##      7865     16565
According to the dataset, a UFO can take 27 recognisable forms (the 30 recorded labels, less “unknown”, “other” and the blank entry), such as a dome, a hexagon, a pyramid, an egg, a cigar and so on. The most commonly observed form by far, with over 16,000 sightings, is a light; this is about twice as common as the second most observed form, a triangle.
Of course, if my knowledge of science fiction TV has taught me anything, just seeing a light could simply mean that the observation was made at night and that, with the craft's lights on, it was too dark to make out much of its shape. It is like driving at night when a car is heading toward you from the opposite direction: the headlights are always the first thing you see, long before you see the actual chassis.
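The "about twice as common" and "27 forms" figures can be checked with a few lines of arithmetic. This is only a sketch: the counts are copied by hand from the sorted table printed above, and the variable names are my own.

```r
# Counts copied from the sorted table printed above (five largest entries).
top_counts <- c(other = 5649, fireball = 6208, circle = 7608,
                triangle = 7865, light = 16565)

# Ratio of the most common shape to the second most common one.
ranked <- sort(top_counts, decreasing = TRUE)
light_to_triangle <- unname(ranked[1] / ranked[2])
round(light_to_triangle, 2)

# names(shape_table) lists 30 labels; three of them ("", "unknown", "other")
# do not describe a recognisable form.
n_labels <- 30
n_recognisable <- n_labels - 3
n_recognisable
```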
shape.canttell = sum(ufo_data$shape == "unknown" | ufo_data$shape == "other" | ufo_data$shape == "")
shape.canttell
## [1] 13165
However, the shape of 13,165 observations is recorded as “unknown”, “other” or not specified at all. In other words, it is not possible to tell the shape of a significant number of the UFO observations recorded by NUFORC, so it is quite possible that the numbers are subject to bias. For this reason, the values for “unknown”, “other” and "" (blank) have been added together into an umbrella category, “cannot tell”.
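An alternative way to build the umbrella category is to recode the labels before tabulating, rather than adding the counts afterwards. The sketch below uses a small made-up vector as a stand-in for `ufo_data$shape`. The total of 80,332 is my own sum of the counts in the sorted table above, so treat it as an assumption.

```r
# Hypothetical stand-in for ufo_data$shape, just to show the recoding step.
shape <- c("light", "unknown", "", "triangle", "other", "light", "disk")

# Recode the three uninformative labels into one umbrella category.
uninformative <- c("", "unknown", "other")
shape_recoded <- ifelse(shape %in% uninformative, "cannot tell", shape)
sort(table(shape_recoded))

# Share of the real dataset in the umbrella category, using the counts
# printed above: 1932 blank + 5584 "unknown" + 5649 "other".
cant_tell <- 1932 + 5584 + 5649
total <- 80332  # assumption: sum of every count in the sorted table
cant_tell_pct <- 100 * cant_tell / total
round(cant_tell_pct, 1)
```

Recoding first means the category survives any later cross-tabulation, for example by country, without further bookkeeping.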
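# Drop the blank, "unknown" and "other" entries by name rather than by position
shape_table.df <- data.frame(shape_table[!(names(shape_table) %in% c("", "unknown", "other"))])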
colnames(shape_table.df) <- c("UFO_shape", "Frequency")
cant.tell.df <- data.frame("UFO_shape" = "cannot tell", "Frequency" = shape.canttell)
shape_table.df <- rbind(shape_table.df, cant.tell.df)
shape_table.df <- shape_table.df[order(shape_table.df$Frequency),]
shape_table.df
## UFO_shape Frequency
## 1 changed 1
## 2 dome 1
## 3 flare 1
## 4 hexagon 1
## 5 pyramid 1
## 6 crescent 2
## 7 round 2
## 8 delta 7
## 9 cross 233
## 10 cone 316
## 11 teardrop 750
## 12 egg 759
## 13 chevron 952
## 14 diamond 1178
## 15 cylinder 1283
## 16 rectangle 1297
## 17 flash 1328
## 18 changing 1962
## 19 cigar 2057
## 20 formation 2457
## 21 oval 3733
## 22 disk 5213
## 23 sphere 5387
## 24 fireball 6208
## 25 circle 7608
## 26 triangle 7865
## 28 cannot tell 13165
## 27 light 16565
barplot(as.matrix(shape_table.df$Frequency), beside = TRUE, names.arg = shape_table.df$UFO_shape, xlab = "UFO Shape", ylab = "Frequency")
ufo_shape.country = table(ufo_data$shape, ufo_data$country)
ufo_shape.country
##
##                   au  ca de  gb    us
##              271  11  45  2  50  1553
##   changed      0   0   0  0   0     1
##   changing   252   9  69  2  46  1584
##   chevron     89   3  36  1   8   815
##   cigar      262  15  74  3  60  1643
##   circle     891  62 284 10 243  6118
##   cone        40   6  10  0  13   247
##   crescent     1   0   0  0   0     1
##   cross       25   1   9  0  10   188
##   cylinder   161   9  53  3  30  1027
##   delta        0   0   1  0   0     6
##   diamond    154  10  40  3  43   928
##   disk       736  50 198  6 102  4121
##   dome         1   0   0  0   0     0
##   egg        105  12  28  1  32   581
##   fireball   682  34 218  9 117  5148
##   flare        0   0   0  0   0     1
##   flash      174   4  62  1  25  1062
##   formation  286  20  98  3  60  1990
##   hexagon      0   0   0  0   0     1
##   light     1937 119 655 20 361 13473
##   other      760  40 241  9 133  4466
##   oval       448  30 130  7  86  3032
##   pyramid      0   0   0  0   0     1
##   rectangle  140  10  47  1  29  1070
##   round        0   0   0  0   0     2
##   sphere     655  15 205  7 158  4347
##   teardrop    88  10  22  0  38   592
##   triangle   827  43 268  9 169  6549
##   unknown    685  25 207  8  92  4567
In this table, the unlabelled first column and first row hold the sightings for which no country or no shape, respectively, was recorded. Generally speaking, “light”, “triangle”, “circle”, “fireball” and “cannot tell” can be said to be the most common forms of UFO observed. However, the “sphere” shape has been observed particularly often in Britain and the “disk” shape particularly often in Australia (relative to other areas).
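Because the US accounts for most of the sightings, the "relative to other areas" comparison is easier to see as within-country shares than as raw counts. The following is a rough sketch using a handful of cells copied from the table above. The country totals are my own column sums over all shape labels, so they are an assumption.

```r
# Counts copied from the shape-by-country table (three shapes, three countries).
counts <- matrix(
  c( 15, 158,  4347,    # sphere
     50, 102,  4121,    # disk
    119, 361, 13473),   # light
  nrow = 3, byrow = TRUE,
  dimnames = list(shape = c("sphere", "disk", "light"),
                  country = c("au", "gb", "us"))
)

# Assumption: column totals over all shape labels, summed from the same table.
country_totals <- c(au = 538, gb = 1905, us = 65114)

# Divide each column by its country total to get within-country shares (%).
shares <- sweep(counts, 2, country_totals, "/") * 100
round(shares, 1)
```

With the full cross-tabulation in memory, `prop.table(ufo_shape.country, margin = 2)` should give the same kind of column-wise shares for every shape at once.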
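# For each country, plot its six most frequently reported shapes
# (column 2 of each table is the TRUE column, i.e. sightings in that country)
ufoshape.us = table(ufo_data$shape, ufo_data$country == "us")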
barplot(tail(sort(ufoshape.us[,2])))
ufoshape.gb = table(ufo_data$shape, ufo_data$country == "gb")
barplot(tail(sort(ufoshape.gb[,2])))
ufoshape.de = table(ufo_data$shape, ufo_data$country == "de")
barplot(tail(sort(ufoshape.de[,2])))
ufoshape.ca = table(ufo_data$shape, ufo_data$country == "ca")
barplot(tail(sort(ufoshape.ca[,2])))
ufoshape.au = table(ufo_data$shape, ufo_data$country == "au")
barplot(tail(sort(ufoshape.au[,2])))
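The five near-identical blocks above could be collapsed into one small helper and a loop. The sketch below is one possible way to do it, not the method used for the plots in this post. `top_shapes` is a hypothetical helper name, and the matrix is a cut-down stand-in for `table(ufo_data$shape, ufo_data$country)` with counts copied from the cross-tabulation above.

```r
# Stand-in for the shape-by-country table: six shapes, two countries.
shape_by_country <- matrix(
  c(119, 361,   # light
     62, 243,   # circle
     43, 169,   # triangle
     15, 158,   # sphere
     34, 117,   # fireball
     50, 102),  # disk
  ncol = 2, byrow = TRUE,
  dimnames = list(c("light", "circle", "triangle", "sphere", "fireball", "disk"),
                  c("au", "gb"))
)

# Hypothetical helper: the n most frequent shapes for one country, ascending.
top_shapes <- function(tab, country, n = 6) {
  tail(sort(tab[, country]), n)
}

# One bar plot per country instead of one hand-written block per country.
for (cc in colnames(shape_by_country)) {
  barplot(top_shapes(shape_by_country, cc, n = 3), main = cc, ylab = "Sightings")
}
```

Indexing the existing cross-tabulation by country name also avoids building a separate TRUE/FALSE table for each country.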