ITS836 Cumberlands: Types of Iris in Hawaii, Clustering Methods Project
School of Computer & Information Sciences
ITS 836 Data Science and Big Data Analytics
Lecture 04: Clustering – Homework
Homework 1: Review the Iris data set and perform clustering via "kmeans" (see next slides)
Homework 2: Perform hierarchical clustering on the iris data set: https://cran.r-project.org/web/packages/dendextend/vignettes/Cluster_Analysis.html
Homework 3: Use the data set USArrests to cluster the US states according to https://uc-r.github.io/kmeans_clustering
Homework 4: Review Section 4_2R Exercise
Homework 5: Clustering on a data set
Homework 6: Continue R for Data Science exercises
Deliverable should be a PowerPoint or Word document. Use the existing homework template with your name, ID and date.
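For Homework 2, hierarchical clustering on the iris measurements might be started along these lines. This is only a minimal sketch using base R's hclust and cutree rather than the dendextend package from the linked vignette; the complete linkage method, the choice of k = 3 and the variable names are assumptions made for illustration.

```r
# Hierarchical clustering of the four numeric iris measurements (base R only).
measurements <- iris[, 1:4]            # drop the Species label before clustering
d <- dist(measurements)                # Euclidean distances between flowers
hc <- hclust(d, method = "complete")   # complete linkage is an assumption here
groups <- cutree(hc, k = 3)            # cut the tree into three clusters (assumed k)

# Cross-tabulate cluster membership against the known species labels.
print(table(groups, iris$Species))
```

Calling plot(hc, labels = FALSE) afterwards draws the dendrogram, which is the starting point for the dendextend examples in the vignette.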
Iris Dataset Source
Goal: Predict the types of iris in Hawaii
R.A. Fisher, 1936
• Attributes: sepal length, sepal width, petal length, petal width
– All flowers contain a sepal and a petal
– The three iris categories (Versicolor, Setosa, Virginica) differ in these measurements
View Iris Data Set
The iris data set comes with the R installation
str(iris)
'data.frame': 150 obs. of 5 variables:
 $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
Species attribute: the 150 flowers are split evenly, 50 per species.
table(iris$Species)
    setosa versicolor  virginica
        50         50         50
Visualize iris data
library(ggplot2)
ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
  geom_point()
• Observe:
– High correlation between the sepal length and the sepal width of the Setosa iris
– Lesser correlation for Virginica and Versicolor
• The data points are more spread out over the graph and don't form a cluster like you can see in the case of the Setosa flowers.
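The visual impression from the scatter plot can be checked numerically. The following is a small sketch, assuming Pearson correlation is the measure meant on the slide; the name sepal_cor is illustrative.

```r
# Pearson correlation between sepal length and sepal width, per species.
sepal_cor <- sapply(split(iris, iris$Species), function(df) {
  cor(df$Sepal.Length, df$Sepal.Width)
})

# One value per species; Setosa should come out highest.
print(round(sepal_cor, 3))
```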
Visualize iris data
ggplot(data = iris, aes(x = Petal.Length, y = Petal.Width, col = Species)) +
  geom_point()
• Positive correlation between the petal length and the petal width for all species
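The petal relationship can be confirmed the same way, both per species and over the whole data set. This is a sketch only; petal_cor and overall are illustrative names, and Pearson correlation is assumed.

```r
# Correlation between petal length and petal width within each species.
petal_cor <- sapply(split(iris, iris$Species), function(df) {
  cor(df$Petal.Length, df$Petal.Width)
})
print(round(petal_cor, 3))

# Correlation over all 150 flowers, ignoring species.
overall <- cor(iris$Petal.Length, iris$Petal.Width)
print(round(overall, 3))
```

The within-species values are noticeably smaller than the overall value, because much of the overall correlation comes from the differences between species.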
Correlations
library(GGally)
ggpairs(iris)
• As shown, Petal Width is strongly correlated with:
– Petal Length (0.963)
– Sepal Length (0.818)
• And Petal Length is strongly correlated with:
– Sepal Length (0.872)
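The same coefficients can be read from a plain correlation matrix without GGally. A minimal sketch with base R's cor, restricted to the four numeric columns; cor_matrix is an illustrative name.

```r
# Pairwise Pearson correlations of the four numeric iris columns.
cor_matrix <- cor(iris[, 1:4])
print(round(cor_matrix, 3))

# Pick out the three pairs highlighted on the slide.
pw_pl <- cor_matrix["Petal.Width", "Petal.Length"]
pw_sl <- cor_matrix["Petal.Width", "Sepal.Length"]
pl_sl <- cor_matrix["Petal.Length", "Sepal.Length"]
```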
Lecture 4 Homework 1: Clustering with k-means
head(iris)
#remove last column
iris_2
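The slide text stops at iris_2. One way the step could continue is sketched below, assuming iris_2 is meant to be iris without its Species column. The seed, centers = 3 and nstart = 25 are assumptions chosen for illustration, not values taken from the slides.

```r
# Remove the last column (Species) so only numeric measurements remain.
iris_2 <- iris[, -5]

# k-means with three clusters; the seed makes the random starts repeatable.
set.seed(123)                                   # arbitrary seed (assumption)
km <- kmeans(iris_2, centers = 3, nstart = 25)  # nstart = 25 is an assumption

# Compare the cluster assignments with the known species labels.
print(table(km$cluster, iris$Species))
print(km$size)
```

Using several random starts (nstart) reduces the chance that kmeans settles on a poor local optimum; the cross-table shows how closely the clusters follow the three species.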