Week 5 SPSS Modeler: Neural Network Modeling

You will need to build a model that will solve the problem that you have identified. Use the following software applications: (Excel: SPSS Modeler: Neural Network Modeling)

***IMPORTANT: I also need to provide the raw software files (.str files) used to run this assignment (Excel: SPSS Modeler: Neural Network Modeling). Please see the example report attached to this assignment. It shows what is needed in the report, even though the example report did not use the Neural Network model.

***You will need to run the model (Excel: SPSS Modeler: Neural Network Modeling) for ALL 3 DATA SETS below:

1. The full data set (data is below).

2. The data set under 5 years old (data is below).

3. The data set over 5 years old (data is below).

Write a 200-word summary describing your model. It will include the following:

Why did you choose the Neural Network Model?
Provide specific screenshots from the modeling software in your paper.

Please also include:

Also attach the Streams (.str) files with the assignment so they can see what was done. There should be 3 separate streams, one for each data set.
Run the Analysis node: run the model and check the top 3 boxes (Coincidence Matrix, Performance Evaluation, Evaluation Metrics) -> Apply and Run -> What were the results?

***Please answer the question above and also explain the differences between the data sets. You also need to validate which data set is the best. Hopefully you know SPSS: Neural Network Modeling pretty well. I have also attached some of my past work that I have submitted in this class, since it all goes together and explains what we are looking to accomplish. Let me know if you have any more questions.

Model Building
Week 5 Assignment: Model Building
Red Group: Cristian Curiel, Jesse Ely, Seth Garcia, Bobby Washington
GCU: MIS-690 Applied Capstone Project
Wednesday, July 4th, 2018
Statistical and Predictive Modeling of Traffic Stop Data
As a continuation of the analysis into racial profiling in the Maricopa County Sheriff's
Department, the data science team developed several statistical models. The purpose of modeling
is to: a) show that there is a statistically significant difference in how deputies treat drivers of
different races and, b) predict racial bias in the decision to cite or arrest drivers by ethnicity. The
four models built are listed below in order of significance, completeness, and accuracy.
Models Built

1. Logistic Regression (Predictive Modeling for Traffic Citations)

2. One-way ANOVA (severity of outcome, Hispanics only)

3. Chi-Square

4. Logistic Regression (predictive arrest)
Predictive Modeling for Traffic Citations
To identify the presence of racial bias in issuing citations, a stratified sample was taken
from the Maricopa traffic stop data. The data science team randomly selected 2,500 records from
each of the following racial groups: Asian, Black, Hispanic, Middle-Eastern, Native American,
Other, White. The team applied a logistic regression model to determine whether any particular race
was more likely to be ticketed. This can be determined from the statistical significance (p-values) of
the independent variables, as well as the coefficient values of the resulting equation. A significant
amount of data transformation was required to adapt the data for a Generalized Linear Model in
R. Excel was used to adapt the data as follows:

True/false fields (‘search_conducted’, ‘contraband_found’) changed to binary (0,1)

‘Violation’ field (unstructured) converted to dummy variables in binary format

‘Ticket’ variable created as a binary indication of “Citation” = 1, else 0
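The recoding steps listed above were done in Excel. As a rough illustration only, the same three transformations could be sketched in Python as follows. The record layout, the sample values, the short category list, and the substring matching of the violation text are all assumptions made for this sketch, not the team's actual workbook logic.

```python
# Hypothetical sketch of the Excel recoding steps; not the team's actual workbook.
# Field names follow the report; the sample record and category subset are made up.
VIOLATION_CATEGORIES = ["DUI", "Equipment", "License", "Speeding"]  # illustrative subset


def to_flag(value):
    """Map a true/false field (bool or text) to 1/0."""
    return 1 if str(value).strip().lower() in ("true", "1", "yes") else 0


def recode(record):
    """Return a model-ready copy of one traffic-stop record."""
    row = {
        "search_conducted_flag": to_flag(record["search_conducted"]),
        "contraband_found_flag": to_flag(record["contraband_found"]),
        # 'Ticket' is 1 only when the stop ended in a citation.
        "Ticket": 1 if record["stop_outcome"] == "Citation" else 0,
    }
    # One binary dummy column per category (assumed: simple substring match).
    text = record["violation"].lower()
    for category in VIOLATION_CATEGORIES:
        row[category] = 1 if category.lower() in text else 0
    return row


example = {
    "search_conducted": "FALSE",
    "contraband_found": "FALSE",
    "stop_outcome": "Citation",
    "violation": "Speeding, Equipment",
}
recoded = recode(example)
```

Each recoded row then holds only 0/1 values, which is the shape the logistic regression expects.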
The sample data set was imported to R as a data frame with all columns converted to factors. The
model summary identifies which independent variables are statistically significant in determining
whether a citation is issued.
The ‘Lights’ variable was removed because its p-value was greater than 0.05. The remaining
variables were retained to support the prediction by race.
Model Summary
All other values being equal, Hispanics (specifically Hispanic males) are more likely to
be ticketed than other races, as shown by the coefficient value (~0.58). The final equation
is expressed as a function of y. Interestingly, Black drivers have the lowest race coefficient, making them the least likely to be ticketed.
y = -2.68412 + 1.04441*driver_genderM – 0.24441*driver_raceBlack + 0.52772*driver_raceHispanic +
0.02808*driver_raceOther – 0.15768*driver_raceWhite – 2.69973*search_conducted_flag 2.13081*contraband_found_flag – 5.51637*DUI + 4.97748*Equipment + 3.98497*License +
5.96968*Moving_violation + 4.40971*Other + 4.46516*Paperwork + 5.23298*Registration_plates +
5.79620*Safe_movement + 6.54089*Speeding + 9.49208*Stopsign_light
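To show how the fitted equation can be read, the sketch below evaluates it for two hypothetical drivers. It assumes y is the log-odds of a citation, the usual output of a binomial GLM with a logit link, and converts it to a probability with the logistic function. Only a subset of coefficients is copied from the equation above; the two driver profiles are made up for illustration.

```python
import math

# Subset of coefficients copied from the fitted equation above.
INTERCEPT = -2.68412
COEFFICIENTS = {
    "driver_genderM": 1.04441,
    "driver_raceBlack": -0.24441,
    "driver_raceHispanic": 0.52772,
    "driver_raceOther": 0.02808,
    "driver_raceWhite": -0.15768,
    "Speeding": 6.54089,
}


def log_odds(features):
    """Linear predictor y for a dict of {variable: 0/1} features."""
    return INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in features.items())


def probability(features):
    """Convert y to a probability (assumes y is a log-odds)."""
    return 1.0 / (1.0 + math.exp(-log_odds(features)))


# Hypothetical profiles: both male, both stopped for speeding, differing only by race.
hispanic_male = {"driver_genderM": 1, "driver_raceHispanic": 1, "Speeding": 1}
white_male = {"driver_genderM": 1, "driver_raceWhite": 1, "Speeding": 1}

# Odds ratio between the two profiles depends only on the race coefficients.
odds_ratio = math.exp(COEFFICIENTS["driver_raceHispanic"] - COEFFICIENTS["driver_raceWhite"])
```

Because the two profiles differ only in the race term, the odds ratio isolates the race effect from the violation type.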
When applied to the test data, the model's classification accuracy is 93.4%. A confusion matrix (created in
Excel) summarizes the performance of the model against the test data.
                 ACTUAL
                 Y       N
PREDICTED   Y    2863    53
            N    290     2013

Sensitivity    PPV         Specificity
0.908024       0.981824    0.974347
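The reported metrics follow directly from the four cells of the confusion matrix. A minimal sketch of the arithmetic, using the counts from the matrix above (the variable names are chosen here for illustration):

```python
# Cell counts copied from the confusion matrix above.
tp = 2863  # predicted Y, actual Y
fp = 53    # predicted Y, actual N
fn = 290   # predicted N, actual Y
tn = 2013  # predicted N, actual N

sensitivity = tp / (tp + fn)                 # share of actual citations caught
ppv = tp / (tp + fp)                         # share of predicted citations that were real
specificity = tn / (tn + fp)                 # share of actual non-citations caught
accuracy = (tp + tn) / (tp + fp + fn + tn)   # overall share classified correctly

print(f"sensitivity={sensitivity:.6f} ppv={ppv:.6f} "
      f"specificity={specificity:.6f} accuracy={accuracy:.3f}")
```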
Traffic_Stop_All_Sample_Filtered_v3.xls
One-Way ANOVA
A one-way ANOVA for 2010 and 2011 was run in Excel to test the null
hypothesis that the mean stop outcomes in 2010 and 2011 are equal. The severity of
outcome was analyzed for Hispanics only. Severities in the Excel workbook are coded in a
new column [stop_outcome_#] as follows:
– Warning: 1
– Citation: 2
– Arrest: 3
– All Others: 0
The dataset utilized for this test was retrieved from the Stanford Open Policing Project, an
interdisciplinary team of researchers from Stanford University collecting and standardizing data
on vehicle and pedestrian stops from law enforcement departments across the country. The
Arizona stop data contains records on 2,251,992 stops from 2009-2015. The columns retained for
the analysis were: [id], [state], [stop_date], [stop_time], [county_name], [county_fips],
[fine_grained_location], [driver_gender], [driver_race_raw], [driver_race], [search_conducted],
[contraband_found], [stop_outcome], [is_arrested], [officer_id], [road_number], [milepost],
[vehicle_type], and [ethnicity], as all their rows were populated. All others were removed due to a
high number of blank cells. The applied filters yielded 1,203 random stop outcome records for
Hispanics in each of 2010 and 2011, with classified stop outcomes 1-3. Assumptions for a one-way
analysis of variance are:

Populations are normally distributed

Populations have equal variances

Samples are randomly and independently drawn
The null hypothesis of our one-way ANOVA is H0: All population means are equal. This would
indicate there is no change in the stop outcome of Hispanics between 2010 and 2011. However,
we predict that the null hypothesis is false, meaning H1: Not all of the population means are
equal. A false null hypothesis would indicate there is a change in the mean stop outcome of
Hispanics between 2010 and 2011. The following should hold true in the case of a false null
hypothesis:

At least one population mean is different (2010 or 2011).

That is, there is a factor effect (an increase in more severe outcomes occurred between 2010
and 2011).

This does not necessarily mean that all population means are different.
Anova: Single Factor

SUMMARY
Groups    Count    Sum     Average        Variance
2010      1203     1831    1.522028263    0.63907342
2011      1203     1932    1.605985037    0.536802752

ANOVA
Source of Variation    SS             df      MS             F              P-value        F crit
Between Groups         4.239817124    1       4.239817124    7.211332664    0.007294254    3.845329902
Within Groups          1413.403159    2404    0.587938086
Total                  1417.642976    2405
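The ANOVA table can be rebuilt from the group counts, averages, and variances in the SUMMARY block. The sketch below is one way to check the arithmetic, not the Excel procedure itself; the F crit value is copied from the Excel output rather than computed.

```python
# Group statistics copied from the SUMMARY block above.
groups = {
    "2010": {"n": 1203, "mean": 1.522028263, "var": 0.63907342},
    "2011": {"n": 1203, "mean": 1.605985037, "var": 0.536802752},
}

total_n = sum(g["n"] for g in groups.values())
grand_mean = sum(g["n"] * g["mean"] for g in groups.values()) / total_n

# Between-groups: spread of the group means around the grand mean.
ss_between = sum(g["n"] * (g["mean"] - grand_mean) ** 2 for g in groups.values())
# Within-groups: pooled spread inside each group (sample variance times n - 1).
ss_within = sum((g["n"] - 1) * g["var"] for g in groups.values())

df_between = len(groups) - 1
df_within = total_n - len(groups)

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

F_CRIT = 3.845329902  # copied from the Excel output, not computed here
reject_null = f_stat > F_CRIT
```

Running this reproduces the SS, df, MS, and F columns to within rounding of the inputs.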
In Excel, if F > F crit, we reject the null hypothesis. In this case, 7.21 > 3.85, so
we reject the null hypothesis. The 2010 and 2011 means of the Hispanic stop outcome data are not
equal, meaning there is a change between the two years. It is important
to note that ANOVA does not state exactly where the difference lies, especially when there are
more than two groups; a t-test is required to test each pair of means. With just two populations,
the ANOVA F significance and the t-test P-value should be, and are, the same. Both the sum and the average for
2011 were higher, suggesting more racial bias against Hispanics as time progressed from 2010 to 2011.
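The link between the two tests can be checked from the ANOVA summary statistics alone. The sketch below computes an unequal-variances (Welch) t statistic from the 2010 and 2011 averages and variances. It is a derived check under that assumption and does not reproduce the values of the t-test table that follows. With equal group sizes, the squared t statistic equals the ANOVA F statistic.

```python
import math

# Summary statistics copied from the ANOVA SUMMARY block.
n_2010 = n_2011 = 1203
mean_2010, var_2010 = 1.522028263, 0.63907342
mean_2011, var_2011 = 1.605985037, 0.536802752

# Squared standard error contributed by each group.
se2_2010 = var_2010 / n_2010
se2_2011 = var_2011 / n_2011

# Welch t statistic for the difference in means (2011 minus 2010).
t_stat = (mean_2011 - mean_2010) / math.sqrt(se2_2010 + se2_2011)

# Welch-Satterthwaite approximation of the degrees of freedom.
welch_df = (se2_2010 + se2_2011) ** 2 / (
    se2_2010 ** 2 / (n_2010 - 1) + se2_2011 ** 2 / (n_2011 - 1)
)

ANOVA_F = 7.211332664  # copied from the ANOVA table
matches_anova = abs(t_stat ** 2 - ANOVA_F) < 1e-3
```

A positive t statistic here corresponds to the higher 2011 average noted above.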
t-Test: Two-Sample Assuming Unequal Variances
Mean
Variance
Observations
Hypothesized Mean Difference
df
t Stat
P(T