Week 5 SPSS Modeler: Neural Network Modeling. You will need to build a model that solves the problem you have identified. Use the following software applications: (Excel; SPSS Modeler: Neural Network Modeling)
***IMPORTANT: I also need you to provide the raw software files (.str files) that you used to run this assignment (Excel; SPSS Modeler: Neural Network Modeling). Please see the example report attached to this assignment. It shows what is needed in the report, even though the Neural Network model was not used in the example report.
***You will need to run the Neural Network model (Excel; SPSS Modeler) for ALL 3 DATA SETS below:
1. The first is with the full data set. (Data is below)
2. The second is with the data set under 5 years old. (Data is below)
3. The third is with the data set over 5 years old. (Data is below)
Write a 200-word summary describing your model. It will include the following:
Why did you choose the Neural Network Model?
Provide specific screenshots from the modeling software in your paper.
Please also include:
Also attach the stream (.str) files with the assignment so they can see what was done. There should be 3 separate streams, one for each data set.
Run the Analysis node: run the model and check the top 3 boxes (Coincidence Matrix, Performance Evaluation, Evaluation Metrics) -> Apply and Run -> What were the results?
***Please answer the question above and also explain the differences between the data sets. You also need to validate which data set is the best. Hopefully you know SPSS Neural Network modeling pretty well. I have also attached some of my past work that I submitted in this class, since it all goes together and explains what we are looking to accomplish. Let me know if you have any more questions.
Model Building
Week 5 Assignment: Model Building
Red Group: Cristian Curiel, Jesse Ely, Seth Garcia, Bobby Washington
GCU: MIS-690 Applied Capstone Project
Wednesday, July 4th, 2018
Statistical and Predictive Modeling of Traffic Stop Data
As a continuation of the analysis into racial profiling in the Maricopa County Sheriff's
Department, the data science team developed several statistical models. The purpose of modeling
is to: a) show there is a statistically significant difference in how deputies treat drivers of
different races and, b) predict racial bias in the decision to cite or arrest drivers by ethnicity. The
four models built are listed below in order of significance, completeness, and accuracy.
Models Built
Logistic Regression (Predictive Modeling for Traffic Citations)
One-way ANOVA (severity of outcome Hispanics only)
Chi-Square
Logistic regression (predictive arrest)
Predictive Modeling for Traffic Citations
To identify the presence of racial bias in issuing citations, a stratified sample was taken
from the Maricopa traffic stop data. The data science team randomly selected 2,500 records from
each of the following racial groups: Asian, Black, Hispanic, Middle-Eastern, Native American,
Other, White. The team applied a logistic regression model to determine if any particular race
was more likely to be ticketed. This can be determined by the statistical significance score of the
independent variables, as well as the coefficient values of the resulting formula. A significant
amount of data transformation was required to adapt the data for a Generalized Linear Model in
R. Excel was used to adapt the data as follows:
True/false fields (‘search_conducted’, ‘contraband_found’) changed to binary (0,1)
‘Violation’ field (unstructured) converted to dummy variables in binary format
‘Ticket’ variable created as a binary indication of “Citation” = 1, else 0
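The three transformations listed above were done in Excel. As a rough illustration only, the same recoding could be sketched in Python. The raw value formats (the strings "TRUE"/"FALSE", a comma-separated violation field), the record layout, and the short category list are assumptions made for this sketch; the flag and category names follow the terms that appear in the fitted equation.

```python
# Hypothetical subset of violation categories; the full set is in the workbook.
VIOLATION_CATEGORIES = ["DUI", "Equipment", "License", "Speeding"]


def to_flag(value):
    """Map a true/false text value to 1 or 0 (assumes 'TRUE'/'FALSE' strings)."""
    return 1 if str(value).strip().upper() == "TRUE" else 0


def encode_stop(record):
    """Return the binary model inputs for one traffic-stop record."""
    encoded = {
        "search_conducted_flag": to_flag(record["search_conducted"]),
        "contraband_found_flag": to_flag(record["contraband_found"]),
        # 'Ticket' is 1 only when the stop outcome is a citation.
        "Ticket": 1 if record["stop_outcome"].strip() == "Citation" else 0,
    }
    # Assumption: the unstructured violation field is a comma-separated list.
    listed = {item.strip().lower() for item in record["violation"].split(",")}
    for category in VIOLATION_CATEGORIES:
        encoded[category] = 1 if category.lower() in listed else 0
    return encoded


example = {
    "search_conducted": "FALSE",
    "contraband_found": "FALSE",
    "violation": "Speeding, Equipment",
    "stop_outcome": "Citation",
}
encoded = encode_stop(example)
print(encoded)
```

A stop with several violations sets several dummy columns to 1, which is what allows each violation type to receive its own coefficient in the model.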
The sample data set was imported to R as a data frame with all columns converted to factors. The
model summary identifies which independent variables are statistically significant in determining
whether a citation is issued.
The ‘Lights’ variable was removed because its p-value was greater than 0.05. The remaining
variables were retained to support the prediction by race.
Model Summary
All other values being equal, Hispanics (specifically Hispanic males) are more likely to
be ticketed than other races, as shown by the coefficient value (~0.58). The final equation
expresses the log-odds of a citation, y, as a linear function of the predictors. Interestingly,
Black drivers have the lowest race coefficient and are therefore the least likely to be ticketed.
y = -2.68412 + 1.04441*driver_genderM – 0.24441*driver_raceBlack + 0.52772*driver_raceHispanic +
0.02808*driver_raceOther – 0.15768*driver_raceWhite – 2.69973*search_conducted_flag
2.13081*contraband_found_flag – 5.51637*DUI + 4.97748*Equipment + 3.98497*License +
5.96968*Moving_violation + 4.40971*Other + 4.46516*Paperwork + 5.23298*Registration_plates +
5.79620*Safe_movement + 6.54089*Speeding + 9.49208*Stopsign_light
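Because the model is a logistic regression, each coefficient is a change in log-odds, and exponentiating it gives an odds ratio relative to the reference level. The sketch below applies this to the race coefficients in the equation above. Treating the omitted race level as the reference category is an assumption based on how R encodes factors; the report does not state it.

```python
import math

# Race coefficients copied from the fitted equation in the report.
race_coefficients = {
    "Black": -0.24441,
    "Hispanic": 0.52772,
    "Other": 0.02808,
    "White": -0.15768,
}


def odds_ratio(coefficient):
    """Convert a logistic-regression coefficient (log-odds) to an odds ratio."""
    return math.exp(coefficient)


odds_ratios = {race: odds_ratio(b) for race, b in race_coefficients.items()}

# Print from most to least likely to be ticketed, all else being equal.
for race, ratio in sorted(odds_ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{race}: odds ratio {ratio:.3f}")
```

An odds ratio above 1 means higher odds of a citation than the reference group, and a value below 1 means lower odds, with the other predictors held fixed.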
When applied to the test data, the model classifies 93.4% of records correctly. A confusion matrix
(created in Excel) summarizes the performance of the model against the test data.
                  ACTUAL
                  Y       N
PREDICTED    Y    2863    53
             N    290     2013

Sensitivity    0.908024
PPV            0.981824
Specificity    0.974347
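The accuracy, sensitivity, PPV, and specificity figures follow directly from the four confusion-matrix counts. A minimal Python sketch of the arithmetic, using the counts reported above (the variable names are chosen for this sketch):

```python
# Confusion-matrix counts from the report (rows: predicted, columns: actual).
tp, fp = 2863, 53    # predicted Y: actual Y, actual N
fn, tn = 290, 2013   # predicted N: actual Y, actual N

total = tp + fp + fn + tn

accuracy = (tp + tn) / total      # share of all records classified correctly
sensitivity = tp / (tp + fn)      # share of actual citations that were predicted
ppv = tp / (tp + fp)              # share of predicted citations that were correct
specificity = tn / (tn + fp)      # share of actual non-citations that were predicted

print(f"accuracy    {accuracy:.4f}")
print(f"sensitivity {sensitivity:.6f}")
print(f"PPV         {ppv:.6f}")
print(f"specificity {specificity:.6f}")
```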
Traffic_Stop_All_Sample_Filtered_v3.xls
One-Way ANOVA
A one-way ANOVA for the years 2010 and 2011 was run in Excel to test the null
hypothesis that the mean stop outcomes in 2010 and 2011 are equal. The severity of
outcome was analyzed for Hispanics only. Severities in the Excel workbook are classified in a
new column [stop_outcome_#] by:
– Warning: 1
– Citation: 2
– Arrest: 3
– All Others: 0
The dataset utilized for this test was retrieved from the Stanford Open Policing Project, an
interdisciplinary team of researchers from Stanford University collecting and standardizing data
on vehicle and pedestrian stops from law enforcement departments across the country. The
Arizona stop data contains records on 2,251,992 stops from 2009-2015. The columns retained for
the analysis were: [id], [state], [stop_date], [stop_time], [county_name], [county_fips],
[fine_grained_location], [driver_gender], [driver_race_raw], [driver_race], [search_conducted],
[contraband_found], [stop_outcome], [is_arrested], [officer_id], [road_number], [milepost],
[vehicle_type], and [ethnicity], as all their rows were populated. All others were removed due to a
high number of blank cells. The applied filters yielded 1,203 random stop outcome records for
Hispanics in each of 2010 and 2011, with classified stop outcomes 1-3. Assumptions for a one-way
analysis of variance are:
Populations are normally distributed
Populations have equal variances
Samples are randomly and independently drawn
The null hypothesis of our one-way ANOVA is H0: all population means are equal. This would
indicate there is no change in the stop outcome of Hispanics between 2010 and 2011. However,
we predict that the null hypothesis is false, meaning H1: not all of the population means are
equal. A false null hypothesis would indicate there is a change in the mean stop outcome of
Hispanics between 2010 and 2011. The following should hold true in the case of a false null
hypothesis:
At least one population mean is different (2010 or 2011).
That is, there is a factor effect (an increase in more severe outcomes occurred between 2010
and 2011).
This does not necessarily mean all population means are different.
Anova: Single Factor

SUMMARY
Groups    Count    Sum     Average        Variance
2010      1203     1831    1.522028263    0.63907342
2011      1203     1932    1.605985037    0.536802752

ANOVA
Source of Variation    SS             df      MS             F              P-value        F crit
Between Groups         4.239817124    1       4.239817124    7.211332664    0.007294254    3.845329902
Within Groups          1413.403159    2404    0.587938086
Total                  1417.642976    2405
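The sums of squares, mean squares, and F statistic in the Excel output can be recomputed from the group counts, sums, and variances alone. The sketch below does this in plain Python. The function name and dictionary keys are chosen for this illustration, and the F crit value is copied from the Excel output rather than computed.

```python
def one_way_anova(groups):
    """Compute one-way ANOVA terms from (count, mean, sample variance) tuples."""
    k = len(groups)
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n

    # Between-groups: spread of the group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-groups: pooled spread of observations around their own group mean.
    ss_within = sum((n - 1) * var for n, _, var in groups)

    df_between, df_within = k - 1, total_n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# (count, mean = sum / count, variance) for 2010 and 2011, from the SUMMARY table.
groups = [
    (1203, 1831 / 1203, 0.63907342),
    (1203, 1932 / 1203, 0.536802752),
]
result = one_way_anova(groups)

F_CRIT = 3.845329902  # copied from the Excel output
reject_null = result["F"] > F_CRIT
print(f"F = {result['F']:.4f}, reject H0: {reject_null}")
```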
In Excel, if F > F crit, we reject the null hypothesis. In this case 7.21 > 3.85. Therefore,
we reject the null hypothesis. The means of the 2010 and 2011 Hispanic stop outcome data are not
equal, meaning there is a change between the two years. It is important
to note that ANOVA does not state exactly where the difference lies, especially when there are
more than two groups. A t-Test is required to test each pair of means. With just 2 populations,
the F significance and P value should be, and are, the same. Both the sum and average for
2011 were higher, suggesting more racial bias against Hispanics as time progressed from 2010 to 2011.
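With only two groups of equal size, the square of the unequal-variance t statistic equals the ANOVA F statistic, which is why the two tests lead to the same conclusion here. The sketch below recomputes the t statistic from the ANOVA summary figures for 2010 and 2011. It is a check on the arithmetic rather than a reproduction of the Excel t-Test output, and the function name is chosen for this illustration.

```python
import math


def welch_t(n1, mean1, var1, n2, mean2, var2):
    """Two-sample t statistic without assuming equal variances."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return (mean2 - mean1) / standard_error


# Counts, means (sum / count), and variances from the ANOVA summary.
t_stat = welch_t(
    1203, 1831 / 1203, 0.63907342,    # 2010
    1203, 1932 / 1203, 0.536802752,   # 2011
)
t_squared = t_stat ** 2

print(f"t = {t_stat:.4f}, t^2 = {t_squared:.4f}")
```

The positive sign of t reflects the higher 2011 mean, and t squared can be compared directly with the F value reported in the ANOVA output.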
t-Test: Two-Sample Assuming Unequal Variances
Mean
Variance
Observations
Hypothesized Mean Difference
df
t Stat
P(T