June 17, 2022
Today's widget is an HTML file converted from a Jupyter Notebook that introduces us to supervised learning. A support vector machine (SVM) model is trained to find a hyperplane that separates the two classes in the training set; it then predicts whether new input data falls on the positive or the negative side of that hyperplane.
Data: BMI, blood glucose, and insulin level, among other features
STEPS
#https://www.youtube.com/watch?v=xUE7SjVx9bQ&list=PLfFghEzKVmjsNtIRwErklMAN8nJmebB0I&index=30
#for the parameters of pd.read_csv, enter: "pd.read_csv?"
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score
data = pd.read_csv("dataset/diabetes.csv")
data.head()
data.shape
data.describe()
data["Outcome"].value_counts()
0 represents non-diabetic; 1 represents diabetic
data.groupby("Outcome").mean()
# separating data and labels
x = data.drop(columns="Outcome")
y = data["Outcome"]
print(x)
print(y)
Data Standardization
scaler = StandardScaler()
#scaler.fit(x)
#standardized_data = scaler.transform(x)
#can combine the fit and transform command
standardized_data = scaler.fit_transform(x)
print(standardized_data)
x = standardized_data
y = data["Outcome"]
#print (x)
#print (y)
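As a rough check of what standardization does, the sketch below applies StandardScaler to hypothetical toy data (not the diabetes dataset) and confirms that each column ends up with mean close to 0 and standard deviation close to 1. The names toy, toy_scaler, and toy_scaled are made up for this illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: three features on very different scales
rng = np.random.default_rng(0)
toy = np.column_stack([
    rng.normal(120, 30, size=200),   # glucose-like scale
    rng.normal(32, 7, size=200),     # BMI-like scale
    rng.normal(0.5, 0.3, size=200),  # small-valued feature
])

toy_scaler = StandardScaler()
toy_scaled = toy_scaler.fit_transform(toy)

# After scaling, each column has mean ~0 and standard deviation ~1
col_means = toy_scaled.mean(axis=0)
col_stds = toy_scaled.std(axis=0)
print(np.round(col_means, 6))
print(np.round(col_stds, 6))

# Same result computed by hand: (x - mean) / std, with the population std (ddof=0)
manual = (toy - toy.mean(axis=0)) / toy.std(axis=0)
print(np.allclose(manual, toy_scaled))
```

Putting every feature on the same scale matters for an SVM because the margin is measured with distances, so a feature with large raw values would otherwise dominate.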
Train Test Split
#reserve 20% of the dataset for testing; stratify so the train and test sets keep the same nondiabetic/diabetic proportions as the full dataset
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, stratify=y, random_state=1)
print(x.shape, x_train.shape, x_test.shape)
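To see what stratify does, here is a small sketch on hypothetical imbalanced labels (65% class 0, 35% class 1, chosen only for illustration). It compares the share of class 1 in the full set, the training set, and the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 650 of class 0, 350 of class 1
labels = np.array([0] * 650 + [1] * 350)
features = np.arange(len(labels)).reshape(-1, 1)  # placeholder feature column

f_train, f_test, l_train, l_test = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=1
)

# With stratify, the share of class 1 is (almost) identical in every subset
full_share = labels.mean()
train_share = l_train.mean()
test_share = l_test.mean()
print(full_share, train_share, test_share)
```

Without stratify, a random split could leave the test set with noticeably more or fewer diabetic examples than the training set, which makes the test accuracy harder to interpret.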
Train model
classifer = svm.SVC(kernel="linear")
classifer.fit(x_train, y_train)
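The idea of the positive and negative side of the hyperplane can be sketched on hypothetical 2-D toy points. For a linear kernel, the fitted SVC exposes the hyperplane as coef_ and intercept_, and the sign of w . x + b decides the predicted class. The names pts, lbl, and toy_clf are invented for this example.

```python
import numpy as np
from sklearn import svm

# Hypothetical, linearly separable toy points in 2-D
pts = np.array([[-2.0, -1.0], [-1.0, -2.0], [-1.5, -1.5],
                [2.0, 1.0], [1.0, 2.0], [1.5, 1.5]])
lbl = np.array([0, 0, 0, 1, 1, 1])

toy_clf = svm.SVC(kernel="linear")
toy_clf.fit(pts, lbl)

# For a linear kernel the hyperplane is w . x + b = 0
w = toy_clf.coef_[0]
b = toy_clf.intercept_[0]

new_pts = np.array([[-3.0, -3.0], [3.0, 3.0]])
scores = new_pts @ w + b  # signed score relative to the hyperplane
same = np.allclose(scores, toy_clf.decision_function(new_pts))
preds = toy_clf.predict(new_pts)

# Negative score -> class 0, positive score -> class 1
print(scores, preds, same)
```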
Evaluate Model: accuracy score
Overfitting: high accuracy score on training data, low accuracy score on test data
#accuracy on the training data
x_train_prediction = classifer.predict(x_train)
#accuracy_score takes the true labels first, then the predictions
training_data_accuracy = accuracy_score(y_train, x_train_prediction)
print("Accuracy on Training Data : ", training_data_accuracy)
#accuracy on the held-out test data
x_test_prediction = classifer.predict(x_test)
test_data_accuracy = accuracy_score(y_test, x_test_prediction)
print("Accuracy on Test Data : ", test_data_accuracy)
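The overfitting check can be written as a single number: the gap between training and test accuracy. The sketch below uses synthetic data from make_classification as a stand-in for the diabetes features (an assumption for illustration only), so the printed values will not match the notebook's results.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score

# Hypothetical synthetic data standing in for the diabetes features
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=1)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=1
)

demo_clf = svm.SVC(kernel="linear").fit(Xd_train, yd_train)

# accuracy_score expects (y_true, y_pred)
train_acc = accuracy_score(yd_train, demo_clf.predict(Xd_train))
test_acc = accuracy_score(yd_test, demo_clf.predict(Xd_test))

# A large positive gap (train much higher than test) suggests overfitting
gap = train_acc - test_acc
print(f"train={train_acc:.3f} test={test_acc:.3f} gap={gap:.3f}")
```

A gap near zero suggests the model generalizes about as well as it fits; there is no fixed threshold, so the gap is best read alongside the two accuracies themselves.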
Making a Predictive System
np.asarray converts input_data to a NumPy array, which is faster to process. The model was trained on a 2-D array of examples (the dataset has 768 rows), so a single data point must be reshaped into a one-row array before prediction.
#known nondiabetic data
#input_data = (1,121,78,39,74,39,0.261,28)
#known diabetic data
#input_data = (2,174,88,37,120,44.5,0.646,24)
input_data = (5,120,78,23,79,28.4,0.323,34)
input_data_as_numpy_array = np.asarray(input_data)
input_data_reshaped = input_data_as_numpy_array.reshape(1, -1)
#standardize data, since training data was standardized
std_data = scaler.transform(input_data_reshaped)
print(std_data)
prediction = classifer.predict(std_data)
print(prediction)
if prediction[0] == 0:
    print("Model predicts a nondiabetic")
else:
    print("Model predicts a diabetic")
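The predictive steps above (convert to array, reshape to one row, standardize with the already fitted scaler, predict) could be wrapped in one helper. This is a minimal sketch: predict_one is a hypothetical function name, and the two-feature toy data exists only to make the example runnable on its own.

```python
import numpy as np
from sklearn import svm
from sklearn.preprocessing import StandardScaler

def predict_one(model, fitted_scaler, values):
    """Return the predicted class for one example given as a sequence of numbers."""
    row = np.asarray(values, dtype=float).reshape(1, -1)  # shape (n,) -> (1, n)
    row_std = fitted_scaler.transform(row)                # reuse the training-time scaling
    return int(model.predict(row_std)[0])

# Hypothetical toy training data with two features, for illustration only
X_toy = np.array([[1.0, 10.0], [2.0, 12.0], [1.5, 11.0],
                  [8.0, 30.0], [9.0, 32.0], [8.5, 31.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

demo_scaler = StandardScaler().fit(X_toy)
demo_model = svm.SVC(kernel="linear").fit(demo_scaler.transform(X_toy), y_toy)

result = predict_one(demo_model, demo_scaler, (8.2, 29.0))
print("Model predicts class", result)
```

The important detail is that the helper calls transform, not fit_transform, so the new data point is scaled with the means and standard deviations learned from the training data.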