
[Machine Learning] Predicting Fights with UFC Data

2023. 12. 16. 00:32


Column	Description
R_, B_	Prefix indicating whether the value belongs to the Red or the Blue fighter
R_fighter	Fighter name
R_odds	American odds that the fighter will win, usually scraped from bestfightodds.com
R_ev	Payout (profit on a 100-credit winning bet)
date	Date of the fight
country, location	Country and location where the fight takes place
Winner	Winner of the fight (Red, Blue, or Draw)
title_bout	Whether the fight was a title bout
R_weight_class	Weight class of the bout
R_gender	Gender of the combatants
no_of_rounds	Number of rounds in the fight
R_current_lose_streak, R_current_win_streak	Current losing streak and current winning streak
R_draw	Number of draws (I was not sure of the meaning, but it looked like a win/loss-related feature, so I excluded it)
R_avg_SIG_STR_landed, R_avg_SIG_STR_pct, R_avg_SUB_ATT, R_avg_TD_landed, R_avg_TD_pct	Number of attacks per unit of time (minutes)
R_longest_win_streak	Longest winning streak
R_losses	Total number of losses
R_total_rounds_fought	Total number of rounds fought
R_total_title_bouts	Total number of title bouts fought
R_win_by_Decision_Majority, R_win_by_Decision_Split, R_win_by_Decision_Unanimous, R_win_by_KO/TKO, R_win_by_Submission, R_win_by_TKO_Doctor_Stoppage	Method of victory (1 or 0)
R_wins	Total career wins
R_stance	Fighter stance
R_Height_cms, R_Reach_cms, R_Weight_lbs, R_age	Fighter's height, reach, weight, and age
empty_arena	Whether the fight took place without an audience
constant_1	The number 1
R_match_weightclass_rank	Ranking within the bout's weight class
R_Women's Flyweight_rank, R_Women's Featherweight_rank, R_Women's Strawweight_rank, R_Women's Bantamweight_rank, R_Heavyweight_rank, R_Light Heavyweight_rank, R_Middleweight_rank, R_Welterweight_rank, R_Lightweight_rank, R_Featherweight_rank, R_Bantamweight_rank, R_Flyweight_rank, R_Pound-for-Pound_rank	Fighter's weight class (1 or 0)
better_rank	Who has the better rank (Red, Blue, or neither)
finish	How the fight finished (SUB, KO/TKO, S-DEC)
finish_details	More details about the finish if available, i.e. which attack ended the fight (e.g. Punch, Rear Naked Choke)
finish_round	The round in which the fight ended
finish_round_time	Time elapsed in the final round at the finish
total_fight_time_secs	Total time of the fight in seconds
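
Since R_odds and B_odds are stored as American odds, it can help to see how such a number maps to a win probability. The following is a minimal sketch using the standard moneyline conversion; the function names and the sample odds are made up for illustration and do not come from the dataset or from the original post.

```python
def american_odds_to_implied_prob(odds: float) -> float:
    """Convert American (moneyline) odds to the bookmaker's implied win probability."""
    if odds == 0:
        raise ValueError("American odds cannot be 0")
    if odds < 0:
        # Favorite: you must stake |odds| to win 100.
        return -odds / (-odds + 100.0)
    # Underdog: a stake of 100 wins `odds`.
    return 100.0 / (odds + 100.0)


def profit_on_100_stake(odds: float) -> float:
    """Profit from a winning 100-credit bet at the given American odds."""
    if odds == 0:
        raise ValueError("American odds cannot be 0")
    if odds < 0:
        return 100.0 * 100.0 / -odds
    return float(odds)


# Made-up example values, not rows from ufc-master.csv.
red_odds, blue_odds = -150.0, 130.0
p_red = american_odds_to_implied_prob(red_odds)
p_blue = american_odds_to_implied_prob(blue_odds)

# The two implied probabilities sum to more than 1; the excess is the bookmaker's margin.
overround = p_red + p_blue - 1.0

print(f"Red:  p={p_red:.3f}, profit on 100 = {profit_on_100_stake(red_odds):.2f}")
print(f"Blue: p={p_blue:.3f}, profit on 100 = {profit_on_100_stake(blue_odds):.2f}")
print(f"Overround: {overround:.3f}")
```

Negative odds mark the favorite and positive odds the underdog, so a regression target built directly on raw odds has a gap between -100 and +100. Converting to implied probabilities first is one possible way to avoid that gap.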

Column	Description
fighter_name	Fighter name
Height	Fighter's height
Weight	Fighter's weight
Reach	Fighter's reach
Stance	Fighter's attacking stance
DOB	Date of birth
SLpM	Significant strikes landed per minute
Str_Acc	Striking accuracy
SApM	Significant strikes absorbed per minute (strikes landed on this fighter by all opponents; significant strikes only)
Str_Def	Significant strike defence (the % of opponents' strikes that did not land)
TD_Avg	Average takedowns landed per 15 minutes
TD_Acc	Takedown accuracy
TD_Def	Takedown defence rate
Sub_Avg	Average submissions attempted per 15 minutes

MALE	4246
FEMALE	670

Fights with fighter data	4047
Fights without fighter data	198
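
A split like the one above can come from looking up each fight's two fighters in the fighter table and separating fights where either lookup fails. The sketch below is only an illustration of that idea: the miniature tables, the fighter names, and the helper `attach_fighter_stats` are hypothetical, and the original post may have performed the join differently.

```python
# Hypothetical miniature versions of the two tables; names and values are made up.
fights = [
    {"R_fighter": "Fighter A", "B_fighter": "Fighter B"},
    {"R_fighter": "Fighter A", "B_fighter": "Fighter C"},
    {"R_fighter": "Fighter D", "B_fighter": "Fighter B"},  # Fighter D has no stats
]
fighter_details = {
    "Fighter A": {"SLpM": 4.1, "Str_Acc": 0.48},
    "Fighter B": {"SLpM": 3.2, "Str_Acc": 0.41},
    "Fighter C": {"SLpM": 2.7, "Str_Acc": 0.52},
}


def attach_fighter_stats(fight, details):
    """Return the fight with R_/B_-prefixed stats added, or None if a fighter is missing."""
    red = details.get(fight["R_fighter"])
    blue = details.get(fight["B_fighter"])
    if red is None or blue is None:
        return None
    enriched = dict(fight)
    enriched.update({f"R_{key}": value for key, value in red.items()})
    enriched.update({f"B_{key}": value for key, value in blue.items()})
    return enriched


matched, unmatched = [], []
for fight in fights:
    row = attach_fighter_stats(fight, fighter_details)
    if row is None:
        unmatched.append(fight)
    else:
        matched.append(row)

print(f"fights with fighter data:    {len(matched)}")
print(f"fights without fighter data: {len(unmatched)}")
```

Exact string matching on names is fragile (spelling variants, extra whitespace), so in practice some name normalization before the lookup would likely reduce the number of unmatched fights.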

Metric (validation set)	Linear regression	Polynomial regression	Ridge regression	Random forest
R2 score	0.17	-0.68	0.19	0.29
MSE	0.78	1.4	0.76	0.69
MAE	0.72	0.9	0.61	0.66
MAPE	1.2	1.6	1.2	3.19
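
For reference, the four metrics in the table can be written out from their textbook definitions. This is a minimal sketch in plain Python, assuming MAPE is reported as a fraction rather than a percentage; the sample values are made up and are not the post's data.

```python
def regression_metrics(y_true, y_pred):
    """Compute R2, MSE, MAE, and MAPE (as a fraction) from their definitions."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,  # negative when worse than predicting the mean
        "mse": ss_res / n,
        "mae": sum(abs(e) for e in errors) / n,
        # MAPE divides by |y_true|, so targets near zero inflate it.
        "mape": sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
    }


# Made-up targets; note the value 0.1, which is close to zero.
y_true = [1.0, -1.0, 0.1, 2.0]
good = regression_metrics(y_true, [0.5, -0.5, 0.6, 1.5])
bad = regression_metrics(y_true, [2.0, 2.0, 2.0, 2.0])

print(good)
print(bad)
```

Two patterns from the table show up here as well. A model that does worse than always predicting the mean gets a negative R2, as polynomial regression does. MAPE can also exceed 1 while MAE stays moderate, because a single target close to zero dominates the average; that may be why random forest has the best R2 and MSE yet the highest MAPE.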


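< Problem Discovery and Definition >

< Dataset >

1. ufc-master.csv

2. raw_fighter_detail.csv

< Data Preprocessing >

1. Date conversion

2. Dropping unnecessary columns

1) current streak

2) gender

3. Class mapping

1) Fighter names

2) Other values

3) Removing NULL and NaN values

4. Adding fighter data

5. Splitting the file

6. Mapping height, weight, and reach

< Methodology >

1. Regression: one fighter's odds

1) Underfitting or overfitting?

2) Using only classes with many samples

3) Interim conclusion

2. Classification: predicting the winner

3-1. Regression: predicting odds using the fight result

3-2. Regression: using win/loss probabilities

< Conclusion and Future Plans >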


n_estimators	train	test
70	0.897	0.25
60	0.894	0.26
50	0.894	0.23
40	0.891	0.23
30	0.88	0.22
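
One simple way to read this table is to look at the gap between the train and test scores for each setting. The sketch below does that with the numbers copied from the table; the helper name `generalization_gap` is made up, and the scores are treated as "higher is better" without assuming which metric they are.

```python
# (n_estimators, train score, test score), copied from the table above.
results = [
    (70, 0.897, 0.25),
    (60, 0.894, 0.26),
    (50, 0.894, 0.23),
    (40, 0.891, 0.23),
    (30, 0.880, 0.22),
]


def generalization_gap(train_score: float, test_score: float) -> float:
    """Difference between train and test score; a large positive gap suggests overfitting."""
    return train_score - test_score


gaps = {n: generalization_gap(train, test) for n, train, test in results}
best_n, best_train, best_test = max(results, key=lambda row: row[2])

for n, train, test in results:
    print(f"n_estimators={n}: train={train:.3f} test={test:.3f} gap={gaps[n]:.3f}")
print(f"best test score: {best_test} at n_estimators={best_n}")
```

Every setting shows a gap above 0.6, and changing the number of trees barely moves the test score. That pattern points toward overfitting that the tree count alone is unlikely to fix; limiting tree depth or revisiting the features may be more promising directions.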

accuracy: 0.65	precision	recall	f1-score	support
0 (Red is the winner)	0.61	0.43	0.51	325
1 (Blue is the winner)	0.68	0.82	0.74	485
accuracy			0.66	810
macro avg	0.65	0.62	0.62	810
weighted avg	0.65	0.66	0.65	810
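
The summary rows of this report follow from the two per-class rows, and rebuilding them is a quick sanity check. The sketch below uses only the rounded values shown above, so its results can differ slightly from the report, which was presumably computed from unrounded values; the helper names are made up.

```python
# Per-class rows copied from the report above: (precision, recall, f1, support).
report = {
    "0 (Red wins)": (0.61, 0.43, 0.51, 325),
    "1 (Blue wins)": (0.68, 0.82, 0.74, 485),
}

total_support = sum(row[3] for row in report.values())


def macro_avg(index: int) -> float:
    """Unweighted mean over classes: every class counts equally."""
    return sum(row[index] for row in report.values()) / len(report)


def weighted_avg(index: int) -> float:
    """Mean over classes weighted by support (number of true samples per class)."""
    return sum(row[index] * row[3] for row in report.values()) / total_support


# recall * support is the number of correctly classified samples in a class,
# so the support-weighted recall equals overall accuracy.
accuracy = weighted_avg(1)

for name, index in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    print(f"{name:9s} macro={macro_avg(index):.3f} weighted={weighted_avg(index):.3f}")
print(f"accuracy  {accuracy:.3f} over {total_support} samples")
```

The low recall for class 0 (0.43) against the high recall for class 1 (0.82) indicates the classifier leans toward predicting Blue, which is the larger class in this split.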

R2 score	0.24
MSE	0.72
MAE	0.67
MAPE	2.71

R2 score	0.25
MSE	0.69
MAE	0.65
MAPE	3.29

R2 score	0.27
MSE	0.68
MAE	0.65
MAPE	3.02