I need to know the state of every button on an Xbox controller at a specific point in time. The reason is that I'm building a training set for a neural network, so I'm trying to take a snapshot of the screen and a "snapshot" of the controller state simultaneously. Note that I was able to do this successfully for a keyboard version of this project, but the Xbox controller is giving me difficulty.
What I've tried is creating a dictionary of buttons and values, and updating the dictionary every time I receive an event from the controller. Then I save the image and dictionary together as one instance of training data. However, the inputs end up not being synced with the images at all. I suspect the issue is related to threading or subprocesses in one of the packages used to read the controller, but I'm not skilled enough to know how to fix it.
Below is my code.
from inputs import get_gamepad
import time
import cv2
import numpy as np
from mss.windows import MSS as mss
from PIL import ImageGrab  # needed by screen_record()

# Only track relevant inputs
gp_state = {
    #'ABS_HAT0X': 0,  # -1 to 1
    #'ABS_HAT0Y': 0,  # -1 to 1
    #'ABS_RX': 0,     # -32768 to 32767
    #'ABS_RY': 0,     # -32768 to 32767
    'ABS_RZ': 0,      # 0 to 255
    'ABS_X': 0,       # -32768 to 32767
    'ABS_Y': 0,       # -32768 to 32767
    #'ABS_Z': 0,      # 0 to 255
    'BTN_EAST': 0,
    'BTN_NORTH': 0,
    #'BTN_SELECT': 0,
    'BTN_SOUTH': 0,
    #'BTN_START': 0,
    #'BTN_THUMBL': 0,
    #'BTN_THUMBR': 0,
    'BTN_TL': 0,
    'BTN_TR': 0,
    'BTN_WEST': 0,
    #'SYN_REPORT': 0,
}

dead_zone = 7500

def screen_record():
    last_time = time.time()
    while True:
        # 800x600 windowed mode
        printscreen = np.array(ImageGrab.grab(bbox=(0, 40, 800, 640)))
        last_time = time.time()
        cv2.imshow('window', cv2.cvtColor(printscreen, cv2.COLOR_BGR2RGB))
        if cv2.waitKey(25) & 0xFF == ord('q'):
            cv2.destroyAllWindows()
            break

def process_img(image):
    original_image = image
    processed_img = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    contrast = 1
    brightness = 0
    out = cv2.addWeighted(processed_img, contrast, processed_img, 0, brightness)
    return out

def main():
    # Give myself time to switch windows
    # Screen should be in top left
    for _ in range(4):
        time.sleep(1)

    controller_input = np.zeros(5)
    training_data = []
    training_files = 0

    with mss() as sct:
        while True:
            # Get screen and display
            bbox = (150, 240, 650, 490)
            screen = np.array(sct.grab(bbox))
            new_screen = process_img(screen)
            cv2.imshow('window', new_screen)
            new_screen = cv2.resize(new_screen, (100, 50))

            # Map events to dictionary
            events = get_gamepad()
            for event in events:
                gp_state[event.code] = event.state

            # Set to zero if in dead zone
            if abs(gp_state['ABS_X']) < dead_zone:
                gp_state['ABS_X'] = 0
            if abs(gp_state['ABS_Y']) < dead_zone:
                gp_state['ABS_Y'] = 0

            # Set values to be between 0 and 1.
            controller_input[0] = (gp_state['ABS_X'] + 32768) / (32767 + 32768)
            controller_input[1] = gp_state['ABS_RZ'] / 255
            controller_input[2] = gp_state['BTN_SOUTH']
            controller_input[3] = gp_state['BTN_EAST']
            controller_input[4] = gp_state['BTN_TR']

            record = gp_state['BTN_NORTH']  # Record while holding Y button
            if record:
                # Copy the input array so later updates don't overwrite saved samples
                training_data.append(np.array([new_screen, controller_input.copy()], dtype=object))
                print(controller_input)
                time.sleep(1)

            if len(training_data) % 500 == 0 and record:
                filename = f"training_data/rlb_XBOXtrain_{time.time()}.npy"
                np.save(filename, training_data)
                training_files += 1
                print(f"Trained {training_files} files!")
                training_data = []

            if cv2.waitKey(25) & 0xFF == ord('q'):
                cv2.destroyAllWindows()
                break

main()
I feel like I am making this way harder than it needs to be. But is there an easier way to just get the state of the controller at a certain point in time?
Note that I've found some solutions that work for Linux, but I am running on Windows 10. Here is an example of a Linux solution: https://github.com/FRC4564/Xbox
The TensorKart project has already solved this problem: https://github.com/kevinhughes27/TensorKart/blob/master/utils.py
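For reference, the core of that approach (sketched roughly below; see utils.py for the real thing, the class and names here are placeholders, not TensorKart's) is to drain get_gamepad() events in a background thread that keeps a state object up to date, so the capture loop can take a non-blocking snapshot of the controller at the same moment it grabs the screen:

import threading
from inputs import get_gamepad

class ControllerPoller:
    """Keeps the latest gamepad state; read() returns a snapshot at any moment."""
    def __init__(self):
        self.state = {'ABS_X': 0, 'ABS_Y': 0, 'ABS_RZ': 0,
                      'BTN_SOUTH': 0, 'BTN_EAST': 0, 'BTN_NORTH': 0, 'BTN_TR': 0}
        self.lock = threading.Lock()
        threading.Thread(target=self._poll, daemon=True).start()

    def _poll(self):
        # get_gamepad() blocks until events arrive, so keep it off the main thread
        while True:
            for event in get_gamepad():
                if event.code in self.state:
                    with self.lock:
                        self.state[event.code] = event.state

    def read(self):
        # Return a copy so stored training samples aren't mutated later
        with self.lock:
            return dict(self.state)

# Usage in the capture loop:
#   poller = ControllerPoller()
#   ...
#   snapshot = poller.read()   # controller state "right now", no blocking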
No, this is actually hard. It's hard because you don't just need to know what the gamepad state is at a particular time; you also want to know which gamepad state was used to draw a particular frame. The time the gamepad state was sampled will always be earlier than the time the frame was drawn, and the gap may grow due to latency added by the app itself. The added latency might be constant for the whole app or it might vary between different parts of the app. It's not something you can easily account for.
Your Python script is recording gamepad inputs as soon as they are received, so I'd expect it to always run at least a frame or two ahead of the screen captures.
It's probably just latency added by the gamepad input code in the app you're measuring, and not something that can be fixed. Most apps don't make any attempt to respond to gamepad inputs as soon as they're received and instead handle them all at once during the per-frame update step. On average, that adds latency equal to half a frame time (about 8 ms at 60 fps).
How to fix this? I think measuring gamepad state from another application is going to be difficult due to the latency issues. If you can, it would be best to instrument the app to record the gamepad state during its main loop; that way you know you are recording what was actually used. On Windows it should be possible to do this by providing your own version of the XInput DLL that records the current state whenever XInputGetState is called.
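The replacement DLL itself would have to be native code, but for completeness, here is roughly what reading the XInput state looks like; a proxy DLL would log this same struct every time the game calls XInputGetState. You can also poll it directly from a separate Python process via ctypes, though that still has the latency mismatch described above. A rough sketch, assuming xinput1_4.dll (Windows 8+) and a controller in slot 0, both of which are assumptions about your setup:

import ctypes

class XINPUT_GAMEPAD(ctypes.Structure):
    _fields_ = [("wButtons", ctypes.c_ushort),   # button bitmask (A is 0x1000)
                ("bLeftTrigger", ctypes.c_ubyte),
                ("bRightTrigger", ctypes.c_ubyte),
                ("sThumbLX", ctypes.c_short),
                ("sThumbLY", ctypes.c_short),
                ("sThumbRX", ctypes.c_short),
                ("sThumbRY", ctypes.c_short)]

class XINPUT_STATE(ctypes.Structure):
    _fields_ = [("dwPacketNumber", ctypes.c_uint),
                ("Gamepad", XINPUT_GAMEPAD)]

xinput = ctypes.windll.xinput1_4  # on older Windows try xinput1_3 or xinput9_1_0

def get_xbox_state(user_index=0):
    """Return the current XInput state for the given controller, or None."""
    state = XINPUT_STATE()
    if xinput.XInputGetState(user_index, ctypes.byref(state)) != 0:
        return None  # ERROR_SUCCESS is 0; anything else means not connected
    g = state.Gamepad
    return {"buttons": g.wButtons,
            "lt": g.bLeftTrigger, "rt": g.bRightTrigger,
            "lx": g.sThumbLX, "ly": g.sThumbLY,
            "rx": g.sThumbRX, "ry": g.sThumbRY}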