Mephisto: ParlAI Chat Demo - Something wrong with worker pairing and agent status updates
Hi,
I have been running some pilot studies for a crowdsourcing task that pairs 2 MTurk workers for a conversation, and I noticed some unusual behavior in a few assignments. Specifically, there are cases where a worker starts the HIT and gets the partner timeout message within a few seconds. Below, I list the steps to replicate the task setup and provide the logs.
I forked Mephisto (vaibhavad/Mephisto) (just a few hours ago) and made some minor changes:
- Changed the logging (vaibhavad/Mephisto@f5158ed0ce0fd91a215cd13b6482ef21e6c510d9) in `supervisor.py`, `operator.py`, and `blueprint.py` so that issue-specific information is logged.
- Statically linked packages (vaibhavad/Mephisto@a18ac6e9967fd016be3dafb740744fcee3e37fb6), because node modules were somehow not working on my system (#325).
- Changed the task configuration to 10 conversations (vaibhavad/Mephisto@c00c4b488146f247697580c2f0086373623ff02a) and used `custom_prebuilt.yaml` so that `bundle.js` is picked up from `webapp`.
Running the task: I used the following steps.

```shell
git clone https://github.com/vaibhavad/Mephisto.git
pip install parlai
cd Mephisto
pip install -e .
mephisto register mturk_sandbox name=my_mturk_user_sandbox access_key_id=<ACCESS_KEY> secret_access_key=<SECRET_KEY>
cd Mephisto/packages/mephisto-task
npm install; npm run dev
cd Mephisto/packages/bootstrap-chat
npm install; npm run dev
cd Mephisto/examples/parlai_chat_task_demo/webapp
npm install; npm run dev
cd ..
python parlai_test_script.py mephisto/architect=heroku mephisto.provider.requester_name=my_mturk_user_sandbox
```
I tested the system using three different MTurk Sandbox accounts, so 3 workers are registered. I frequently returned the task midway and started new ones from different accounts to replicate the scenario I observed in production. Here are some observations (line numbers refer to the attached mephisto_logs.txt):
- (Line 79) Task available on Worker Sandbox
- (Line 159) Worker 2 returned the HIT as Agent 2 and started working on a new task as Agent 4, but the status was only updated ~1-2 minutes later (Line 194). In some cases, the agent status took even longer to update.
- (Line 327) Since we are testing with only 3 workers, having 6 `in_task` agent statuses means some of these states are stale.
- The specific abnormal behavior we observed is visible in Lines 340-347. Assignment 3 was originally launched with Agents 5 and 6 (Line 206), and their statuses changed in Line 331. Agent 11 was then created and paired with Agent 5 without any waiting (Line 340), even though Agent 5's status was `returned`, not `waiting`. Agent 6's status was updated from `partner disconnect` to `completed`, which is also unusual. Finally, Agent 11 got `partner disconnect` within 5 seconds of starting the assignment (Line 347).
- Similar behavior is observed in Lines 410-414: Agent 14 is paired with Agent 5 even though Agent 5's status is `timeout`.
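The stale-state argument in the Line 327 observation can be checked mechanically. Below is a minimal sketch, assuming each worker is in at most one live conversation at a time (as in this test setup). `AgentRecord` and `find_stale_in_task` are hypothetical names for illustration, not part of Mephisto, and the status string `"in task"` is taken from the log message quoted further down.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AgentRecord:
    # Hypothetical record type for illustration; not Mephisto's schema.
    agent_id: str
    worker_id: str
    status: str


def find_stale_in_task(agents):
    """Group 'in task' agents by worker and return workers with more than one.

    Assumes each worker is in at most one live conversation at a time, so any
    extra 'in task' agent for the same worker must be a stale status.
    """
    in_task_by_worker = defaultdict(list)
    for agent in agents:
        if agent.status == "in task":
            in_task_by_worker[agent.worker_id].append(agent.agent_id)
    return {w: ids for w, ids in in_task_by_worker.items() if len(ids) > 1}


# Made-up example: worker-2 returned as Agent 2 and restarted as Agent 4,
# but Agent 2 was never moved out of 'in task'.
agents = [
    AgentRecord("2", "worker-2", "in task"),
    AgentRecord("4", "worker-2", "in task"),
    AgentRecord("5", "worker-1", "in task"),
    AgentRecord("6", "worker-3", "in task"),
]
print(find_stale_in_task(agents))  # {'worker-2': ['2', '4']}
```

With 3 workers, any total above 3 `in task` agents guarantees at least one worker shows up in this result.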
Is there a way to make sure that only agents with status `waiting` are paired?
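One way to enforce "only pair `waiting` agents" is to re-read each agent's status at the moment of pairing instead of trusting the state it had when it joined the pool. The sketch below is hypothetical: `PairingPool` and the `get_status` callable are illustrative assumptions, not Mephisto's matchmaking code.

```python
from collections import deque

WAITING = "waiting"


class PairingPool:
    """Hypothetical pairing pool that only pairs agents whose *current*
    status is 'waiting'. Status is looked up at pairing time."""

    def __init__(self, get_status):
        self._queue = deque()
        self._get_status = get_status  # callable: agent_id -> current status

    def add(self, agent_id):
        self._queue.append(agent_id)

    def try_pair(self):
        """Return a pair of waiting agents, or None if fewer than two exist.

        Agents that are no longer waiting (returned, timeout, ...) are dropped.
        """
        ready = []
        while self._queue and len(ready) < 2:
            agent_id = self._queue.popleft()
            if self._get_status(agent_id) == WAITING:
                ready.append(agent_id)
        if len(ready) == 2:
            return tuple(ready)
        # Not enough waiting agents: put survivors back at the front, in order.
        for agent_id in reversed(ready):
            self._queue.appendleft(agent_id)
        return None


statuses = {"5": "returned", "11": "waiting", "12": "waiting"}
pool = PairingPool(statuses.get)
pool.add("5")
pool.add("11")
print(pool.try_pair())  # None: Agent 5 has returned, so Agent 11 keeps waiting
pool.add("12")
print(pool.try_pair())  # ('11', '12')
```

This check is only as good as the status it reads: if the server never learns that an agent disconnected, the agent still reads as `waiting` and will be paired anyway.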
How does the Heroku server declare an agent disconnected/returned? I am assuming it must be based on the frequency of alive signals received. Can you point me to that specific code section?
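To make the "frequency of alive signals" assumption concrete, here is a minimal heartbeat-timeout sketch. It is not how Mephisto's Heroku server is known to work. `HeartbeatMonitor` and its `timeout` parameter are hypothetical. The point it illustrates is that a timeout on the last alive signal flags a silent client even if the socket's close event is never observed.

```python
import time


class HeartbeatMonitor:
    """Hypothetical liveness tracker: an agent counts as disconnected once no
    alive signal has arrived for more than `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_seen = {}

    def alive(self, agent_id):
        """Record an alive signal from `agent_id`."""
        self._last_seen[agent_id] = self._clock()

    def disconnected(self):
        """Return the ids of agents whose last alive signal is too old."""
        now = self._clock()
        return [a for a, seen in self._last_seen.items() if now - seen > self.timeout]


# Simulated clock so the example runs instantly.
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout=10, clock=lambda: fake_now[0])
monitor.alive("agent-a")
monitor.alive("agent-b")
fake_now[0] = 5.0
monitor.alive("agent-b")       # agent-b keeps sending heartbeats
fake_now[0] = 12.0
print(monitor.disconnected())  # ['agent-a']
```

The trade-off is latency: a disconnect is noticed only after `timeout` seconds, which is one plausible source of the 1-2 minute status delays described above.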
Also, the unusual pairing mentioned above is always preceded by the log line `Updating a final status, was timeout/returned and want to set to in task`, which I assume refers to the status of Agent 5 (Lines 339 and 409). It comes from the `update_status` function in `data_model/agent.py`, although I don't quite understand the sequence of function calls that leads to it.
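The log line implies an attempted transition out of a final status. Whether the real `update_status` applies or rejects that change is not clear from the logs. The sketch below shows a guard that rejects it and reports the refusal to the caller. `AgentStatus` and the contents of `FINAL_STATUSES` are assumptions for illustration, not Mephisto's data model.

```python
import logging

logger = logging.getLogger(__name__)

# Assumed set of terminal statuses; adjust to match the real data model.
FINAL_STATUSES = frozenset(
    {"completed", "timeout", "returned", "disconnect", "partner disconnect", "expired"}
)


class AgentStatus:
    """Hypothetical status holder that refuses to leave a final state."""

    def __init__(self, status="waiting"):
        self.status = status

    def update_status(self, new_status):
        """Apply `new_status` unless the current status is final.

        Returns True if the status changed, False if the update was refused,
        so the caller (e.g. pairing code) can abandon the launch.
        """
        if self.status in FINAL_STATUSES and new_status != self.status:
            logger.warning(
                "Refusing update from final status %r to %r", self.status, new_status
            )
            return False
        self.status = new_status
        return True


agent_5 = AgentStatus("returned")
print(agent_5.update_status("in task"))  # False
print(agent_5.status)                    # returned
```

Returning a boolean rather than silently ignoring the update lets the code that launches an assignment notice the refusal and put the partner back into the waiting pool.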
I’ll be very grateful if you can help me with this. 😃
About this issue
- State: closed
- Created 4 years ago
- Comments: 16 (16 by maintainers)
#347 is now in, so I’ll be closing this issue. Huge thanks for helping us debug what was going on here! Let us know if anything else comes up 👍
Stale states definitely look like a Heroku issue. I tried `MockProvider` with both `LocalArchitect` and `HerokuArchitect`, as you suggested. Socket disconnects and socket errors are not being detected by `HerokuArchitect`; I tried with both Safari and Google Chrome. It is very likely that all the other issues originate from this.

In one of the runs, the rare case happened. Here are the relevant Mephisto logs (`HerokuArchitect` and `MockProvider`). Agent 22 was actually disconnected, but `HerokuArchitect` could not detect the socket disconnect, so its status is `waiting`. Somehow, though, the status is not updated to `in task` even though the assignment has been launched with this agent. This happened only once in 4-5 test runs.