
[EPIC] Deploy "add a link" to all Wikipedias
Open, Low, Public

Description

After having "add a link" on about 10 wikis for several months, we learned about valuable improvements to make. Those improvements are collected in this epic: T300851: [EPIC] Growth: "add a link" structured task 2.0. Once those improvements are complete, we will be comfortable deploying "add a link" more broadly. This task is about generating the suggestions for another set of wikis and deploying there.

This process was started in T290011: [OLD] Deploy Add a link to a third round of wikis, which is kept under this task for reference.


CRS involvement for each batch:

  1. Test the models
  2. Based on results, advise and decide if a wiki should get the model
  3. Inform communities (mainly through Tech News).

Related Objects

Status                  Assigned
Open                    lbowmaker
Open                    Trizek-WMF
Declined                None
Resolved                Trizek-WMF
Resolved                Tgr
Resolved                Trizek-WMF
Resolved                Tgr
Resolved                Etonkovidova
Resolved                kevinbazira
Resolved                kevinbazira
Resolved                kostajh
Resolved                Sgs
Resolved                Sgs
Invalid                 Trizek-WMF
Resolved                Sgs
Open                    kevinbazira
Resolved                Sgs
Resolved                None
Resolved                Urbanecm_WMF
Resolved                kevinbazira
Resolved                Sgs
Invalid                 kevinbazira
Resolved                Sgs
Resolved                Sgs
Resolved                Sgs
Resolved                Sgs
Resolved                Sgs
Resolved                Trizek-WMF
Resolved                Trizek-WMF
Resolved                Urbanecm_WMF
Resolved                Urbanecm_WMF
Resolved (Mon, Nov 25)  Urbanecm_WMF
Resolved                Urbanecm_WMF
Resolved                Sgs
Resolved                Etonkovidova
Resolved                Urbanecm_WMF
Resolved                Sgs
Open                    None
Open                    calbon
Resolved                kevinbazira
Resolved                kevinbazira
Resolved                kevinbazira
Resolved                AKhatun_WMF
Resolved                kevinbazira
Resolved                kevinbazira
Open                    None
Open                    None
Resolved                Urbanecm_WMF
Resolved                Etonkovidova
Resolved                pfischer
Open                    None
Open                    Iflorez

Event Timeline


@kevinbazira, all rounds have been created; they are sub-tasks of this one. Please proceed with model training as you are able! :)

Trizek-WMF changed the task status from Open to In Progress. May 11 2022, 5:44 PM

Also @kevinbazira, I tried to make rounds that gather approximately the same number of wikis: either a lot of small ones plus a big one, or a few mid-sized ones with some small wikis. Let me know if I should change the number of wikis per round to help your work building the models (bigger batches, or smaller ones).

@Trizek-WMF, thank you for creating all the rounds. I am working on generating datasets and models round by round and will be sharing updates on the sub-tasks.

The task generation for the Add Link task pool runs on one thread per DB server group (s1-s7). The two large DB server groups (in terms of number of wikis) are s5 (21 Wikipedias) and s3 (280 Wikipedias). For mid-sized wikis, task generation took about 5-6 hours for a single wiki. It will probably take the same time for large wikis as well (we are going for a constant number of tasks, regardless of wiki size); it might take less time for small wikis, where candidates can be exhausted before the required number of tasks is found. But assuming it doesn't, 280 Wikipedias at 6 hours each is ~70 days, so we should probably run 4 instances of the refresh script in parallel on s3.
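The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the function name and parameters are made up for illustration, and it assumes every wiki takes the upper figure of 6 hours and that the wikis split evenly across instances.

```python
import math


def estimated_days(num_wikis, hours_per_wiki, parallel_instances=1):
    """Wall-clock days if each instance processes its share of wikis one by one."""
    # Assumption: wikis are spread evenly, so the slowest instance gets the ceiling.
    wikis_per_instance = math.ceil(num_wikis / parallel_instances)
    return wikis_per_instance * hours_per_wiki / 24


single = estimated_days(280, 6)
parallel = estimated_days(280, 6, parallel_instances=4)
print(f"s3, one instance:   {single:.1f} days")
print(f"s3, four instances: {parallel:.1f} days")
```

With four instances the same workload drops from about 70 days to about 17.5 days, which is the motivation for running the script in parallel on s3.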

The script creates a lock with the ID of the wiki it is processing, so running multiple instances in parallel on the same dblist should be fine - the second thread will just skip the wiki that the first is already processing.
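The skip-if-locked behaviour described above could look roughly like the following. This is a simplified, in-memory sketch and not the actual refresh script: `WikiLockRegistry`, `refresh_dblist`, and the wiki IDs are hypothetical names, and the real script presumably uses a lock store shared between processes.

```python
import threading


class WikiLockRegistry:
    """In-memory stand-in for a shared lock store keyed by wiki ID (assumption)."""

    def __init__(self):
        self._held = set()
        self._guard = threading.Lock()

    def try_acquire(self, wiki_id):
        # Non-blocking: report failure instead of waiting for the other instance.
        with self._guard:
            if wiki_id in self._held:
                return False
            self._held.add(wiki_id)
            return True

    def release(self, wiki_id):
        with self._guard:
            self._held.discard(wiki_id)


def refresh_dblist(dblist, locks, process_wiki):
    """Process each wiki in the list, skipping any wiki another instance holds."""
    processed, skipped = [], []
    for wiki_id in dblist:
        if not locks.try_acquire(wiki_id):
            skipped.append(wiki_id)
            continue
        try:
            process_wiki(wiki_id)
            processed.append(wiki_id)
        finally:
            locks.release(wiki_id)
    return processed, skipped


locks = WikiLockRegistry()
locks.try_acquire("wiki_b")  # pretend another instance is busy with this wiki
processed, skipped = refresh_dblist(
    ["wiki_a", "wiki_b", "wiki_c"], locks, lambda wiki_id: None
)
print("processed:", processed)
print("skipped:", skipped)
```

Because the lock attempt never blocks, a second instance simply moves on to the next wiki in the dblist instead of waiting.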

Proposal for streamlining the completion of remaining rounds:

  1. Train models, verify models, publish datasets for all remaining wikis. This involves work from Machine-Learning-Team (cc @kevinbazira) and Research (cc @MGerlach). Growth doesn't need to be consulted on this phase; once these teams are happy with the models, the datasets can be published.
  2. Growth engineers will populate the excluded section titles for all remaining wikis
  3. Growth engineers will enable the backend for all remaining wikis, so the task pools begin to fill up
  4. @Trizek-WMF can then verify the hasrecommendation:link results and check the API https://api.wikimedia.org/service/linkrecommendation/apidocs/#/default/get_v1_linkrecommendations__project___domain___page_title_
  5. @Trizek-WMF can inform communities, and Growth engineers can enable the front-end, either staggered or en masse, depending on what works better for @Trizek-WMF
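For the API check in step 4, a request URL could be assembled as sketched below. The path layout is an assumption inferred from the operation name in the apidocs link (`get_v1_linkrecommendations__project___domain___page_title_`), and the base path, helper name, and example values are illustrative; the apidocs page remains the reference for the exact format.

```python
from urllib.parse import quote

# Assumption: the service is served from the same base path as its apidocs page.
BASE_URL = "https://api.wikimedia.org/service/linkrecommendation"


def link_recommendation_url(project, domain, page_title):
    """Build a URL for the v1 linkrecommendations endpoint (path shape assumed)."""
    # Page titles conventionally use underscores instead of spaces; escape the rest.
    title = quote(page_title.replace(" ", "_"), safe="")
    return (
        f"{BASE_URL}/v1/linkrecommendations/"
        f"{quote(project, safe='')}/{quote(domain, safe='')}/{title}"
    )


url = link_recommendation_url("wikipedia", "cs", "Albert Einstein")
print(url)
```

Opening a URL like this for a few articles on a newly enabled wiki is one way to spot-check that recommendations are being served before the front-end is switched on.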

tl;dr I think we would all save time on context switching if we did the model training, section title population, and backend enabling all at once, rather than in phases over a period of months. Then the actual presentation to communities could be done in a more staggered way if that is better from a community relations standpoint.

@KStoller-WMF @kevinbazira @MGerlach what do you think?

@kostajh +1 on doing the initial phases at once to avoid context switching.

The ML team will proceed with training models, evaluating them, and publishing datasets for all the remaining rounds.

As long as we keep a staggered deployment, I'm fine. :)

+1
This sounds like a great approach!

Trizek-WMF changed the task status from In Progress to Stalled. Jan 17 2023, 5:36 PM
Trizek-WMF changed the status of subtask T304953: Schedule the deployment of "Add a link" to more wikis from In Progress to Stalled.

We moved to a staggered deployment process. Once all wikis have trained models, we will resume deployments.


@kevinbazira
Do we have a rough ETA for when model training will be done for all Wikipedias? Thanks!

@KStoller-WMF, we are currently working on the 9th out of 18 rounds of wikis. Each of the 9 remaining rounds has ~20 models. ETA to train, evaluate, and publish all these models is about a month or more depending on the size of each of these wikis and whether or not we have to fine-tune the link recommendation algorithm to support a wiki's language-specific characters.

Will be sharing progress updates on the sub-tasks.

Sgs changed the status of subtask T308133: Deploy "add a link" to 8th round of wikis from Open to In Progress.
Sgs changed the status of subtask T308134: Deploy "add a link" to 9th round of wikis from Open to In Progress.
Trizek-WMF changed the task status from Stalled to In Progress. Mar 14 2023, 1:15 PM
Trizek-WMF added a project: Epic.

@Trizek-WMF Please describe in this Phab ticket what level of involvement is needed from CRS.

Sgs changed the status of subtask T308141: Deploy "add a link" to 15th round of wikis from Open to In Progress.
KStoller-WMF renamed this task from Deploy "add a link" to all Wikipedias to [EPIC] Deploy "add a link" to all Wikipedias. Oct 13 2023, 4:59 AM

On May 21, the last set of active wikis will have Add a link deployed. The next steps will be:

As a reminder, this deployment project started with the successful deployment of round 2, on Aug 14 2021 (round 1 being Growth pilot wikis).

Trizek-WMF changed the task status from In Progress to Open. Jun 20 2024, 2:28 PM
Trizek-WMF lowered the priority of this task from Medium to Low.

I'm moving this project from "active work" to "passive vigilance": the projects listed under sub-tasks are underway but without any specific commitments.