You may need an iron will to face Twitter's roller-coaster stock price (the platform is now rebranded as X), which may give you the impression that Twitter will shut down for good any day now. However, most people still rely on Twitter to learn about breaking news, current events, and trends around the globe. Influencers still use Twitter to show off their food, drinks, lifestyles, and sweethearts. Humor enthusiasts still take advantage of Twitter to share memes and jokes. Online socializers still use Twitter to interact with friends, share opinions, and argue with opponents. A plethora of corporations, celebrities, and public personalities continue to stuff Twitter full of ads for products, services, projects, and events. Recently, a newly developed AI tool for pathology image analysis has offered an even more positive sign: Twitter is still very useful, even in the scientific and medical fields.
On August 17, 2023, Huang and colleagues from the Stanford University School of Medicine published a research paper titled "A visual–language foundation model for pathology image analysis using medical Twitter" in Nature Medicine. They realized that the tons of clinical discussions and de-identified pathology images posted on Twitter, spanning over 30 pathology subcategories, would go to waste if researchers did not utilize them. Guess how many images the researchers harvested from Twitter? 243,375! Huang et al. first meticulously cleaned and curated the Twitter data by applying strict filtering protocols: every image had to be paired with text, retweeted images were excluded, highly liked images were favored, and so on. To further expand the dataset, they combined the Twitter data with another openly available online source (PathLAION). They ultimately created a new database named OpenPath, which contains 208,414 image–text pairs (116,504 from tweets, 59,869 from the associated replies, and 32,041 from PathLAION), making it the largest publicly available collection of pathology images with paired text descriptions to date.
Based on OpenPath and employing a contrastive learning strategy, the researchers then developed an AI tool, or robot pathologist, called pathology language–image pretraining (PLIP). Contrastive learning lets a model tell "similar" from "different" by comparing positive and negative pairs, e.g., normal lung tissue versus a lung adenocarcinoma. Training then optimized the robot pathologist to answer "similar" for each matched image–text pair and "dissimilar" for mismatched images and texts. For example, all lung adenocarcinomas share certain pathological traits, even when the images come from different patients and laboratories with different staining protocols. This process was implemented by building text and image encoders that convert each input into embedded features within a shared space.
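To make this concrete, here is a minimal sketch of a standard CLIP-style contrastive loss in PyTorch. It illustrates the general recipe described above, not the authors' actual implementation; the temperature value and tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb: torch.Tensor, text_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric CLIP-style contrastive loss over a batch of matched pairs.

    image_emb, text_emb: [batch, dim] tensors where row i of each is a
    matched image-text pair; all off-diagonal combinations are negatives.
    """
    # Normalize so the dot product becomes cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # logits[i, j] compares image i with text j; matches lie on the diagonal.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    # Pull matched pairs together and push mismatched pairs apart,
    # symmetrically over the image and text directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```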
Then comes the critical and exciting part: the robot pathologist needed to take some exams to evaluate how well it functions. In the entrance exam, PLIP was exposed to new images from four external validation datasets and asked to tell which images were normal, benign, or malignant. The images were curated from multiple tissues and organs. This task turned out to be a piece of cake for PLIP, which answered most of the questions correctly without any retraining, and with much better accuracy than the previously developed contrastive language–image pretraining (CLIP) model.
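Classifying new images without any retraining is known as zero-shot classification: each image embedding is simply compared against the embeddings of candidate text labels. The sketch below shows one plausible way this works; the encoder callables and prompt wordings are hypothetical placeholders, not the paper's API.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def zero_shot_classify(image_encoder, text_encoder, images, class_prompts):
    """Assign each image the label whose text prompt it is most similar to.

    image_encoder/text_encoder are assumed callables returning [N, dim] and
    [num_classes, dim] embeddings; class_prompts might read, e.g.,
    ["an H&E image of benign tissue", "an H&E image of malignant tissue"].
    """
    text_emb = F.normalize(text_encoder(class_prompts), dim=-1)
    image_emb = F.normalize(image_encoder(images), dim=-1)
    similarity = image_emb @ text_emb.t()   # [N, num_classes] cosine scores
    return similarity.argmax(dim=-1)        # predicted class index per image
```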
The next exam required PLIP to locate the characteristic cell structures, tissue compositions, and disease manifestations in complicated images representing specific pathological changes. This exam required a further training step for PLIP, consisting of image embedding analysis (in simple terms, distilling the most essential information from each image) and linear probing (in simple terms, training only a simple linear classifier on top of the frozen embeddings). Four different datasets (Kather colon, PanNuke, DigestPath, and WSSS4LUAD) were used for this training and exam. Unsurprisingly, PLIP outperformed two other models.
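Linear probing is conceptually simple: the pretrained encoder stays frozen, and only a lightweight linear classifier is trained on its embeddings. Below is one plausible sketch using scikit-learn; the variable names are illustrative rather than taken from the paper.

```python
from sklearn.linear_model import LogisticRegression

def linear_probe(train_emb, train_labels, test_emb):
    """Train only a linear classifier on frozen image embeddings."""
    clf = LogisticRegression(max_iter=1000)  # the single trainable layer
    clf.fit(train_emb, train_labels)         # encoder weights stay untouched
    return clf.predict(test_emb)
```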
Still, there were two more exams waiting for PLIP! The third exam was to input a text request, for example "normal colon mucosa in colorectal H&E tissue," and test whether PLIP could retrieve the correct images from different image datasets, including Twitter, PathPedia, PubMed pathology, and pathology books. The last exam was even more challenging: inputting an image and testing whether PLIP could retrieve representative images of the same kind. PLIP achieved the best performance on both text-to-image and image-to-image retrieval, even though other models, including CLIP, MuDiPath, and SISH, could conduct similar tasks.
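Mechanically, both retrieval exams reduce to the same step: embed the query (a text prompt or an image), embed every candidate image, and rank the candidates by cosine similarity. The sketch below assumes precomputed embeddings; the function name and shapes are placeholders.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def retrieve_top_k(query_emb, gallery_emb, k=5):
    """Return indices of the k gallery images most similar to the query.

    query_emb: [dim] embedding of a text prompt (text-to-image retrieval)
    or of an image (image-to-image retrieval); gallery_emb: [N, dim].
    """
    query = F.normalize(query_emb, dim=-1)
    gallery = F.normalize(gallery_emb, dim=-1)
    scores = gallery @ query          # cosine similarity to each gallery item
    return scores.topk(min(k, scores.numel())).indices
```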
Upon finishing the two final exams, PLIP proved to be a qualified robot pathologist, with a digital pathology library installed in its super brain. This AI tool, built on social media data, will be valuable for disease diagnosis and classification, education and training, rare case identification, quality control, benchmarking, and more.
According to some forecasts, a long-term increase in Twitter's stock price is expected. Will this scientific discovery contribute to that increase? Definitely a tiny, tiny bit. Nevertheless, it is worth noting that AI tools will only become more versatile and smarter as they are used more widely. The same goes for Twitter.