You may need an iron will to face Twitter's roller-coaster stock price (the platform is now rebranded as X), which may give you the impression that Twitter will shut down for good any day now. However, most people still rely on Twitter to learn about breaking news, current events, and trends around the globe. Influencers still use Twitter to show off their food, drinks, lifestyles, and sweethearts. Humor enthusiasts still take advantage of Twitter to share memes and jokes. Online socializers still use Twitter to interact with friends, share opinions, and argue with opponents. A plethora of corporations, celebrities, and public personalities continue to stuff Twitter full of ads for products, services, projects, and events. Recently, a newly developed AI tool for pathology image analysis has offered an even more positive sign: Twitter is still very useful, even in the scientific and medical fields.
On August 17, 2023, Huang and colleagues from the Stanford University School of Medicine published a research paper titled "A visual–language foundation model for pathology image analysis using medical Twitter" in Nature Medicine. They realized that the tons of clinical discussions and de-identified pathology images posted on Twitter, spanning over 30 pathology subcategories, would go to waste if researchers did not utilize them. Guess how many images the researchers harvested from Twitter? 243,375! Huang et al. first meticulously cleaned and curated the Twitter data by applying strict filtering protocols: every image had to be paired with text, retweeted images were excluded, highly liked images were favored, and so on. To further expand the dataset, they combined the Twitter data with another openly available online source (PathLAION). They ultimately created a new database named OpenPath, which contains 208,414 image–text pairs (116,504 from tweets, 59,869 from the associated replies, and 32,041 from PathLAION), making it the largest publicly available collection of pathology images with paired text descriptions to date.
Based on OpenPath and employing a contrastive learning strategy, the researchers then developed an AI tool, or robot pathologist, called pathology language–image pretraining (PLIP). Contrastive learning lets a model tell "similar" from "different" by comparing positive and negative pairs, e.g., normal lung tissue versus a lung adenocarcinoma. Training then optimized the robot pathologist to answer "similar" for each matched image–text pair and "dissimilar" for mismatched images and texts. For example, all lung adenocarcinomas share certain pathological traits, even when the images come from different patients and laboratories with different staining protocols. This process was implemented by building text and image encoders that convert each input into embedded features within a shared space.
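To make this concrete, here is a minimal sketch of a standard CLIP-style contrastive loss in PyTorch. It illustrates the general recipe described above, not the authors' actual implementation; the temperature value and tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb: torch.Tensor, text_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric CLIP-style contrastive loss over a batch of matched pairs.

    image_emb, text_emb: [batch, dim] tensors where row i of each is a
    matched image-text pair; all off-diagonal combinations are negatives.
    """
    # Normalize so the dot product becomes cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # logits[i, j] compares image i with text j; matches lie on the diagonal.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    # Pull matched pairs together and push mismatched pairs apart,
    # symmetrically over the image and text directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```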
Then comes the critical and exciting part: the robot pathologist needed to take some exams to evaluate how well it functions. In the entrance exam, PLIP was exposed to new images from four external validation datasets and asked to tell which images were normal, benign, or malignant. The images were curated from multiple tissues and organs. This task turned out to be a piece of cake for PLIP, which answered most of the questions correctly without any retraining, and with much better accuracy than the previously developed contrastive language–image pretraining (CLIP) model.
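Classifying new images without any retraining is known as zero-shot classification: each image embedding is simply compared against the embeddings of candidate text labels. The sketch below shows one plausible way this works; the encoder callables and prompt wordings are hypothetical placeholders, not the paper's API.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def zero_shot_classify(image_encoder, text_encoder, images, class_prompts):
    """Assign each image the label whose text prompt it is most similar to.

    image_encoder/text_encoder are assumed callables returning [N, dim] and
    [num_classes, dim] embeddings; class_prompts might read, e.g.,
    ["an H&E image of benign tissue", "an H&E image of malignant tissue"].
    """
    text_emb = F.normalize(text_encoder(class_prompts), dim=-1)
    image_emb = F.normalize(image_encoder(images), dim=-1)
    similarity = image_emb @ text_emb.t()   # [N, num_classes] cosine scores
    return similarity.argmax(dim=-1)        # predicted class index per image
```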
The next exam required PLIP to locate the characteristic cell structures, tissue compositions, and disease manifestations in complicated images representing specific pathological changes. This exam required a further training step for PLIP, consisting of image embedding analysis (in simple terms, distilling the most essential information from each image) and linear probing (in simple terms, training only a simple linear classifier on top of the frozen embeddings). Four different datasets (Kather colon, PanNuke, DigestPath, and WSSS4LUAD) were used for this training and exam. Unsurprisingly, PLIP outperformed two other models.
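Linear probing is conceptually simple: the pretrained encoder stays frozen, and only a lightweight linear classifier is trained on its embeddings. Below is one plausible sketch using scikit-learn; the variable names are illustrative rather than taken from the paper.

```python
from sklearn.linear_model import LogisticRegression

def linear_probe(train_emb, train_labels, test_emb):
    """Train only a linear classifier on frozen image embeddings."""
    clf = LogisticRegression(max_iter=1000)  # the single trainable layer
    clf.fit(train_emb, train_labels)         # encoder weights stay untouched
    return clf.predict(test_emb)
```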
Still, there were two more exams waiting for PLIP! The third exam was to input a text request, for example "normal colon mucosa in colorectal H&E tissue," and test whether PLIP could retrieve the correct images from different image datasets, including Twitter, PathPedia, PubMed pathology, and pathology books. The last exam was even more challenging: inputting an image and testing whether PLIP could retrieve representative images of the same kind. PLIP achieved the best performance on both text-to-image and image-to-image retrieval, even though other models, including CLIP, MuDiPath, and SISH, could conduct similar tasks.
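Mechanically, both retrieval exams reduce to the same step: embed the query (a text prompt or an image), embed every candidate image, and rank the candidates by cosine similarity. The sketch below assumes precomputed embeddings; the function name and shapes are placeholders.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def retrieve_top_k(query_emb, gallery_emb, k=5):
    """Return indices of the k gallery images most similar to the query.

    query_emb: [dim] embedding of a text prompt (text-to-image retrieval)
    or of an image (image-to-image retrieval); gallery_emb: [N, dim].
    """
    query = F.normalize(query_emb, dim=-1)
    gallery = F.normalize(gallery_emb, dim=-1)
    scores = gallery @ query          # cosine similarity to each gallery item
    return scores.topk(min(k, scores.numel())).indices
```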
Upon finishing the two final exams, PLIP proved to be a qualified robot pathologist, with a digital pathology library installed in its super brain. This AI tool, built on social media data, will be valuable for disease diagnosis and classification, education and training, rare case identification, quality control, benchmarking, and more.
According to some forecasts, a long-term increase in Twitter's stock price is expected. Will this scientific discovery contribute to that increase? Definitely a tiny, tiny bit. Nevertheless, it is worth noting that AI tools will only become more versatile and smarter as they are used more widely. The same goes for Twitter.