Project Semiotica
Proprietary software using advanced, multilayered algorithms to detect possible disinformation campaigns on x.com
Categorisation and cluster analysis as well as individual post and user profiling
Detection of targeted, orchestrated information campaigns online
Dynamic, self-learning, layered pipeline logic
Pipeline self-calibration based on human assessment of data output
How it works
Pipeline Levels:
Level 0: Preprocessing
Language detection, text normalization, tokenization, lemmatization
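A minimal sketch of the Level 0 steps using only the standard library. Real deployments would use dedicated libraries for language detection and lemmatization (e.g. spaCy); both are omitted here, and the regex rules are illustrative assumptions.

```python
import re

def preprocess(text: str) -> list[str]:
    """Normalize a post and return lowercase word tokens.

    Sketch only: strips URLs, unwraps @mentions/#hashtags, then
    tokenizes on letter runs. Lemmatization is left out.
    """
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # strip URLs
    text = re.sub(r"[@#](\w+)", r"\1", text)    # unwrap mentions/hashtags
    return re.findall(r"[a-z']+", text)
```
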
Level 1: Keyword Filtering
Exact, fuzzy, and semantic keyword triggers with early exit capability
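The exact and fuzzy triggers with early exit can be sketched as below, using `difflib` for fuzzy matching. The trigger list and the 0.85 cutoff are illustrative assumptions; semantic triggers would additionally require an embedding model and are omitted.

```python
from difflib import SequenceMatcher

TRIGGERS = {"bioweapon", "false flag", "crisis actor"}  # illustrative list

def keyword_score(tokens: list[str], fuzzy_cutoff: float = 0.85) -> float:
    """Return 1.0 on an exact trigger hit (early exit), otherwise the
    best fuzzy match ratio at or above the cutoff, otherwise 0.0."""
    text = " ".join(tokens)
    for trig in TRIGGERS:
        if trig in text:           # exact hit: exit this level early
            return 1.0
    best = 0.0
    for tok in tokens:
        for trig in TRIGGERS:
            r = SequenceMatcher(None, tok, trig).ratio()
            if r >= fuzzy_cutoff:
                best = max(best, r)
    return best
```
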
Level 2: Semantic Similarity
TF-IDF vectorization, cosine similarity, DBSCAN clustering, narrative mapping
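A pure-Python sketch of the TF-IDF and cosine-similarity steps, with sparse dict vectors and smoothed IDF. In practice a library such as scikit-learn would handle vectorization and the subsequent DBSCAN clustering of the pairwise distance matrix; both the IDF smoothing and the toy documents below are assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF vectors as sparse {term: weight} dicts, smoothed IDF."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return [
        {t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
         for t, c in Counter(doc).items()}
        for doc in docs
    ]

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```
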
Level 3: Stylometry
POS tagging, sentence length distribution, lexical diversity, readability scores
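Two of the stylometric features can be sketched with the standard library: lexical diversity as a type-token ratio and mean sentence length. POS tagging and full readability scores would need an NLP library and are left out; the sentence-splitting regex is a simplifying assumption.

```python
import re

def stylometry(text: str) -> dict[str, float]:
    """Type-token ratio and mean sentence length for one post or
    account corpus; a crude proxy for the fuller feature set."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        "lexical_diversity": len(set(tokens)) / len(tokens) if tokens else 0.0,
        "mean_sentence_len": len(tokens) / len(sentences) if sentences else 0.0,
    }
```
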
Level 4: Emotional Analysis
Sentiment analysis, rhetorical markers, discourse analysis, modality detection
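A toy lexicon-based version of the sentiment and modality signals. The word lists here are tiny illustrative assumptions; a production system would use full resources (e.g. a VADER-style lexicon) and proper discourse analysis.

```python
# Tiny illustrative lexicons, NOT production resources.
POS_WORDS = {"great", "good", "love", "win"}
NEG_WORDS = {"lie", "fraud", "hate", "fear"}
MODALS = {"must", "should", "might", "could", "will"}

def emotion_features(tokens: list[str]) -> dict[str, float]:
    """Sentiment in [-1, 1] and modal-verb density in [0, 1],
    both normalised by token count."""
    n = len(tokens) or 1
    pos = sum(t in POS_WORDS for t in tokens)
    neg = sum(t in NEG_WORDS for t in tokens)
    return {
        "sentiment": (pos - neg) / n,
        "modality": sum(t in MODALS for t in tokens) / n,
    }
```
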
Level 5: Pattern Analysis
Topic modeling, burst detection, temporal patterns, Bayesian changepoint detection
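Burst detection on a time series of post counts can be sketched as a trailing-window z-score test; this is a simple stand-in for the Bayesian changepoint detection named above, and the window size and z threshold are assumptions.

```python
from statistics import mean, stdev

def detect_bursts(counts: list[int], window: int = 5, z: float = 3.0) -> list[int]:
    """Indices where a count exceeds the trailing-window mean by more
    than z standard deviations (e.g. hourly posts on one narrative)."""
    bursts = []
    for i in range(window, len(counts)):
        past = counts[i - window:i]
        mu, sd = mean(past), stdev(past)
        if sd > 0 and (counts[i] - mu) / sd > z:
            bursts.append(i)
    return bursts
```
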
Level 6: Metadata
Engagement-weighted sentiment, account history analysis, temporal burst graphs
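Engagement-weighted sentiment can be sketched as a weighted average, so that high-reach posts dominate the account-level signal. The field names (`sentiment`, `likes`, `reposts`) are illustrative assumptions, not a fixed schema.

```python
def engagement_weighted_sentiment(posts: list[dict]) -> float:
    """Average per-post sentiment weighted by likes + reposts.
    Returns 0.0 when there is no engagement to weight by."""
    total_w = sum(p["likes"] + p["reposts"] for p in posts)
    if total_w == 0:
        return 0.0
    return sum(p["sentiment"] * (p["likes"] + p["reposts"])
               for p in posts) / total_w
```
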
Dynamic Weighted Scoring System
W_i = Weight assigned to each signal (0-1 scale, sum of weights = 1).
S_i = Normalised score (0-1) for each linguistic/statistical feature.
Final score = Σ W_i · S_i, interpreted as a probabilistic confidence:
>0.7 = high risk
0.4–0.7 = medium
<0.4 = low
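The scoring rule above can be sketched directly; the signal names and weights in the usage example are illustrative assumptions, and the thresholds follow the bands listed above.

```python
def risk(signals: dict[str, float], weights: dict[str, float]) -> tuple[float, str]:
    """Combine normalised signal scores S_i (0-1) with weights W_i
    (summing to 1) into a final confidence and a risk band."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    score = sum(weights[k] * signals[k] for k in weights)
    band = "high" if score > 0.7 else "medium" if score >= 0.4 else "low"
    return score, band
```

For example, `risk({"keyword": 0.9, "semantic": 0.8, "style": 0.6}, {"keyword": 0.5, "semantic": 0.3, "style": 0.2})` combines three hypothetical signals into one confidence.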
How it works
Most content on X turns out to be benign and is part of healthy debate and regular user activity.
Other content originates from fake accounts on X (bought or hijacked), is computer-generated, or repeats the same or similar messages over and over again, across accounts.
Our system is able to detect clusters of orchestrated, systematic disinformation using statistical tools.