The SEOs Diners Club - Issue #50 - Weekly SEO Tips & News

Here are this week's SEO insights for SEOs Diners Club members. You can also join our free SEOs Diners Club network to ask questions and share your thoughts on these topics.

SEO Lessons From the Yandex Leak

Yandex is not Google, but SEO experts can learn a lot about how a modern search engine is built by studying this code.

Mike King, the founder of the iPullRank digital marketing agency, has published a comprehensive blog post about the Yandex code leak. I have summarized it for you. You can find the link to the original article below.

It Is Not Google's Code, So Why Do We Care?

Some believe examining this codebase is a distraction that will do nothing to influence their SEO-related decisions. Of course, Yandex is not Google. However, both are state-of-the-art web search engines that remain at the cutting edge of technology.

Software engineers from both companies attend the same conferences (SIGIR, ECIR, etc.) and share findings and innovations in Information Retrieval, Natural Language Processing/Understanding, and Machine Learning. Yandex had a presence in Palo Alto, and Google previously had one in Moscow.

A quick LinkedIn search reveals several hundred engineers who have worked at both companies, though we don't know how many are working on Search at either company.

In a more direct overlap, Yandex also uses open-source technologies critical to innovations in Search, such as Google's TensorFlow, BERT, MapReduce, and, to a much lesser extent, Protocol Buffers.

So, while Yandex is definitely not Google, it's not just some random research project we're talking about here. There is a lot we can learn about how a modern search engine is built by examining this codebase.

The Leaked Code Has 17,854 Ranking Factors

A deep look at the codebase reveals that Yandex has a large number of ranking factor files for different subsets of its query processing and ranking systems.

When we scan them, we see that there are 17,854 ranking factors in total. These ranking factors include various measures of:

  • Clicks

  • Dwell Time

  • Data obtained using Metrika, Yandex's equivalent of Google Analytics.

Yandex's Top Priority Negative Ranking Factors

In summary, these factors suggest that to get the best score, you should:

  • Avoid ads

  • Update old content instead of creating new pages.

  • Make sure that most of the backlinks to your site have branded anchor text.

Yandex's Top Priority Positive Ranking Factors

For your rankings to be positively affected, you must:

  • Play word games while creating your domain name

  • Make sure your domain is a .com

  • Encourage people to search for your target keywords in the Yandex Bar

  • Keep getting clicks

There Are Many Unexpected Initial Ranking Factors

The most interesting of the initially weighted ranking factors are the unexpected ones. Below is a list of seventeen factors that stand out.

FI_PAGE_RANK: +0.1828678331 - PageRank is Yandex's 17th highest weighted factor. They had previously removed links from their ranking system entirely, so it's not surprising that it's this low on the list.

FI_SPAM_KARMA: +0.00842682963 - Spam karma is named after “antispammers” and is the probability that the host is spam, based on WHOIS information.

FI_SUBQUERY_THEME_MATCH_A: +0.1786465163 - How closely the query and document match thematically. It is the 19th highest weighted factor.

FI_REG_HOST_RANK: +0.1567124399 - Yandex has a host (or domain) ranking factor.

FI_URL_LINK_PERCENT: +0.08940421124 - The ratio of links whose anchor text is a URL (rather than text) to the total number of links.

FI_PAGE_RANK_UKR: +0.08712279101 - Yandex has a PageRank specific to Ukraine.

FI_IS_NOT_RU: +0.08128946612 - It is a positive signal when the domain is not a .RU. The Russian search engine doesn't trust Russian sites :)

FI_YABAR_HOST_AVG_TIME2: +0.07417219313 - This is the average dwell time on the host as reported by YandexBar.

FI_LERF_LR_LOG_RELEV: +0.06059448504 - This is link relevance based on the quality of each link.

FI_NUM_SLASHES: …9417 - The number of slashes in the URL.

FI_ADV_PRONOUNS_PORTION: -0.001250755075 - The proportion of pronouns on the page.

FI_TEXT_HEAD_SYN: -0.01291908335 - Presence of [query] words in the title, taking synonyms into account.

FI_PERCENT_FREQ_WORDS: -0.02021022114 - The ratio of words that are among the 200 most frequently used words of the language to the total number of words in the text.

FI_YANDEX_ADV: -0.09426121965 - Getting more specific about its dislike of ads, Yandex penalizes pages that contain Yandex ads.

FI_AURA_DOC_LOG_SHARED: -0.09768630485 - The logarithm of the number of non-unique text fields in the document.

FI_AURA_DOC_LOG_AUTHOR: -0.09727752961 - The logarithm of the number of text fields for which this document owner is recognized as the author.

FI_CLASSIF_IS_SHOP: -0.1339319854 - Apparently, Yandex will pay less attention to you if your page is a store.
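To get a feel for what these numbers could mean, here is a minimal Python sketch. It assumes, purely for illustration, that the initial weights are combined as a plain weighted sum; the leak does not confirm that this is how Yandex produces a final score. The frequent-word list, the sample page, the factor values, and all function names are hypothetical.

```python
import re

# Initial weights copied from the list above. Treating them as a plain
# linear sum is an assumption made for illustration only.
WEIGHTS = {
    "FI_PAGE_RANK": 0.1828678331,
    "FI_URL_LINK_PERCENT": 0.08940421124,
    "FI_PERCENT_FREQ_WORDS": -0.02021022114,
    "FI_YANDEX_ADV": -0.09426121965,
    "FI_CLASSIF_IS_SHOP": -0.1339319854,
}

# Hypothetical stand-in for "the 200 most frequent words of the language".
FREQUENT_WORDS = {"the", "a", "of", "and", "to", "in", "is", "it"}


def percent_freq_words(text: str) -> float:
    """Share of words in the text that belong to the frequent-word list."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(word in FREQUENT_WORDS for word in words) / len(words)


def url_link_percent(anchor_texts: list[str]) -> float:
    """Share of links whose anchor text is a URL rather than words."""
    if not anchor_texts:
        return 0.0
    url_like = [a for a in anchor_texts if re.match(r"(https?://|www\.)", a.strip())]
    return len(url_like) / len(anchor_texts)


def linear_score(factors: dict[str, float]) -> float:
    """Weighted sum over known factors; unknown factors add nothing."""
    return sum(WEIGHTS[name] * value for name, value in factors.items() if name in WEIGHTS)


anchors = ["https://example.com", "our guide", "read more", "www.example.org"]
page = {
    "FI_PAGE_RANK": 0.6,  # hypothetical value
    "FI_PERCENT_FREQ_WORDS": percent_freq_words("The cat is in the garden"),
    "FI_URL_LINK_PERCENT": url_link_percent(anchors),
}
# Same page, but flagged as a shop that carries Yandex ads.
shop_with_ads = dict(page, FI_YANDEX_ADV=1.0, FI_CLASSIF_IS_SHOP=1.0)

print(f"page:          {linear_score(page):+.4f}")
print(f"shop with ads: {linear_score(shop_with_ads):+.4f}")
```

Under this toy model, the same page scores lower once it is flagged as a shop with Yandex ads, which is the direction the negative weights point in.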

When we examine these strange ranking factors and the factors available in the Yandex codebase, we see that many things could be ranking factors.

Mike King suspects that the “200 signals” that Google reports are 200 signal classes, and each signal combines many other components. According to King, just as Google Analytics has dimensions associated with many metrics, Google Search probably has classes of ranking signals consisting of many attributes.

According to Chris Long, Yandex prioritizes content close to the homepage.

Yandex Scrapes Google, Bing, YouTube, and TikTok!

The codebase also reveals that Yandex has many parsers for other websites and their services, as well as for its own services.

What Can We Add to What We Know About Google from the Yandex Leak?

Naturally, this is still the question on everyone's mind. While there are certainly many similarities between Yandex and Google, the truth is that only a Google Software Engineer working on Search can definitively answer this question.

Still, this is the wrong question.

Instead, this code should help us expand our thinking about modern search. Much of the collective understanding of search comes from what the SEO community learned through testing in the early 2000s and from what search engineers said when search was much less opaque. Unfortunately, that understanding hasn't kept up with the fast pace of innovation.

The insights from the Yandex leak's many features and ranking factors should yield more hypotheses that need to be tested and considered for ranking in Google. They should also offer more that can be parsed and measured by SEO crawling, link analysis, and ranking tools.

Google After ChatGPT

How will Google respond to the post-ChatGPT agenda of AI chat engine results? For them, it seems to be a matter of to be or not to be.

I hosted my friend Orhan Kurulan, an experienced SEO and digital marketing expert, on our YouTube channel. We shared our views on how Google can detect AI-generated content and exchanged ideas on the changes Google might make after ChatGPT, which is dominating the agenda.

CEO Sundar Pichai Confirms Google Is Working On An AI Search Feature Where Users Can “Interact Directly”

It is slowly becoming clear how Google will respond to Microsoft Bing search engine results powered by ChatGPT.

Sundar Pichai said, “In the coming weeks and months, we'll be rolling out these language models starting with LaMDA so people can interact with them directly. This will help us continue to receive feedback, test, and develop safely. These models are especially great for creating, building, and summarizing. They will become even more useful to people as they provide up-to-date and more realistic information.”

Sundar Pichai noted that Google first said six years ago that it would be an AI-first company. “We've been preparing for this moment since the beginning of last year, and in the next few months, you will see a lot from us in three major areas of opportunity. First, the big models. We have published extensively on LaMDA and PaLM, the industry's largest, most sophisticated model, and on the extensive work at DeepMind.”

In the Q&A session, Sundar continued, “In some cases, we'll launch more lab products, in other cases, beta features, and scale from there. We need to ensure we iterate publicly; these models will continue to improve, so the field is rapidly changing. Service costs will need to be improved.”

I think Google's job is not easy at all.

No one doubts the capabilities of Google's AI tools. But the main effect of ChatGPT is that it gives search a different interface and perspective than the classic ten blue links. Responding to this requires a paradigm shift. They are so bureaucratic in their internal operations that even the slightest change has to go through many tests and approvals. Not to mention their legal obligations and responsibilities. It is unrealistic to expect a company of such a large scale to step out of its comfort zone and respond quickly. What do you think? 

I Will Give A Presentation Titled "Producing AI-Generated Content Compatible with Google Algorithms" at Digitalzone

On Tuesday, February 7, we will discuss producing artificial intelligence-powered content compatible with Google algorithms at Digitalzone.

As the guest of the ZEO agency, I will give a presentation titled “Producing AI-Generated Content Compatible with Google Algorithms” at Digitalzone on Tuesday, February 7th. Everyone interested in the subject is welcome.

The event is free, but participation is limited to 100 people, so you need to fill out the registration form at the link below 👇 I recommend you be quick.

OpenAI Released A Tool To Detect AI-Written Content

Learn how the new OpenAI Text Classifier can be used as a starting point for detecting AI-generated content.

The rumor that ChatGPT developer OpenAI released its human/AI detection tool, AI Text Classifier, today because it heard about my Digitalzone presentation is not true :)

Kidding aside, OpenAI's AI Text Classifier can help detect AI-generated content, but it's not 100% accurate and can make mistakes.

When the tool was tested on a set of English texts, it correctly identified only 26% of the AI-written text as likely AI-written. It also incorrectly labeled 9% of the human-written text as AI-written.

You can test the tool by accessing it via the link below:

It can mislabel both AI-generated and human-written texts and can be circumvented with minor tweaks. In its current form, I think the AI Text Classifier should not be the only tool used when deciding whether a document was created by AI.
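To see why a single flag from the tool is weak evidence, here is a small worked calculation in Python. It reads the 26% figure as the share of AI-written text the classifier flags and the 9% figure as the share of human-written text it wrongly flags, then applies Bayes' rule. The function name and the example shares of AI-written text are my own assumptions for illustration.

```python
def flagged_is_ai_probability(
    ai_share: float,
    true_positive_rate: float = 0.26,   # AI-written text that gets flagged
    false_positive_rate: float = 0.09,  # human-written text wrongly flagged
) -> float:
    """Bayes' rule: P(text is AI-written | the classifier flags it)."""
    flagged_ai = true_positive_rate * ai_share
    flagged_human = false_positive_rate * (1.0 - ai_share)
    return flagged_ai / (flagged_ai + flagged_human)


# ai_share is the (assumed) fraction of AI-written texts in the pool being checked.
for ai_share in (0.1, 0.5, 0.9):
    p = flagged_is_ai_probability(ai_share)
    print(f"{ai_share:.0%} of texts AI-written -> a flag is correct {p:.0%} of the time")
```

If only one text in ten is AI-written, a flag is correct roughly a quarter of the time under these figures. Note too that 74% of AI-written text is never flagged at all.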

OpenAI Announces a $20 Monthly Subscription Service for the United States

OpenAI offers ChatGPT Plus for $20 per month with faster response times and priority access.

OpenAI is launching ChatGPT Plus, a premium service, for $20 per month with faster response times and priority access. The company values its free users and will continue to offer free access to ChatGPT.

They are actively exploring options for cheaper plans, business plans, and data packages to make their services more accessible.

In an announcement, the company said it would use what it learned during the ChatGPT research preview to continue improving the chatbot:

“We launched ChatGPT as a research preview to learn more about the strengths and weaknesses of the system and gather user feedback to help us improve its limitations. Since then, millions of people have given us feedback, we've made a few key updates, and we've seen users find value in a range of professional use cases, including creating and editing content, brainstorming ideas, programming assistance, and learning new topics.”

Unfortunately, this subscription model is currently limited to US-based users only. Otherwise, I would like to be one of the first users because it is said to run much faster. Would you consider using it?

Report: Microsoft Bing's ChatGPT Feature Will Be Faster and Richer With GPT-4

We're starting to learn more about Microsoft's plans to include ChatGPT in Bing Search. 

The report states: “The most interesting improvement in the latest version described by sources is the speed of GPT-4. ChatGPT can take some time to respond (sometimes minutes, in my experience).”

ChatGPT, currently running with GPT-3.5, is slow and can take minutes to produce results. The report states that the responses generated through GPT-4 are “more human and more detailed.”

GPT-4, short for Generative Pre-trained Transformer 4, is a neural network created by OpenAI. It is the successor to GPT-3.5, which ChatGPT currently uses.

I reported on my gptpromtchat.com blog that a user has been testing the ChatGPT-powered Microsoft Bing interface.

New Updates to GA4 Search Bar

As the end of Universal Analytics is approaching, updates have started coming from Google that will make GA4 more useful. 

Google has released three new updates for its GA4 dashboard, allowing advertisers to find information about available properties or accounts. 

Google has released the following updates to the Analytics Help documentation. 

Google Search Console Tests "Content Ideas" Module

Google Search Console is testing the experimental "content ideas" module.

I have not come across it in the dozens of Search Console accounts I manage, but SEO experts, particularly in the US, have been sharing the "content ideas" feature that Google is testing experimentally.

Glad to see Google roll out such valuable modules. I hope the test will be successful and we will all start using this practical module.

Semrush Changes Calculation of Domain Authority Score Metric

I like the method Semrush uses for the new domain authority score calculation.

Semrush shows the domain authority score of a website we queried in the Backlink Analytics section and the historical change trend of the score.

Renewing the domain authority score chart, Semrush updated the domain authority score of all domains in the database. I am happy that many of our customers' scores have improved.

With this update, the domain authority score is now calculated from three main factors:

  • Link Power is calculated from the authority of the domains that link to a website and the number of those domains.

  • Organic Traffic is derived from domain analytics data and is estimated from a website's total keyword positions and the click-through potential of each position.

  • Spam Factors shows the results of a series of checks for suspicious SEO behavior, including whether the acquired backlinks are manipulated (paid, etc.).

You can learn the details in the following Semrush blog post:

Google Search Console Gets Video Indexing Report, Video Impressions Tier, and Sitemap Filter

Google has updated the video indexing report to help your videos perform better in search.

The video indexing report shows how many indexed pages on your site contain one or more videos and on how many of those pages a video could be indexed. Google says the report can help you understand how your videos are performing on Google and identify potential areas for improvement.

The report now shows you the number of daily video impressions over time. Google said that “impressions are aggregated by page, meaning that if the same page appears multiple times on a single search results page (or in a single Discover session), they treat each view as an impression.”

Book Of The Week: "The Runaway Species: How Human Creativity Remakes the World" - David Eagleman

David Eagleman guides us on how we can become more creative people. 

Man creates new versions of the world every day. Our ability to innovate is unmatched among living things. Cows don't choreograph; squirrels don't build elevators to reach treetops; crocodiles don't design speedboats. On the other hand, we can absorb our experiences and derive "what ifs" from them, thanks to an evolutionary fine-tuning that took place thousands of years ago.

Renowned neuroscientist David Eagleman and composer Anthony Brandt pursue the question: What lies at the root of humanity's ability and drive to create? How does the creative software in our minds work, why do we have it, and where is it taking us? The Runaway Species is an impressive journey from Picasso to concept cars, from umbrellas to a trip to the Moon, and from our education system to ketchup bottles, scrutinizing the creative mind along the way. Drawing on the latest findings in neuroscience, it makes visible, for the first time, the fundamental workings of this fabulous, mysterious, and most important human skill. It opens the door to a more creative future for all of us.

It was one of those books that I didn't want to end. I recommend you read it too.

I hope you enjoyed my weekly SEO insights. I hope to see you next Sunday in the new issue. I wish you all a great week.

Best,

Mert Erkal
