Google is a search engine that looks up words or phrases on web pages stored in an index. With Google Dorks you can use hacker techniques to comb through the websites indexed by Google and find possible vulnerabilities in web pages. The technique is also often called Google hacking: hacking through Google, looking for applications with security holes in their configuration, and for code on web pages that is, unfortunately, indexed by Google.
In this article I will compile several Google queries that find these websites with low security, as well as other interesting search criteria. All these techniques are permitted, since anyone can run them, but you have to use them sensibly and not go around creating threats to vulnerable websites. It’s about working on solving problems, not creating them.
Birth of Google Dorks
It can be said that Google Dorks was born in 2002 at the hands of a hacker named Johnny Long, a professional hacker and researcher with years of experience in computer security. He is the founder of Hackers For Charity (http://ihackcharities.org), an organization that brings work experience to hackers while leveraging their skills for charities that need them.
Advanced search on Google
There are multiple ways to be more precise when doing a Google search, starting with Google’s advanced search page, https://www.google.com/advanced_search. There are also operators and symbols such as AND, OR, and NOT, ~ to search for synonyms, + to combine two words, “” for an exact phrase search, and * as a wildcard character.
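As an illustration, here is a made-up query that combines an exact phrase, the OR operator, and the * wildcard (the search terms are just placeholders):
"password reminder" OR "forgot * password"
The * stands in for any single word, so the second phrase matches variants such as “forgot my password” or “forgot your password”.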
There are other very useful criteria such as:
"link:avertigoland.com"
Shows websites that link to your website. Change the URL to the one you want to search for.
related:NYtimes.com
Displays websites related to the one you’re searching for. For example, if it is an online newspaper, it will show other online newspapers that publish news similar to the site you type as the search criterion.
weather:Vigo
Displays weather information for the town or city you use as your search criterion.
cache:paginaweb.com
Displays the latest cached version of a web page saved by Google. This is a snapshot of the page as it was at a particular point in time; the page may have been modified since then.
Search for website login information
Let’s look below at a series of queries to search on Google, with an explanation of what each one does. Note that even if we obtain username and password data, and even if it is public data visible in a Google search, we do not have permission to access the machine those credentials belong to. This information should be treated purely informatively, as research; you cannot access places you don’t have permission to access. If you access them anyway, the ideal is to browse using a VPN.
filetype:xls username password email
This search shows Microsoft Excel spreadsheets that contain the words username, password, and email. Keep in mind that many results will be blank “template” forms that provide no information; however, you can also find real documents full of passwords and usernames.
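To filter out some of those blank templates, one rough approach is to exclude typical template keywords. The excluded words below are guesses and will also hide some legitimate results:
filetype:xls username password email -template -sample -example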
site:intext:"pass" ! "usuario" | "user" | "contraseña" filetype:sql -github
What this search does is show Google results for SQL (database) files that contain the keywords pass, usuario, user, or contraseña (the Spanish words for user and password), excluding results from GitHub.
intitle:"index of" inurl:/backup
Searches for open directory listings that contain backups.
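A variant of the same idea is to look for a specific backup file name inside an open directory listing. The file name here is only an assumption of what an administrator might use:
intitle:"index of" "backup.sql"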
filetype:sql "MySQL dump" (pass|password|passwd|pwd)
This displays data exported from a MySQL database (“MySQL dump”), searching for pass, password, passwd, or pwd as keywords.
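The dork works because mysqldump writes a recognizable header at the top of every export. A typical dump begins with lines along these lines (the version numbers, database name, and row data here are purely illustrative):
-- MySQL dump 10.13  Distrib 8.0.32, for Linux (x86_64)
-- Host: localhost    Database: shop
INSERT INTO `users` VALUES (1,'admin','secret123');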
How to prevent Google from indexing a website
All the queries seen above show information that is in the index created by the Google search engine. A security measure to avoid this is to prevent Google’s indexing robot from indexing the website, stopping it from appearing in searches. The following directives can be used to help, and they go in the robots.txt file (https://avertigoland.com/2021/06/search-engines-add-your-content-to-google/) in the root directory:
Prohibits Googlebot from indexing the entire website:
User-agent: Googlebot
Disallow: /
Prohibits any robot from indexing the website:
User-agent: *
Disallow: /
Prohibits Googlebot from indexing specific file types:
User-agent: Googlebot
Disallow: /*.sql$
Prohibits Googlebot from indexing a folder on the site:
User-agent: Googlebot
Disallow: /nombredirectorio/
Prohibits Googlebot from indexing a particular page of the website:
User-agent: Googlebot
Disallow: /confidencial.html
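Putting it all together, a robots.txt for a site that wants to block SQL files, a backup folder, and one sensitive page from Googlebot might look like this sketch, reusing the same placeholder names as above:
User-agent: Googlebot
Disallow: /*.sql$
Disallow: /nombredirectorio/
Disallow: /confidencial.html
Note that the * and $ wildcards in Disallow rules are extensions understood by Googlebot rather than part of the original robots.txt convention. Also keep in mind that robots.txt only asks crawlers not to crawl; a blocked URL can still show up in results if other sites link to it, so for pages that must stay out of the index Google also recommends the noindex robots meta tag.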