In the vast expanse of the internet, websites are mere islands in an ocean of code. And for the curious hacker, GitHub’s endless repositories hold the treasure maps to these islands. But what if these maps also contained hidden secrets? Keys to hidden doors, forgotten passages, and unguarded treasures? Today, we’re diving into the depths of GitHub to unearth these secrets, revealing how even the most secure websites might have vulnerabilities etched into their very blueprint, accessible for those who know where to look.
The Treasure Hunt Begins: Understanding GitHub
GitHub, the world’s largest host of source code, is a hacker’s haven for reconnaissance. It’s where developers store the heart of websites: source code, configuration files, and scripts. But sometimes, they inadvertently check in sensitive data: API keys, passwords, and secret tokens that are meant to be kept out of public view.
Tools of the Trade: GitHub Dorking
The hunt for secrets begins with GitHub Dorking, which means using search queries to find sensitive information exposed in GitHub repositories. Similar to Google Dorking, it leverages GitHub’s search engine to sift through public code with precision.
Here’s how to start:
Search for High-Value Targets: Keywords like password, secret, key, and token can yield interesting results. Combine them with specific file names like config.yml, .env, or database.ini for more focused hunting.
Let’s say, for instance, we want to find AWS credentials, which are often unintentionally committed to public repositories. AWS credentials are used to authenticate and authorize calls to AWS services. They typically consist of an Access Key ID and Secret Access Key. Because of their power and the level of access they can grant, finding exposed AWS credentials can lead to significant security vulnerabilities.
Here’s how you might structure your search queries to find AWS credentials:
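Query: /AWS_ACCESS_KEY_ID=AKIA[0-9A-Z]{16}/
Searches for AWS Access Key IDs, which start with "AKIA" followed by 16 uppercase letters or digits. GitHub code search treats a pattern as a regular expression only when it is wrapped in forward slashes; without them, the brackets and braces are matched as literal text.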
Query: aws_secret_access_key
Looks for instances of the string "aws_secret_access_key", commonly found in AWS SDK configurations.
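Query: filename:.properties aws_access_key_id
Targets .properties files containing "aws_access_key_id", often used in Java applications. In GitHub's current code search the equivalent qualifier is path:, as in path:*.properties aws_access_key_id.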
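The same patterns work offline, against text you already have on disk. Here is a minimal Python sketch of that idea. The function name find_aws_credentials and the secret-key pattern (an assignment followed by 40 characters) are assumptions made for illustration, and the sample values are the placeholder keys AWS uses in its own documentation.

```python
import re

# Access Key IDs: "AKIA" followed by 16 uppercase letters or digits.
ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

# Assumption: a secret key shows up as an assignment to a name like
# aws_secret_access_key, followed by a 40-character value.
SECRET_KEY_ASSIGNMENT = re.compile(
    r"aws_secret_access_key\s*[=:]\s*['\"]?([A-Za-z0-9/+=]{40})",
    re.IGNORECASE,
)


def find_aws_credentials(text):
    """Return (line_number, kind, value) tuples for lines that look like AWS credentials."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            findings.append((number, "access_key_id", match.group(0)))
        for match in SECRET_KEY_ASSIGNMENT.finditer(line):
            findings.append((number, "secret_access_key", match.group(1)))
    return findings


if __name__ == "__main__":
    sample = (
        "AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\n"
        "aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY\n"
    )
    for finding in find_aws_credentials(sample):
        print(finding)
```

A match only means the line has the right shape. Placeholder and expired keys match just as well as live ones.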
Explore Commit History: Sometimes, secrets are removed from the current version of the code but remain in the commit history. Exploring previous commits might reveal secrets not visible in the latest codebase.
Look for Hidden Directories and Files: Directories named .git or files like .htaccess can sometimes be overlooked by developers when cleaning up sensitive data.
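The commit history point is worth a closer look, because a deleted secret still sits in every old patch. The sketch below reads the text produced by git log -p and reports added or removed lines that look like a credential assignment. The keyword pattern and the function names scan_patch_text and scan_repository are assumptions made for illustration, not part of any tool mentioned here.

```python
import re
import subprocess

# Assumption: a secret looks like "<keyword> = value" or "<keyword>: value".
SECRET_HINT = re.compile(r"(password|secret|token|api[_-]?key)\s*[=:]", re.IGNORECASE)


def scan_patch_text(patch_text):
    """Yield (commit, sign, content) for changed lines that look like secrets.

    sign is "+" when the line was added and "-" when it was removed.
    """
    commit = None
    for line in patch_text.splitlines():
        if line.startswith("commit "):
            commit = line.split()[1]  # header line: "commit <hash>"
        elif line.startswith(("+++", "---")):
            continue  # file name headers of a diff, not file content
        elif line.startswith(("+", "-")) and SECRET_HINT.search(line[1:]):
            yield commit, line[0], line[1:].strip()


def scan_repository(repo_path):
    """Run `git log -p --all` on a local clone and scan its full history."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "-p", "--all"],
        capture_output=True, text=True, errors="replace", check=True,
    )
    return list(scan_patch_text(result.stdout))
```

Calling scan_repository("/path/to/repo") on a local clone returns every hit across all branches. A "-" entry is the interesting case: the secret is gone from the latest code, but the commit that removed it still shows the value.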
Automating the Hunt: Tools to Enhance Your Search
Several tools can automate GitHub Dorking, making the search for secrets more efficient:
GitRob: A tool designed to find sensitive files pushed to public repositories on GitHub. GitRob will scan and flag potential sensitive files based on their names.
Command: gitrob analyze <organization>
Analyzes all repositories belonging to a GitHub organization for sensitive files.
Command: gitrob analyze <user>
Analyzes all repositories owned by a specific GitHub user for sensitive files.
Command: gitrob serve
Starts the GitRob web interface to review findings from the analysis.
Command: gitrob -h
Displays help information, listing all commands and options.
Command: gitrob version
Displays the current version of GitRob.
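Flagging files by name is simple enough to sketch in a few lines of Python. The version below is only an illustration of the idea, not GitRob's actual signature list. The file names come from earlier in this article, while the key-file names and extensions and the function name flag_sensitive_paths are assumptions.

```python
from pathlib import PurePosixPath

# File names mentioned in this article, plus a few assumed examples.
SUSPICIOUS_NAMES = {".env", "config.yml", "database.ini", ".htaccess", "id_rsa"}
# Assumed extensions that often belong to private keys and certificates.
SUSPICIOUS_SUFFIXES = {".pem", ".key", ".p12", ".pfx"}


def flag_sensitive_paths(paths):
    """Return the paths whose file name or extension looks sensitive."""
    flagged = []
    for path in paths:
        parsed = PurePosixPath(path)
        if parsed.name in SUSPICIOUS_NAMES or parsed.suffix.lower() in SUSPICIOUS_SUFFIXES:
            flagged.append(path)
    return flagged


if __name__ == "__main__":
    repo_files = ["README.md", "app/.env", "certs/server.pem", "src/key.py"]
    print(flag_sensitive_paths(repo_files))
```

Name matching is fast but blunt. It says nothing about what a file contains, so every hit still needs a manual look.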
TruffleHog: Digs deep into commit histories to find strings with high entropy, often indicative of sensitive keys.
Command: trufflehog --regex --entropy=False git@github.com:user/repo.git
Scans a repository’s git history for secrets, skipping entropy checks.
Command: trufflehog file://path/to/repo --since_commit=commit_hash
Scans a local repository from a specific commit hash.
Command: trufflehog git@github.com:user/repo.git --json
Outputs findings in JSON format.
Command: trufflehog --exclude_paths=regexes.json git@github.com:user/repo.git
Excludes paths matching the regexes in regexes.json.
Command: trufflehog --rules=custom_rules.json git@github.com:user/repo.git
Uses custom rules defined in custom_rules.json for scanning.
These flags belong to the older Python release of TruffleHog (2.x). Version 3 is invoked with subcommands instead, for example trufflehog git followed by the repository URL.
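To see why entropy is a useful signal, here is a minimal Python sketch of a Shannon entropy check. Random keys spread their characters evenly and score high, while ordinary words and repeated characters score low. The minimum length of 20 and the threshold of 4.0 bits per character are illustrative assumptions, not TruffleHog's own settings, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def shannon_entropy(text):
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def high_entropy_tokens(line, min_length=20, threshold=4.0):
    """Return long tokens in line whose entropy exceeds threshold.

    min_length and threshold are illustrative values, chosen for this sketch.
    """
    tokens = re.findall(r"[A-Za-z0-9+/=_-]+", line)
    return [
        token for token in tokens
        if len(token) >= min_length and shannon_entropy(token) > threshold
    ]


if __name__ == "__main__":
    line = 'secret = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"'
    print(high_entropy_tokens(line))
```

Entropy alone produces false positives, since hashes, UUIDs, and minified code also look random. That is why scanners usually pair it with pattern rules.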
Gitleaks: Scans git repositories, including their commit history, for hardcoded secrets such as passwords, API keys, and tokens. The commands below use Gitleaks 8 syntax, where --source takes a local path, so clone a remote repository before scanning it.
Command: gitleaks detect --source=/path/to/repo
Detects secrets in the repository at the given path.
Command: gitleaks protect --staged
Scans staged changes in your local git repository for secrets.
Command: gitleaks detect --source=/path/to/repo --verbose
Scans a local repository for secrets, with verbose output.
Command: gitleaks detect --config=config.toml --source=/path/to/repo
Uses a custom configuration file for scanning the specified repository.
Command: gitleaks detect --source=/path/to/repo --report-path=report.json
Outputs findings to a report.json file.