For an upcoming project I need to dynamically get information about a GitHub repository, such as the number of stars, watchers and forks, plus the repo description and URL.

Looking at the API I didn’t see a simple way of doing it, so I decided to scrape my repo page instead.

Using PHP Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net) the process is simple. First include simple_html_dom.php, then load the URL of my repo:

$html = file_get_html('https://github.com/simple-mvc-framework/framework');

Next I need to get the watchers, stars and forks. Each is contained within an a link with a class of social-count, which is perfect: I can use the class to select those links and pick one by its index:

$html->find('a.social-count', 0)->innertext;

The number is the index of the match. I could loop through the results using a foreach, but I wanted to be specific and add them to an array like this:

$info = array(
    'watching' => trim($html->find('a.social-count', 0)->innertext),
    'starred' => trim($html->find('a.social-count', 1)->innertext),
    'forked' => trim($html->find('a.social-count', 2)->innertext)
);

I’ve wrapped each result in trim to remove any surrounding whitespace.

That’s the stats taken care of. Next is the repo description, which is stored in a div with a class of ‘repository-description’:

$html->find('div.repository-description', 0)->innertext;

Finally the repo URL, which is inside a div with a class of ‘repository-website’:

strip_tags($html->find('div.repository-website', 0)->innertext);

This time I want to remove the a link, so I use strip_tags, which removes all markup and leaves only the text.
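
To illustrate what strip_tags does here, a minimal sketch on a hard-coded string. The markup in $inner is an assumed stand-in for the contents of the repository-website div, not something copied from GitHub:

```php
<?php
// Hypothetical inner HTML of the repository-website div (assumed for illustration).
$inner = ' <a href="http://example.com" rel="nofollow">http://example.com</a> ';

// strip_tags() removes the <a> tags but keeps the link text;
// trim() then drops the surrounding whitespace.
$siteLink = trim(strip_tags($inner));

echo $siteLink, PHP_EOL;
```

Only the visible text of the link survives, which is what ends up in the array.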

Putting this all together:

$info = array(
    'watching' => trim($html->find('a.social-count', 0)->innertext),
    'starred' => trim($html->find('a.social-count', 1)->innertext),
    'forked' => trim($html->find('a.social-count', 2)->innertext),
    'desc' => trim($html->find('div.repository-description', 0)->innertext),
    'sitelink' => trim(strip_tags($html->find('div.repository-website', 0)->innertext))
);

Now any time I want to display one of these I can call the relevant part, such as $info['starred'].
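
For trying the same idea without the network or the parser library, here is a rough sketch that swaps in PHP's built-in DOMDocument and DOMXPath. The sample markup and the helpers nthText and byClass are hypothetical; the markup only mimics the class names described above and may not match what GitHub actually serves:

```php
<?php
// Sample markup standing in for the repo page (assumed structure, not real GitHub HTML).
$sample = <<<HTML
<div>
  <a class="social-count" href="/owner/repo/watchers"> 12 </a>
  <a class="social-count" href="/owner/repo/stargazers"> 340 </a>
  <a class="social-count" href="/owner/repo/network"> 56 </a>
  <div class="repository-description"> A sample description </div>
  <div class="repository-website"><a href="http://example.com">http://example.com</a></div>
</div>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($sample);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Build an XPath query matching $tag elements whose class list contains $class.
function byClass(string $tag, string $class): string
{
    return "//{$tag}[contains(concat(' ', normalize-space(@class), ' '), ' {$class} ')]";
}

// Return the trimmed text of the match at $index, or null if there is no such match.
function nthText(DOMXPath $xpath, string $query, int $index): ?string
{
    $node = $xpath->query($query)->item($index);
    return $node === null ? null : trim($node->textContent);
}

// textContent already excludes tags, so no strip_tags call is needed here.
$info = array(
    'watching' => nthText($xpath, byClass('a', 'social-count'), 0),
    'starred'  => nthText($xpath, byClass('a', 'social-count'), 1),
    'forked'   => nthText($xpath, byClass('a', 'social-count'), 2),
    'desc'     => nthText($xpath, byClass('div', 'repository-description'), 0),
    'sitelink' => nthText($xpath, byClass('div', 'repository-website'), 0),
);

print_r($info);
```

Returning null for a missing element means a change in the page markup shows up as an empty value rather than a fatal error on a non-object.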


Now that I have the stats, it would be nice to display recent commits, say the most recent 5.

This time I call a different URL. The commits are stored in a series of li elements with a class of commit, so this time I loop through them.

The commit meta and title are stored in variables, and str_replace is then used to make sure the URLs on the a links point to GitHub.

I only want five, so a check is run on each pass: once $i is equal to 5, break out of the loop.

$i = 0;
$html = file_get_html('https://github.com/simple-mvc-framework/framework/commits/master');
foreach($html->find('li.commit') as $e){
    $commit = $e->find('div.commit-meta', 0)->innertext.'<br>';
    $title = $e->find('p.commit-title', 0)->innertext.'<br>';
    echo '<p>';
    echo str_replace('href="/', 'href="https://github.com/', $title);
    echo str_replace('href="/', 'href="https://github.com/', $commit);
    echo '</p>';

    // Count this commit and stop once five have been shown.
    $i++;
    if ($i == 5) {
        break;
    }
}
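
The link rewriting and the limit of five can also be tried in isolation. This is only a sketch: recentCommits is a hypothetical helper, the snippets are made-up stand-ins for what innertext might return, and array_slice is used in place of a manual counter:

```php
<?php
// Make root-relative links absolute and keep only the first $limit snippets.
function recentCommits(array $snippets, int $limit = 5): array
{
    $out = array();
    foreach (array_slice($snippets, 0, $limit) as $snippet) {
        $out[] = str_replace('href="/', 'href="https://github.com/', $snippet);
    }
    return $out;
}

// Eight hypothetical commit titles with root-relative links.
$snippets = array();
for ($n = 1; $n <= 8; $n++) {
    $snippets[] = '<a href="/owner/repo/commit/' . $n . '">Commit ' . $n . '</a>';
}

$recent = recentCommits($snippets);
foreach ($recent as $item) {
    echo '<p>', $item, '</p>', PHP_EOL;
}
```

Slicing up front avoids the counter entirely, at the cost of having the full list in memory first, which is not a concern for a single page of commits.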


It would have been nice to use an official API, but it would have meant multiple calls for the information. Scraping the information is much quicker and easier in this case.