Tutorials, PHP & MySQL, Laravel Framework

I recently had to import a 30,000 line CSV file, in this post, I document how I tackled this task.

large files can be read fairly easily with PHP the issue is when you start inserting the data into a database, even worse if you're first checking if the record already exists. In that case, say you're working with a 100 lined CSV file your end up doing 200 queries. You can't avoid needing to do the queries but doing lots in one go will use up your application's memory and will timeout when taking too long.

It's easier to break up a large file into smaller files and process them in batches.

Let's go through the steps achieved.

First 2 routes are required one for loading an import view and another to process the import.

Route::get('import',  'ContactsController@import');
Route::post('import', 'ContactsController@parseImport');

I created a ContactsController

php artisan make:controller ContactsController

In the controller create an import method to load a view called import.

public function import()
{
    return view('import');
}

Make a view called import.blade.php

This view will have a form set to post to /import url as it will be working with a file enctype="multipart/form-data" is needed to upload the file.

Wrap the file input inside of a bootstrap element not required but useful for styling errors.

<p>{{ session('status') }}</p>

<form method="POST" action="{{ url("import") }}" enctype="multipart/form-data">
{{ csrf_field() }}

<div class="form-group{{ $errors->has('file') ? ' has-error' : '' }}">
    <label for="file" class="control-label">CSV file to import</label>
    
    <input id="file" type="file" class="form-control" name="file" required>

    @if ($errors->has('file'))
        <span class="help-block">
        <strong>{{ $errors->first('file') }}</strong>
        </span>
    @endif
    
</div>
 
<p><button type="submit" class="btn btn-success" name="submit"><i class="fa fa-check"></i> Submit</button></p>

</form>

Next in the ContactsController create a method called parseImport:

Place validation rules, this ensures only a CSV file will be accepted. If there is an error the user will be redirected back to the import page.

public function parseImport()
{
    request()->validate([
        'file' => 'required|mimes:csv,txt'
    ]);
}

After the validation store a path to the file

//get file from upload
$path = request()->file('file')->getRealPath();

Read the file contents into an array

//turn into array
$file = file($path);

Remove the first line as this is a header line containing the column titles.

//remove first line
$data = array_slice($file, 1);

Now using array_chunk spit the CSV file into parts every 1000 lines the loop over each part and create a new CSV file containing the part we've just extracted.

//loop through file and split every 1000 lines
$parts = (array_chunk($data, 1000));
$i = 1;
foreach($parts as $line) {
    $filename = base_path('resources/pendingcontacts/'.date('y-m-d-H-i-s').$i.'.csv');
    file_put_contents($filename, $line);
    $i++;
}

NOTE - for this work you will need to create a folder inside the resources folder called pendingcontacts. This is where the new CSV files will be stored before they are processed.

The full method looks like this:

public function parseImport()
{
    request()->validate([
        'file' => 'required|mimes:csv,txt'
    ]);

    //get file from upload
    $path = request()->file('file')->getRealPath();

    //turn into array
    $file = file($path);

    //remove first line
    $data = array_slice($file, 1);

    //loop through file and split every 1000 lines
    $parts = (array_chunk($data, 1000));
    $i = 1;
    foreach($parts as $line) {
        $filename = base_path('resources/pendingcontacts/'.date('y-m-d-H-i-s').$i.'.csv');
        file_put_contents($filename, $line);
        $i++;
    }

    session()->flash('status', 'queued for importing');

    return redirect("import");
}

At this point a CSV file has been broken into smaller CSV file but that's it. In order to import the contents of these files a table is needed. 

Make a new migration:

php artisan make:migration create_contacts_table

Open the newly created migration and add your table columns to keep it simple I'm going to have an id, email and timestamp columns

<?php

use Illuminate\Support\Facades\Schema;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Database\Migrations\Migration;

class CreateContactsTable extends Migration
{
    /**
     * Run the migrations.
     *
     * @return void
     */
    public function up()
    {
        Schema::create('contacts', function (Blueprint $table) {
            $table->increments('id');
            $table->string('email');
            $table->timestamps();
        });
    }

    /**
     * Reverse the migrations.
     *
     * @return void
     */
    public function down()
    {
        Schema::dropIfExists('contacts');
    }
}

Next run your migrations

php artisan migrate

Now create a model:

php artisan make:model Contact

I'll make email fillable in the model:

<?php

namespace App;

use Illuminate\Database\Eloquent\Model;

class Contact extends Model
{
    public $fillable = ['email'];
}

Now we have a table to store the CSV records in, let's make a command that Artisan can use to import the records.

php artisan make:command ImportContacts

This will make a new class inside app/Console/Commands

Import the contact class

<?php

namespace App\Console\Commands;

use Illuminate\Console\Command;
use App\Contact;

class ImportContacts extends Command

Next add your signature and description

/**
 * The name and signature of the console command.
 *
 * @var string
 */
protected $signature = 'import:contacts';

/**
 * The console command description.
 *
 * @var string
 */
protected $description = 'import contacts from stored csv files';

The signature will become an artisan command.

In this case, we don't need to touch the construct method, move to the handle method. This will run when the command is executed

This opens the files inside the pendingcontacts folder we use array_slice(globa($path),0,2) to only open 2 files at a time. Then extract a line into an array then using Laravel's updateOrCreate method to update a record if the email matches otherwise import a new contact.

public function handle()
{
    //set the path for the csv files
    $path = base_path("resources/pendingcontacts/*.csv"); 
    
    //run 2 loops at a time 
    foreach (array_slice(glob($path),0,2) as $file) {
        
        //read the data into an array
        $data = array_map('str_getcsv', file($file));

        //loop over the data
        foreach($data as $row) {

            //insert the record or update if the email already exists
            Contact::updateOrCreate([
                'email' => $row[6],
            ], ['email' => $row[6]]); 
        }

        //delete the file
        unlink($file);
    }
}

the full class looks like this:

<?php

namespace App\Http\Controllers;

use Illuminate\Http\Request;

class ContactsController extends Controller
{
    public function import()
    {
    	$records = [];
        $path = base_path('resources/pendingcontacts');
        foreach (glob($path.'/*.csv') as $file) {
            $file = new \SplFileObject($file, 'r');
            $file->seek(PHP_INT_MAX);
            $records[] = $file->key();
        }
        $toImport = array_sum($records);

        return view('import', compact('toImport'));
    }

    public function parseImport()
    {
        request()->validate([
            'file' => 'required|mimes:csv,txt'
        ]);

        //get file from upload
        $path = request()->file('file')->getRealPath();

        //turn into array
        $file = file($path);

        //remove first line
        $data = array_slice($file, 1);

        //loop through file and split every 1000 lines
        $parts = (array_chunk($data, 1000));
        $i = 1;
        foreach($parts as $line) {
            $filename = base_path('resources/pendingcontacts/'.date('y-m-d-H-i-s').$i.'.csv');
            file_put_contents($filename, $line);
            $i++;
        }

        session()->flash('status', 'queued for importing');

        return redirect("import");
    }
}

Now in order to run this command, we need to tell Artisan where to find it. We do this by registering it in a service provider I'll use the default AppServiceProvider.php.

Inside a boot method:

$this->commands([
    \App\Console\Commands\ImportContacts::class
]);

Optionally you may want to use the scheduler to run this command at a given frequency. To do this create an instance of the schedule class then run the command pass in the signature and frequency.
In this example the import:contacts command will be called every minute.

NOTE - the Schedule will need importing first.

use Illuminate\Console\Scheduling\Schedule;
$this->app->booted(function () {
    $schedule = $this->app->make(Schedule::class);
    $schedule->command('import:contacts')->everyMinute();
});

The full class looks like this:

<?php

namespace App\Providers;

use Illuminate\Support\ServiceProvider;
use Illuminate\Console\Scheduling\Schedule;

class AppServiceProvider extends ServiceProvider
{
    /**
     * Bootstrap any application services.
     *
     * @return void
     */
    public function boot()
    {
        $this->commands([
            \App\Console\Commands\ImportContacts::class
        ]);

        $this->app->booted(function () {
            $schedule = $this->app->make(Schedule::class);
            $schedule->command('import:contacts')->everyMinute();
        });
    }

    /**
     * Register any application services.
     *
     * @return void
     */
    public function register()
    {
        //
    }
}

That's it! now everytime you import a CSV file it will be broken into smaller CSV files added to the pendingcontacts folder and your command will run every minute processing 2 files at a time (you can increase the number).

It would be useful to inform the user how many contacts are left to be imported. This can be done in the import method of the controller:

<?php

namespace App\Http\Controllers;

use Illuminate\Http\Request;

class ContactsController extends Controller
{
    public function import()
    {
    	//setup an empty array
    	$records = [];

    	//path where the csv files are stored
        $path = base_path('resources/pendingcontacts');

        //loop over each file
        foreach (glob($path.'/*.csv') as $file) {

        	//open the file and add the total number of lines to the records array
            $file = new \SplFileObject($file, 'r');
            $file->seek(PHP_INT_MAX);
            $records[] = $file->key();
        }

        //now sum all the array keys together to get the total
        $toImport = array_sum($records);

        return view('import', compact('toImport'));
    }

Pass $toImport to the view then in the view show the number as long as there is at least 1:

@if ($toImport > 0)
    <p>Contacts left to import: {{ $toImport }}</p>
@endif