I have several projects that use Solr as the search engine, with custom schemas to suit the data being stored. This means that when I test the project code, it needs to be tested against a Solr instance using the correct schema.

I also like to use GitHub Actions to run tests automatically when I push to the repository or create a pull request. GitHub Actions lets you create service containers so that your tests can conveniently use tools like Solr without a lot of setup. Unfortunately, it's not straightforward to change the Solr schema that a Solr service container uses.

It took me a while to figure out how to combine all these tools at once, so here's how I did it.

Initial setup

The project I wanted to test was a CKAN extension, and the examples in this post are based on it. Here's the initial test.yml that I started with, which didn't use a custom Solr schema, and here's the working test.yml I ended up with. Both are for a repository called ckanext-example.

The code snippets that follow are simplified a bit for clarity.

The test.yml that I started out with defines a CKAN container and two service containers, Solr and Postgres, all versioned according to the CKAN version we're testing against. The CKAN container is where the test job will be run.

name: Tests
on: [push, pull_request]
jobs:
  test:
    strategy:
      matrix:
        ckan-version: ["2.10", "2.9"]
      fail-fast: false

    name: CKAN ${{ matrix.ckan-version }}
    runs-on: ubuntu-latest
    container:
      image: openknowledge/ckan-dev:${{ matrix.ckan-version }}
    services:
      solr:
        image: ckan/ckan-solr:${{ matrix.ckan-version }}
      postgres:
        image: ckan/ckan-postgres-dev:${{ matrix.ckan-version }}
        env:
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres
          POSTGRES_DB: postgres
        options: --health-cmd pg_isready --health-interval 10s --health-timeout 5s --health-retries 5

Here's what the Docker environment on the runner looks like:

"The GitHub Actions runner runs three Docker containers, all within the Docker local network: the CKAN container that runs the tests, the Solr service container and the Postgres service container."
CC BY Rae Knowler 2023

My first idea was to mount the config files in my project repository as volumes for the Solr service container, like I would do locally:

    services:
      solr:
        image: ckan/ckan-solr:${{ matrix.ckan-version }}
        env:
          SOLR_CONFIG_CKAN_DIR: /opt/solr/server/solr/configsets/ckan/conf
          SOLR_HEAP: 1024m
        volumes:
          - ./solr/schema.xml:${SOLR_CONFIG_CKAN_DIR}/managed-schema
          - ./solr/german_dictionary.txt:${SOLR_CONFIG_CKAN_DIR}/german_dictionary.txt
          - ./solr/solrconfig.xml:${SOLR_CONFIG_CKAN_DIR}/solrconfig.xml

This didn't work. The service containers are all started before any of the job's steps run, including the step that checks out the repository. That means the files I wanted to mount as volumes didn't exist yet when the service container was started.

Create your own Solr container

I realised I had to create the Solr container as part of the job's steps, instead of using a service container.

name: Tests
on: [push, pull_request]
jobs:
  test:
    ...
    env:
      WORKDIR: /__w/ckanext-example/ckanext-example
      SOLR_CONFIG_CKAN_DIR: /opt/solr/server/solr/configsets/ckan/conf
    ...
    steps:
    - uses: actions/checkout@v3
    - name: Create solr container
      run: |
        /usr/bin/docker create --name test_solr \
          --network ${{ job.container.network }} --network-alias solr \
          --workdir $WORKDIR --publish 8983:8983 \
          -e "SOLR_HEAP=1024m" -e "SOLR_SCHEMA_FILE=$SOLR_CONFIG_CKAN_DIR/managed-schema" \
          -e GITHUB_ACTIONS=true -e CI=true \
          -v "${{ github.workspace }}/solr/schema.xml":"$SOLR_CONFIG_CKAN_DIR/managed-schema" \
          -v "${{ github.workspace }}/solr/german_dictionary.txt":"$SOLR_CONFIG_CKAN_DIR/german_dictionary.txt" \
          -v "${{ github.workspace }}/solr/solrconfig.xml":"$SOLR_CONFIG_CKAN_DIR/solrconfig.xml" \
          ckan/ckan-solr:${{ matrix.ckan-version }}
        docker start test_solr

The first step above checks out the code, of course.

The second step creates and starts the Solr container. Most of the arguments passed to docker create are the same ones the GitHub Actions runner passes when it creates a service container itself. I've added the new volumes and also defined a couple of env variables to make things simpler:

  1. WORKDIR: /__w/ckanext-example/ckanext-example
    We'll need to refer to this workdir when creating the Solr container and when running commands on the CKAN container. For convenience, I've defined it here as an env variable. Replace ckanext-example with the name of your repository.
  2. SOLR_CONFIG_CKAN_DIR: /opt/solr/server/solr/configsets/ckan/conf
    As this long path is used several times in the arguments to docker create, I created an env variable to use as a shortcut. Be sure to replace ckan in this path with the name of the core you're using.
  3. -v "${{ github.workspace }}/solr/schema.xml":"$SOLR_CONFIG_CKAN_DIR/managed-schema"
    Here's where you mount your custom Solr schema as a volume. You can see that I mounted several additional files as volumes too, because they were needed for my setup. All my Solr config files are in a directory called solr/ in my repository.
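
If you'd rather not hard-code the repository name in WORKDIR, the same path can probably be derived from the workflow context. This is only a sketch and not what my working test.yml does: it assumes the workflow is triggered by push or pull_request, where github.event.repository.name holds the repository name, and that the checkout ends up in /__w/<repo>/<repo> inside the container.

```yaml
jobs:
  test:
    env:
      # Assumption: github.event.repository.name is populated for the
      # push and pull_request events this workflow runs on.
      WORKDIR: /__w/${{ github.event.repository.name }}/${{ github.event.repository.name }}
      SOLR_CONFIG_CKAN_DIR: /opt/solr/server/solr/configsets/ckan/conf
```

With this in place, the workflow can be copied to another repository without editing the path, as long as the trigger events stay the same.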

Create your own container to run the tests in

Because we're now creating and running the Solr container as part of the job's steps, we can no longer run the job itself in a container. If we did, the steps would run inside the CKAN container, and we'd end up with a system like this:

"The GitHub Actions runner runs two Docker containers within the Docker local network: the CKAN container that runs the tests and the Postgres service container. The Solr container is run by the CKAN container and is not within the same local network."
CC BY Rae Knowler 2023

Instead, we add another step to the job that creates and starts the container the tests will run in. This time, we call docker create exactly as the GitHub Actions runner would.

    steps:
    ...
    - name: Create ckan container
      run: |
        /usr/bin/docker create --name test_ckan --network ${{ job.container.network }} --network-alias ckan \
          -e "HOME=/github/home" -e GITHUB_ACTIONS=true -e CI=true -v "/var/run/docker.sock":"/var/run/docker.sock" \
          -v "/home/runner/work":"/__w" -v "/home/runner/work/_temp":"/__w/_temp" \
          -v "/home/runner/work/_actions":"/__w/_actions" -v "/opt/hostedtoolcache":"/__t" \
          -v "/home/runner/work/_temp/_github_home":"/github/home" \
          -v "/home/runner/work/_temp/_github_workflow":"/github/workflow" \
          --entrypoint "tail" openknowledge/ckan-dev:${{ matrix.ckan-version }} "-f" "/dev/null"
        docker start test_ckan

The Postgres service container can remain as it is. It's part of the Docker local network and can communicate with both the CKAN and Solr containers.
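
One thing to keep in mind is that docker start returns as soon as the container has been started, not when Solr is ready to answer queries. A step along these lines, placed after the two container steps, could poll Solr before anything else runs. It's a sketch and not part of my working test.yml: it assumes curl is available on the runner, relies on port 8983 being published to the runner as in the docker create call for test_solr, and the retry count and sleep interval are arbitrary.

```yaml
    - name: Wait for Solr
      run: |
        # Poll Solr's CoreAdmin STATUS endpoint until it responds (about 60s max).
        for attempt in $(seq 1 30); do
          if curl --silent --fail "http://localhost:8983/solr/admin/cores?action=STATUS" > /dev/null; then
            echo "Solr is up after $attempt attempt(s)"
            exit 0
          fi
          sleep 2
        done
        # Give up: show what Solr logged and fail the job.
        echo "Solr did not respond in time" >&2
        docker logs test_solr
        exit 1
```

This only checks that Solr itself responds. It doesn't verify that the core loaded your custom schema without errors.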

Run your tests with docker exec

Lastly, any further steps in the job (like actually running the tests) have to be updated so that they run in the new CKAN container, using docker exec. Notice that we're using the $WORKDIR env variable to specify the paths to the setup script and the tests.

    - name: Install requirements and set up ckanext
      run: |
        docker exec test_ckan $WORKDIR/bin/install_test_requirements.sh ${{ matrix.ckan-version }}
    - name: Run tests
      run: |
        docker exec test_ckan pytest --ckan-ini=$WORKDIR/test.ini \
        --disable-warnings $WORKDIR/ckanext/example/tests
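
When tests fail because of a schema problem, the Solr logs are usually the first place to look. A possible extra step, which is a suggestion here and not part of my working test.yml, prints them only when an earlier step has failed. It uses the standard if: failure() condition and the test_solr container name from the docker create call; the number of lines shown is arbitrary.

```yaml
    - name: Show Solr logs on failure
      if: failure()
      run: |
        # Print the last part of the Solr container's output for debugging.
        docker logs --tail 200 test_solr
```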

Conclusion

GitHub Actions service containers are so convenient to set up and use that I was surprised there was no way to use a custom Solr schema for testing, something I need to do in most of my projects. I hope seeing how I made this work will be helpful for your future testing!