Spacewalk: Failed channel syncs because of duplicate packages

A brief summary:
I’m working with Spacewalk for three years now, from version 2.4 on. We manage about 350 systems, 30+ channels, 50+ repositories and 105.000+ packages. We import new packages and errata on a daily basis, roll out critical patches to stage systems automatically and do weekly/monthly patchdays on productive systems exclusively via Spacewalk.

Lately the periodic syncs of our CentOS 6 x86_64 channel failed on every run. New packages stayed in the queue and could not be rolled out to our systems. The Taskomatic daemon sent out error notification emails to the configured administrator mailbox in which it basically tells you that the repo-sync-bunch failed and in which logfile you can read more about this issue.

The error:13:37:56 ERROR: (23, 'ERROR: duplicate key value violates unique constraint "rhn_cnp_cid_nid_uq"', 'Could not update database entry.')
13:37:56 Repo URL: http://mirror.centos.org/centos/6/extras/x86_64/

Traceback (most recent call last):

File "/usr/bin/spacewalk-repo-sync", line 257, in <module>

[...]

spacewalk.server.rhnSQL.sql_base.SQLSchemaError: (99999, 'ERROR: current transaction is aborted, commands ignored until end of transaction block', '', InternalError('current transaction is aborted, commands ignored until end of transaction block\n',))

2018-07-12 11:30:47,557 [DefaultQuartzScheduler_Worker-10] ERROR com.redhat.rhn.taskomatic.task.RepoSyncTask - Executing a task threw an exception: org.quartz.JobExecutionException
2018-07-12 11:30:47,557 [DefaultQuartzScheduler_Worker-10] ERROR com.redhat.rhn.taskomatic.task.RepoSyncTask - Message: Command '[/usr/bin/spacewalk-repo-sync, --channel, centos6_x86_64, --type, yum]' exited with error code 1
2018-07-12 11:30:47,558 [DefaultQuartzScheduler_Worker-10] ERROR com.redhat.rhn.taskomatic.task.RepoSyncTask - Cause: null
2018-07-12 11:30:47,558 [DefaultQuartzScheduler_Worker-10] ERROR com.redhat.rhn.taskomatic.task.RepoSyncTask - Stack trace:org.quartz.JobExecutionException: Command '[/usr/bin/spacewalk-repo-sync, --channel, centos6_x86_64, --type, yum]' exited with error code 1
[...]

Okay so what do we have here? Obviously it’s something about the database and a duplicate primary key. We also have the repository that caused the error to happen. No hint which entry is the duplicate whatsoever. But we do have the failed command so let’s try that on our own and strace the process and it’s forks:[...]
23319 sendto(5, "Q\0\0\0?select LOOKUP_PACKAGE_NEVRA"..., 64, MSG_NOSIGNAL, NULL, 0) = 64

23319 poll([{fd=5, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=5, revents=POLLIN}])

23319 recvfrom(5, "T\0\0\0\33\0\1id\0\0\0\0\0\0\0\0\0\6\244\377\377\377\377\377\377\0\0D\0\0\0"..., 16384, 0, NULL, NULL) = 65

23319 sendto(5, "Q\0\0\0\210select * from rhnPackage wh"..., 137, MSG_NOSIGNAL, NULL, 0) = 137

23319 poll([{fd=5, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=5, revents=POLLIN}])

23319 recvfrom(5, "T\0\0\3#\0\34id\0\0\0\240\4\0\1\0\0\6\244\377\377\377\377\377\377\0\0org_"..., 16384, 0, NULL, NULL) = 1346

23319 sendto(5, "Q\0\0\0\210select * from rhnPackage wh"..., 137, MSG_NOSIGNAL, NULL, 0) = 137

23319 poll([{fd=5, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=5, revents=POLLIN}])

23319 recvfrom(5, "T\0\0\3#\0\34id\0\0\0\240\4\0\1\0\0\6\244\377\377\377\377\377\377\0\0org_"..., 16384, 0, NULL, NULL) = 1682

23319 sendto(5, "Q\0\0\0\211select * from rhnPackage wh"..., 138, MSG_NOSIGNAL, NULL, 0) = 138

23319 poll([{fd=5, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=5, revents=POLLIN}])

23319 recvfrom(5, "T\0\0\3#\0\34id\0\0\0\240\4\0\1\0\0\6\244\377\377\377\377\377\377\0\0org_"..., 16384, 0, NULL, NULL) = 1478

23319 sendto(5, "Q\0\0\0l\n select channel"..., 109, MSG_NOSIGNAL, NULL, 0) = 109

23319 poll([{fd=5, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=5, revents=POLLIN}])

23319 recvfrom(5, "T\0\0\0#\0\1channel_id\0\0\0\242\255\0\1\0\0\6\244\377\377\377\377"..., 16384, 0, NULL, NULL) = 56

23319 sendto(5, "Q\0\0\0l\n select channel"..., 109, MSG_NOSIGNAL, NULL, 0) = 109

23319 poll([{fd=5, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=5, revents=POLLIN}])

23319 recvfrom(5, "T\0\0\0#\0\1channel_id\0\0\0\242\255\0\1\0\0\6\244\377\377\377\377"..., 16384, 0, NULL, NULL) = 56

23319 sendto(5, "Q\0\0\0l\n select channel"..., 109, MSG_NOSIGNAL, NULL, 0) = 109

23319 poll([{fd=5, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=5, revents=POLLIN}])

23319 recvfrom(5, "T\0\0\0#\0\1channel_id\0\0\0\242\255\0\1\0\0\6\244\377\377\377\377"..., 16384, 0, NULL, NULL) = 56

23319 sendto(5, "Q\0\0\1W\n select pac"..., 344, MSG_NOSIGNAL, NULL, 0) = 344

23319 poll([{fd=5, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=5, revents=POLLIN}])

23319 recvfrom(5, "T\0\0\0#\0\1package_id\0\0\0\242\255\0\2\0\0\6\244\377\377\377\377"..., 16384, 0, NULL, NULL) = 56

Not very helpful in my opinion as there is no clue on what the package_id is that caused the error. It only confirms what we already know. But when it’s something in correlation to the database we should be able to get more information by logging all the queries done during the sync process, right? After enabling log_destination = 'stderr' and log_directory = 'pg_log' in postgresql.conf (assuming your Spacewalk installation uses a pgSQL DB), I restarted the sync and watched the PostgreSQL log closely:
2018-07-12 13:11:09.736 CEST LOG: statement: select LOOKUP_PACKAGE_NAME('centos-release') id from dual
2018-07-12 13:11:09.737 CEST LOG: statement: select LOOKUP_PACKAGE_NAME('python-setuptools') id from dual

2018-07-12 13:11:09.737 CEST LOG: statement: select LOOKUP_PACKAGE_NAME('sysstat') id from dual

2018-07-12 13:11:09.738 CEST LOG: statement: select LOOKUP_EVR(NULL, '9.0.4', '33.el6_9.1') id from dual

2018-07-12 13:11:09.753 CEST LOG: statement: select LOOKUP_EVR(NULL, '6', '10.el6.centos.12.3') id from dual

2018-07-12 13:11:09.767 CEST LOG: statement: select LOOKUP_EVR(NULL, '0.6.10', '4.el6_9') id from dual

2018-07-12 13:11:09.794 CEST LOG: statement: BEGIN

2018-07-12 13:11:09.795 CEST LOG: statement: lock table rhnChecksum in exclusive mode

2018-07-12 13:11:09.797 CEST LOG: statement: select lookup_checksum_fast('sha256', '3cdb75f0d06fe1d39cee6f010a2a487d1b2ba595fa3197bbfac9e9050b2c233f') id from dual

2018-07-12 13:11:09.894 CEST LOG: statement: select lookup_checksum_fast('sha256', '65fa114e7d947143580f98a5feeb652ceb7ebdd8b27fba65ba0b383b250c63d3') id from dual

2018-07-12 13:11:09.922 CEST LOG: statement: select lookup_checksum_fast('sha256', 'e0c9c45557bc787d85417b1daa3799ae61bab899e6475b793cf3488309f94304') id from dual

2018-07-12 13:11:09.954 CEST LOG: statement: COMMIT

2018-07-12 13:11:09.955 CEST LOG: statement: select LOOKUP_PACKAGE_NEVRA(603, 28320, 120) id from dual

2018-07-12 13:11:09.959 CEST LOG: statement: select LOOKUP_PACKAGE_NEVRA(5339, 28321, 100) id from dual

2018-07-12 13:11:09.973 CEST LOG: statement: select LOOKUP_PACKAGE_NEVRA(5913, 28322, 120) id from dual

2018-07-12 13:11:09.974 CEST LOG: statement: select * from rhnPackage where org_id = 1 and name_id = 603 and evr_id = 28320 and package_arch_id = 120 and checksum_id = 10000270

2018-07-12 13:11:09.987 CEST LOG: statement: select * from rhnPackage where org_id = 1 and name_id = 5339 and evr_id = 28321 and package_arch_id = 100 and checksum_id = 9998526

2018-07-12 13:11:09.989 CEST LOG: statement: select * from rhnPackage where org_id = 1 and name_id = 5913 and evr_id = 28322 and package_arch_id = 120 and checksum_id = 10000274

2018-07-12 13:11:09.990 CEST LOG: statement:

select channel_id

from rhnChannelPackage

where package_id = 121940

2018-07-12 13:11:09.991 CEST LOG: statement:

select channel_id

from rhnChannelPackage

where package_id = 121941

2018-07-12 13:11:09.992 CEST LOG: statement:

select channel_id

from rhnChannelPackage

where package_id = 121942

2018-07-12 13:11:09.992 CEST LOG: statement:

select package_id

from rhnChannelPackage cp

join rhnPackage p

on p.id = cp.package_id

join rhnChannel c

on c.id = cp.channel_id

where cp.channel_id = 105

and c.org_id != p.org_id

2018-07-12 13:11:10.206 CEST LOG: statement: insert into rhnChannelPackage (channel_id, package_id) values (105, 121940)

2018-07-12 13:11:10.230 CEST LOG: statement: insert into rhnChannelPackage (channel_id, package_id) values (105, 121941)

2018-07-12 13:11:10.230 CEST LOG: statement: insert into rhnChannelPackage (channel_id, package_id) values (105, 121942)

2018-07-12 13:11:10.231 CEST LOG: statement: SELECT rhn_channel.refresh_newest_package(105, 'server.app.yumreposync', 603)

2018-07-12 13:11:10.406 CEST LOG: statement: SELECT rhn_channel.refresh_newest_package(105, 'server.app.yumreposync', 5339)

2018-07-12 13:11:10.459 CEST ERROR: duplicate key value violates unique constraint "rhn_cnp_cid_nid_uq"

2018-07-12 13:11:10.459 CEST DETAIL: Key (channel_id, name_id, package_arch_id)=(105, 5339, 100) already exists.

2018-07-12 13:11:10.459 CEST CONTEXT: SQL statement "insert into rhnChannelNewestPackage

(channel_id, name_id, evr_id, package_id, package_arch_id)

(select channel_id,

name_id, evr_id,

package_id, package_arch_id

from rhnChannelNewestPackageView

where channel_id = channel_id_in

and (package_name_id_in is null

or name_id = package_name_id_in)

)"

PL/pgSQL function rhn_channel.refresh_newest_package(numeric,character varying,numeric) line 9 at SQL statement

2018-07-12 13:11:10.459 CEST STATEMENT: SELECT rhn_channel.refresh_newest_package(105, 'server.app.yumreposync', 5339)

2018-07-12 13:11:10.462 CEST LOG: statement:

select k1.description as ca_cert_name, k1.key as ca_cert, k1.org_id as ca_cert_org,

k2.description as client_cert_name, k2.key as client_cert, k2.org_id as client_cert_org,

k3.description as client_key_name, k3.key as client_key, k3.org_id as client_key_org

from rhncontentsource cs inner join

rhncontentsourcessl csssl on cs.id = csssl.content_source_id inner join

rhncryptokey k1 on csssl.ssl_ca_cert_id = k1.id left outer join

rhncryptokey k2 on csssl.ssl_client_cert_id = k2.id left outer join

rhncryptokey k3 on csssl.ssl_client_key_id = k3.id

where cs.id = 507

2018-07-12 13:11:10.462 CEST ERROR: current transaction is aborted, commands ignored until end of transaction block

2018-07-12 13:11:10.462 CEST STATEMENT:

select k1.description as ca_cert_name, k1.key as ca_cert, k1.org_id as ca_cert_org,

k2.description as client_cert_name, k2.key as client_cert, k2.org_id as client_cert_org,

k3.description as client_key_name, k3.key as client_key, k3.org_id as client_key_org

from rhncontentsource cs inner join

rhncontentsourcessl csssl on cs.id = csssl.content_source_id inner join

rhncryptokey k1 on csssl.ssl_ca_cert_id = k1.id left outer join

rhncryptokey k2 on csssl.ssl_client_cert_id = k2.id left outer join

rhncryptokey k3 on csssl.ssl_client_key_id = k3.id

where cs.id = 507

Marked in red we have the package_ids that have to be in any way responsible for the failed sync. Let’s do a quick search via the GUI to find out more about them. After looking them up one by one I discovered that all three of them where still in the package queue and not associated with a channel yet. You can check that in the software package manager.

Orphaned packages

So whats next? I know that these packages remain in the queue because the import fails but the real cause of the error seems to be something that’s already in the database which has some attribute with the exact same value as one of these three packages. Let’s google that error message and see if someone had the same issue in the past.
Seems like this happend once in 2016 because of a duplicate package published by Red Hat. Let’s see if one of these packages already exists in the database… Bingo! The sysstat and python-setuptools package have already been published some month prior but with a faulty naming schema. Like in 2016 the package names didn’t contain a dot between the package version and the OS release version. In theory simply deleting the faulty package should solve the problem but let’s see…

Delete the duplicate package

Okay and now let’s try to rerun the channel synchronisation and see if it runs without errors.
13:51:02 Sync of channel completed in 0:04:30.
13:51:02 Total time: 0:04:30

Perfect 🙂

Tagged , , , , , , , , , , , , . Bookmark the permalink.

4 Responses to Spacewalk: Failed channel syncs because of duplicate packages

  1. Thank You, Robert!
    This article helped me after almost 3 months digging about this error 😀
    Keep nerding!

  2. François says:

    Yeah, thanks for this !!

  3. tyal says:

    It also helped me. Thank you very much.

  4. Chris Hufnagel says:

    Thank You! Working with over 32k packages my eyes were bleeding trying to see what packages were incorrect. Your screen shots helped me see the issues on my inhouse yum repos.

Schreibe einen Kommentar

Deine E-Mail-Adresse wird nicht veröffentlicht. Erforderliche Felder sind mit * markiert.