drbd does not connect reliably after reboot

Exxter
Posts: 383
Registered: 10.01.2003 00:15:15
License of own posts: GNU General Public License

Post by Exxter » 22.05.2017 10:43:13

Hello,

I have two identical root servers running Debian Stretch, with drbd and 4 resources set up on both. Everything works so far, except that when I reboot the primary server, not all 4 resources reconnect correctly. Oddly, it is a different one or two each time that stay waiting for a connection (WFConnection). Waiting longer does not help either. The syslog shows:

Code:

May 22 10:32:07 SERVER kernel: [  939.779324] drbd home: initial packet S crossed
May 22 10:32:28 SERVER kernel: [  960.771970] drbd home: initial packet S crossed
May 22 10:32:39 SERVER kernel: [  971.524306] drbd home: initial packet S crossed
May 22 10:33:00 SERVER kernel: [  992.516894] drbd home: initial packet S crossed
May 22 10:33:21 SERVER kernel: [ 1013.509478] drbd home: initial packet S crossed
May 22 10:33:42 SERVER kernel: [ 1034.502062] drbd home: initial packet S crossed
May 22 10:34:03 SERVER kernel: [ 1055.494650] drbd home: initial packet S crossed
I have already googled but found no information on this problem. I have also written a script that waits up to 60 seconds between starting the resources. So far that has not helped. I have also tried with the firewall switched off. Since some resources do connect, the VPN link between the two servers cannot be the cause either.

Does anyone have an idea what else could be causing this? My current start script for the drbd resources is:

Code:

#!/bin/bash

sleep 60s

drbdadm up home
drbdadm up daten1
drbdadm up opt
drbdadm up www

#
# drbd1 - /home
#

#drbdadm up home
drbdadm primary home
mount /dev/drbd1 /home

echo "drbd1 /home started ..."

#
# drbd2 - /daten1
#

sleep 50s
#drbdadm up daten1
drbdadm primary daten1
mount /dev/drbd2 /daten1/
mount --bind /daten1/var/lib/libvirt /var/lib/libvirt/
mount --bind /daten1/var/lib/ldap /var/lib/ldap/

echo "drbd2 /daten1 (kvm and ldap) started ..."

#
# drbd3 - /opt
#

sleep 60s
#drbdadm up opt
drbdadm primary opt
mount /dev/drbd3 /opt

echo "drbd3 /opt started ..."

#
# drbd4 - /var/www/
#

sleep 50s
#drbdadm up www
drbdadm primary www
mount /dev/drbd4 /var/www/

echo "drbd4 /var/www started ..."

#
# Start the programs
#

exit 0

Last edited by Exxter on 22.05.2017 10:53:24; edited 1 time in total.
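
As an alternative to the fixed sleeps in the start script, here is a minimal sketch that polls the connection state instead. It assumes that `drbdadm cstate <resource>` prints the connection state (for example WFConnection or Connected), as it does on DRBD 8.4. The function name `wait_connected` and the default timeout of 120 seconds are illustrative and not part of the original script. It does not fix a resource that hangs in WFConnection; it only makes the script notice and report it.

```shell
#!/bin/bash
# Hypothetical helper: poll the connection state of one resource.
# Usage: wait_connected <resource> [timeout_seconds]
wait_connected() {
    local res=$1 timeout=${2:-120} waited=0 state
    while (( waited < timeout )); do
        # Ask DRBD for the current connection state of this resource.
        state=$(drbdadm cstate "$res" 2>/dev/null)
        case $state in
            # Resync states also mean the peer link is up.
            Connected|SyncSource|SyncTarget)
                echo "$res: connected after ${waited}s"
                return 0
                ;;
        esac
        sleep 1
        (( waited += 1 ))
    done
    echo "$res: still '${state:-unknown}' after ${timeout}s" >&2
    return 1
}
```

In the start script this could be called after `drbdadm up home`, for example as `wait_connected home 120 || exit 1`, so that a hanging resource shows up as an explicit error at boot.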

heisenberg
Posts: 3548
Registered: 04.06.2015 01:17:27
License of own posts: MIT License

Post by heisenberg » 22.05.2017 10:48:15

  • Does it work if you start the DRBD resources by hand in the shell?
  • Which DRBD version?
  • What does drbdsetup status --verbose --statistics say?
If the resources are not started correctly, there is usually a good reason. For example, the data may not be clean, or the DRBD resources were not shut down cleanly.
Every cruelty has its origin in a weakness.

Post by Exxter » 22.05.2017 11:13:55

Quote: Does it work if you start the DRBD resources by hand in the shell?
I just pasted first:

Code:

drbdadm up home
drbdadm up daten1
drbdadm up opt
drbdadm up www
then:

Code:

drbdadm primary home
drbdadm primary daten1
drbdadm primary opt
drbdadm primary www
into the terminal. Same problem: one connection stayed at WFConnection.
Quote: Which DRBD version?
version: 8.4.7
Quote: What does drbdsetup status --verbose --statistics say?

Code:

root@SERVER ~ # drbdsetup status --verbose --statistics
daten1 role:Primary suspended:no
    write-ordering:flush
  volume:0 minor:2 disk:UpToDate
      size:209577696 read:172032 written:0 al-writes:0 bm-writes:0 upper-pending:0 lower-pending:0 al-suspended:no blocked:no
  peer connection:Connected role:Secondary congested:no
    volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
        received:0 sent:172032 out-of-sync:0 pending:0 unacked:0

home role:Primary suspended:no
    write-ordering:flush
  volume:0 minor:1 disk:UpToDate
      size:104788828 read:4096 written:0 al-writes:0 bm-writes:0 upper-pending:0 lower-pending:0 al-suspended:no blocked:no
  peer connection:Connected role:Secondary congested:no
    volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
        received:0 sent:4096 out-of-sync:0 pending:0 unacked:0

opt role:Primary suspended:no
    write-ordering:flush
  volume:0 minor:3 disk:UpToDate
      size:1048412896 read:0 written:0 al-writes:0 bm-writes:0 upper-pending:0 lower-pending:0 al-suspended:no blocked:no
  peer connection:Connecting role:Unknown congested:no
    volume:0 replication:Off peer-disk:DUnknown resync-suspended:no
        received:0 sent:0 out-of-sync:4096 pending:0 unacked:0

www role:Primary suspended:no
    write-ordering:flush
  volume:0 minor:4 disk:UpToDate
      size:1235072604 read:4096 written:0 al-writes:0 bm-writes:0 upper-pending:0 lower-pending:0 al-suspended:no blocked:no
  peer connection:Connected role:Secondary congested:no
    volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
        received:0 sent:4096 out-of-sync:0 pending:0 unacked:0

root@SERVER ~ #
The ports in the resource files under /etc/drbd.d/*.res are also all different.
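
The port claim can be double-checked mechanically. Below is a minimal sketch that lists every endpoint used by more than one `address` line. It assumes the resource files use the usual `address <ip>:<port>;` syntax. The function name `duplicate_endpoints` is hypothetical, and commented-out address lines would be counted as well.

```shell
#!/bin/bash
# Hypothetical helper: print every ip:port that appears in more than one
# "address" line across the given resource files.
duplicate_endpoints() {
    # -h: no file names, -o: only the matched part up to (not including) ';'
    grep -hoE 'address[[:space:]][^;]*' "$@" \
        | awk '{ print $NF }' \
        | sort | uniq -d
}
```

Running `duplicate_endpoints /etc/drbd.d/*.res` on each server should print nothing if every resource really has its own port.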

The servers are connected via OpenVPN.

It is fine that the resources do not start if the server has crashed, for example, but I always do a normal reboot. Besides, it is always different resources, including ones where almost nothing happens.

Edit: that gives me an idea. Could it be that during the reboot, /home for example is still mounted while drbd is already being stopped, or something like that, and so the file system is not quite OK? That the order is not being respected? And that at some point a timeout kicks in and kills the processes? I will test that.
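
One way to test this idea is to stop everything explicitly before the reboot, in the reverse order of the start script. The sketch below is only an assumption about what such a stop script could look like. The function names `stop_resource` and `stop_all` are hypothetical; the resource names and mount points are taken from the start script above.

```shell
#!/bin/bash
# Hypothetical stop script: undo the start script in reverse order.
# Usage: stop_resource <resource> <mountpoint>...
stop_resource() {
    local res=$1; shift
    local mnt
    # Unmount in the order given (bind mounts must come before their source).
    for mnt in "$@"; do
        umount "$mnt" || { echo "cannot unmount $mnt, leaving $res up" >&2; return 1; }
    done
    # Only demote and take the resource down once nothing is mounted.
    drbdadm secondary "$res" && drbdadm down "$res"
}

stop_all() {
    stop_resource www    /var/www
    stop_resource opt    /opt
    stop_resource daten1 /var/lib/ldap /var/lib/libvirt /daten1
    stop_resource home   /home
}
```

Calling `stop_all` by hand before `reboot` and then checking whether all four resources reconnect would show whether the shutdown order plays a role.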

Post by heisenberg » 22.05.2017 11:40:36

In general, DRBD is not recommended over WAN links (which is what I assume from the keyword OpenVPN, though of course that need not be the case), at most together with DRBD Proxy, which is a paid product. What bandwidth do you have in both directions? I would not use it over a WAN. Incidentally, with synchronous DRBD replication (protocol C), performance over a WAN will be abysmal.

The output shows that in this case home has not connected yet. This resource is still establishing its connection. You could watch whether the connection eventually comes up, i.e. whether it just takes a while.

I would also recommend watching the DRBD logs on both servers while issuing the individual commands.
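
To compare the logs of both servers more easily, the kernel messages can be reduced to the connection-state changes per resource. This is a minimal sketch that assumes the DRBD 8.4 log format shown in this thread (`drbd <resource>: conn( A -> B )`). The function name `conn_transitions` is hypothetical. Lines logged per device (`block drbdN: ...`) are deliberately ignored.

```shell
#!/bin/bash
# Hypothetical helper: print "time resource old -> new" for every
# resource-level connection-state change in a syslog-style file.
conn_transitions() {
    sed -nE 's/^[A-Za-z]+ +[0-9]+ ([0-9:]+) .* drbd ([^ :]+): conn\( ([A-Za-z]+) -> ([A-Za-z]+) \).*$/\1 \2 \3 -> \4/p' "$1"
}
```

Running `conn_transitions /var/log/syslog` on master and slave (the log path is an assumption) gives two short timelines that can be read side by side.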

Post by Exxter » 22.05.2017 11:52:34

heisenberg wrote: In general, DRBD is not recommended over WAN links (which is what I assume from the keyword OpenVPN, though of course that need not be the case). What bandwidth do you have in both directions? I would not use it over a WAN.

The output shows that in this case home has not connected yet. This resource is still establishing its connection. You could watch whether the connection eventually comes up, i.e. whether it just takes a while.
The bandwidth should be sufficient; the servers are connected via Gigabit LAN and sit in neighbouring data centers (same location). When I transferred a large file from one to the other, I got over 90 MB/s with scp.

I have now used:

Code:

drbdadm disconnect daten1
drbdadm connect --discard-my-data daten1
drbdadm connect daten1
on the slave to get the resources connected again for now. Now I am testing whether it is related to the reboot.

Now home no longer connects. On the slave I get:

Code:

May 22 12:05:36 slave kernel: [ 8358.648482] drbd home: conn( BrokenPipe -> Unconnected )
May 22 12:05:37 slave kernel: [ 8359.652466] drbd home: conn( Unconnected -> WFConnection )
May 22 12:05:51 slave kernel: [ 8373.669708] drbd home: sock_recvmsg returned -11
May 22 12:05:51 slave kernel: [ 8373.669771] drbd home: conn( WFConnection -> BrokenPipe )
May 22 12:05:51 slave kernel: [ 8373.669805] drbd home: short read (expected size 8)
May 22 12:05:51 slave kernel: [ 8373.705779] drbd home: Connection closed
May 22 12:05:51 slave kernel: [ 8373.705841] drbd home: conn( BrokenPipe -> Unconnected )
May 22 12:05:52 slave kernel: [ 8374.725812] drbd home: conn( Unconnected -> WFConnection )
May 22 12:06:06 slave kernel: [ 8388.775039] drbd home: sock_recvmsg returned -11
May 22 12:06:06 slave kernel: [ 8388.775101] drbd home: conn( WFConnection -> BrokenPipe )
May 22 12:06:06 slave kernel: [ 8388.775134] drbd home: short read (expected size 8)
May 22 12:06:06 slave kernel: [ 8388.807115] drbd home: Connection closed
On the master, once again:

Code:

May 22 12:05:51 master kernel: [  143.841981] drbd home: initial packet S crossed
May 22 12:06:12 master kernel: [  164.833374] drbd home: initial packet S crossed
May 22 12:06:33 master kernel: [  185.824822] drbd home: initial packet S crossed
May 22 12:06:54 master kernel: [  206.816228] drbd home: initial packet S crossed
May 22 12:07:05 master kernel: [  217.567917] drbd home: initial packet S crossed
May 22 12:07:26 master kernel: [  238.559320] drbd home: initial packet S crossed
May 22 12:07:37 master kernel: [  249.311014] drbd home: initial packet S crossed
May 22 12:07:47 master kernel: [  260.062715] drbd home: initial packet S crossed
And not even a --discard-my-data helps, as it did earlier with daten1.
Last edited by Exxter on 22.05.2017 12:15:58; edited 3 times in total.

Post by heisenberg » 22.05.2017 12:08:35

Quote: The bandwidth should be sufficient; the servers are connected via Gigabit LAN and sit in neighbouring data centers (same location).
Yes. IMHO that is absolutely fine.

Post by heisenberg » 22.05.2017 13:39:44

The above certainly looks like an error. The complete log file, grepped for drbd, would be interesting. What we see now is the current state. What matters, though, is the point at which the error begins.

So far I have not recognised any typical DRBD failure scenario that I know of in the excerpts.
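
To find the point at which the error begins, the log can be summarised per resource. This is a minimal sketch, assuming syslog lines in the format posted in this thread (time of day in the third field, resource name right after `drbd`). The function name `crossed_summary` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: for each resource, print its name, the time of the
# first "initial packet S crossed" message, and how often it occurred.
crossed_summary() {
    awk '/initial packet S crossed/ {
             # The resource name is the field after "drbd", minus the colon.
             for (i = 1; i < NF; i++)
                 if ($i == "drbd") { res = $(i + 1); sub(/:$/, "", res) }
             if (!(res in first)) { first[res] = $3; order[++n] = res }
             count[res]++
         }
         END {
             for (i = 1; i <= n; i++)
                 print order[i], first[order[i]], count[order[i]]
         }' "$1"
}
```

The first timestamp per resource can then be used to look up the lines just before it in the full log of both servers.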

Post by Exxter » 22.05.2017 13:57:54

At around 13:41 I rebooted both servers. home connects, but now daten1 no longer connects. Here is the syslog from the master, grepped for drbd:

Code:

May 22 13:37:32 master kernel: [ 5644.359584] drbd home: initial packet S crossed
May 22 13:37:42 master kernel: [ 5654.599297] drbd home: initial packet S crossed
May 22 13:37:52 master kernel: [ 5664.839011] drbd home: initial packet S crossed
May 22 13:38:03 master kernel: [ 5675.590708] drbd home: initial packet S crossed
May 22 13:38:35 master kernel: [ 5707.845817] drbd home: initial packet S crossed
May 22 13:38:46 master kernel: [ 5718.085524] drbd home: initial packet S crossed
May 22 13:38:56 master kernel: [ 5728.837248] drbd home: initial packet S crossed
May 22 13:39:07 master kernel: [ 5739.588926] drbd home: initial packet S crossed
May 22 13:39:18 master kernel: [ 5750.340628] drbd home: initial packet S crossed
May 22 13:39:28 master kernel: [ 5760.580345] drbd home: initial packet S crossed
May 22 13:40:00 master kernel: [ 5792.835445] drbd home: initial packet S crossed
May 22 13:40:11 master kernel: [ 5803.587139] drbd home: initial packet S crossed
May 22 13:40:22 master kernel: [ 5814.338840] drbd home: initial packet S crossed
May 22 13:41:05 master systemd-modules-load[356]: Inserted module 'drbd'
May 22 13:41:05 master kernel: [    5.197217] drbd: initialized. Version: 8.4.7 (api:1/proto:86-101)
May 22 13:41:05 master kernel: [    5.197276] drbd: srcversion: 0904DF2CCF7283ACE07D07A 
May 22 13:41:05 master kernel: [    5.197329] drbd: registered as block device major 147
May 22 13:43:29 master kernel: [  152.928361] drbd home: Starting worker thread (from drbdsetup-84 [2099])
May 22 13:43:29 master kernel: [  152.928982] block drbd1: disk( Diskless -> Attaching ) 
May 22 13:43:29 master kernel: [  152.929185] drbd home: Method to ensure write ordering: flush
May 22 13:43:29 master kernel: [  152.929232] block drbd1: max BIO size = 1048576
May 22 13:43:29 master kernel: [  152.929278] block drbd1: drbd_bm_resize called with capacity == 209577656
May 22 13:43:29 master kernel: [  152.929758] block drbd1: resync bitmap: bits=26197207 words=409332 pages=800
May 22 13:43:29 master kernel: [  152.929807] block drbd1: size = 100 GB (104788828 KB)
May 22 13:43:29 master kernel: [  152.966810] block drbd1: recounting of set bits took additional 0 jiffies
May 22 13:43:29 master kernel: [  152.966861] block drbd1: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
May 22 13:43:29 master kernel: [  152.966927] block drbd1: disk( Attaching -> UpToDate ) 
May 22 13:43:29 master kernel: [  152.966985] block drbd1: attached to UUIDs C5C8AC65FCE136F8:E584C051FA7A063C:B7A0ACE712B516D6:B79FACE712B516D6
May 22 13:43:29 master kernel: [  152.978183] drbd home: conn( StandAlone -> Unconnected ) 
May 22 13:43:29 master kernel: [  152.978257] drbd home: Starting receiver thread (from drbd_w_home [2100])
May 22 13:43:29 master kernel: [  152.978408] drbd home: receiver (re)started
May 22 13:43:29 master kernel: [  152.978509] drbd home: conn( Unconnected -> WFConnection ) 
May 22 13:43:34 master kernel: [  158.219913] block drbd1: role( Secondary -> Primary ) 
May 22 13:43:50 master kernel: [  173.832997] drbd home: Handshake successful: Agreed network protocol version 101
May 22 13:43:50 master kernel: [  173.833064] drbd home: Feature flags enabled on protocol level: 0x7 TRIM THIN_RESYNC WRITE_SAME.
May 22 13:43:50 master kernel: [  173.833214] drbd home: conn( WFConnection -> WFReportParams ) 
May 22 13:43:50 master kernel: [  173.833276] drbd home: Starting ack_recv thread (from drbd_r_home [2105])
May 22 13:43:50 master kernel: [  173.864981] block drbd1: drbd_sync_handshake:
May 22 13:43:50 master kernel: [  173.865042] block drbd1: self C5C8AC65FCE136F9:E584C051FA7A063C:B7A0ACE712B516D6:B79FACE712B516D6 bits:0 flags:0
May 22 13:43:50 master kernel: [  173.865124] block drbd1: peer E584C051FA7A063C:0000000000000000:B7A0ACE712B516D6:B79FACE712B516D6 bits:0 flags:0
May 22 13:43:50 master kernel: [  173.865202] block drbd1: uuid_compare()=1 by rule 70
May 22 13:43:50 master kernel: [  173.865263] block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent ) 
May 22 13:43:50 master kernel: [  173.866023] block drbd1: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
May 22 13:43:50 master kernel: [  173.901028] block drbd1: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
May 22 13:43:50 master kernel: [  173.901111] block drbd1: helper command: /sbin/drbdadm before-resync-source minor-1
May 22 13:43:50 master kernel: [  173.903284] block drbd1: helper command: /sbin/drbdadm before-resync-source minor-1 exit code 0 (0x0)
May 22 13:43:50 master kernel: [  173.903391] block drbd1: conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent ) 
May 22 13:43:50 master kernel: [  173.903490] block drbd1: Began resync as SyncSource (will sync 0 KB [0 bits set]).
May 22 13:43:50 master kernel: [  173.903603] block drbd1: updated sync UUID C5C8AC65FCE136F9:E585C051FA7A063C:E584C051FA7A063C:B7A0ACE712B516D6
May 22 13:43:50 master kernel: [  173.945170] block drbd1: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
May 22 13:43:50 master kernel: [  173.945236] block drbd1: updated UUIDs C5C8AC65FCE136F9:0000000000000000:E585C051FA7A063C:E584C051FA7A063C
May 22 13:43:50 master kernel: [  173.945315] block drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate ) 
May 22 13:45:08 master kernel: [  252.066308] drbd daten1: Starting worker thread (from drbdsetup-84 [2546])
May 22 13:45:08 master kernel: [  252.066823] block drbd2: disk( Diskless -> Attaching ) 
May 22 13:45:08 master kernel: [  252.067019] drbd daten1: Method to ensure write ordering: flush
May 22 13:45:08 master kernel: [  252.067078] block drbd2: max BIO size = 1048576
May 22 13:45:08 master kernel: [  252.067135] block drbd2: drbd_bm_resize called with capacity == 419155392
May 22 13:45:08 master kernel: [  252.068071] block drbd2: resync bitmap: bits=52394424 words=818663 pages=1599
May 22 13:45:08 master kernel: [  252.068133] block drbd2: size = 200 GB (209577696 KB)
May 22 13:45:08 master kernel: [  252.124311] block drbd2: recounting of set bits took additional 0 jiffies
May 22 13:45:08 master kernel: [  252.124376] block drbd2: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
May 22 13:45:08 master kernel: [  252.124441] block drbd2: disk( Attaching -> UpToDate ) 
May 22 13:45:08 master kernel: [  252.124512] block drbd2: attached to UUIDs 6BB6FCACF35A1FA9:10A69C08215C0CFF:F881C8F6F930412F:F880C8F6F930412F
May 22 13:45:08 master kernel: [  252.157563] drbd daten1: conn( StandAlone -> Unconnected ) 
May 22 13:45:08 master kernel: [  252.157663] drbd daten1: Starting receiver thread (from drbd_w_daten1 [2547])
May 22 13:45:08 master kernel: [  252.157934] drbd daten1: receiver (re)started
May 22 13:45:08 master kernel: [  252.158026] drbd daten1: conn( Unconnected -> WFConnection ) 
May 22 13:45:16 master kernel: [  260.690504] block drbd2: role( Secondary -> Primary ) 
May 22 13:45:39 master kernel: [  283.389289] drbd daten1: initial packet S crossed
May 22 13:46:00 master kernel: [  304.379088] drbd daten1: initial packet S crossed
May 22 13:46:11 master kernel: [  315.130225] drbd daten1: initial packet S crossed
May 22 13:46:22 master kernel: [  325.881533] drbd daten1: initial packet S crossed
May 22 13:46:32 master kernel: [  336.633021] drbd daten1: initial packet S crossed
May 22 13:46:43 master kernel: [  347.384510] drbd daten1: initial packet S crossed
May 22 13:47:04 master kernel: [  368.375526] drbd daten1: initial packet S crossed
May 22 13:47:15 master kernel: [  379.127021] drbd daten1: initial packet S crossed
May 22 13:47:26 master kernel: [  389.878520] drbd daten1: initial packet S crossed
May 22 13:47:47 master kernel: [  410.869524] drbd daten1: initial packet S crossed
May 22 13:47:57 master kernel: [  421.621016] drbd daten1: initial packet S crossed
May 22 13:48:08 master kernel: [  432.372520] drbd daten1: initial packet S crossed
May 22 13:48:29 master kernel: [  453.363834] drbd daten1: initial packet S crossed
May 22 13:48:40 master kernel: [  464.115471] drbd daten1: initial packet S crossed
May 22 13:48:41 master kernel: [  464.691470] drbd daten1: Handshake successful: Agreed network protocol version 101
May 22 13:48:41 master kernel: [  464.691548] drbd daten1: Feature flags enabled on protocol level: 0x7 TRIM THIN_RESYNC WRITE_SAME.
May 22 13:48:41 master kernel: [  464.691699] drbd daten1: conn( WFConnection -> WFReportParams ) 
May 22 13:48:41 master kernel: [  464.691762] drbd daten1: Starting ack_recv thread (from drbd_r_daten1 [2553])
May 22 13:48:41 master kernel: [  464.723454] block drbd2: drbd_sync_handshake:
May 22 13:48:41 master kernel: [  464.723525] block drbd2: self 6BB6FCACF35A1FA9:10A69C08215C0CFF:F881C8F6F930412F:F880C8F6F930412F bits:0 flags:0
May 22 13:48:41 master kernel: [  464.723603] block drbd2: peer 10A69C08215C0CFE:0000000000000000:F881C8F6F930412E:F880C8F6F930412F bits:0 flags:0
May 22 13:48:41 master kernel: [  464.723681] block drbd2: uuid_compare()=1 by rule 70
May 22 13:48:41 master kernel: [  464.723742] block drbd2: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent ) 
May 22 13:48:41 master kernel: [  464.725096] block drbd2: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
May 22 13:48:41 master kernel: [  464.793030] block drbd2: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
May 22 13:48:41 master kernel: [  464.793113] block drbd2: helper command: /sbin/drbdadm before-resync-source minor-2
May 22 13:48:41 master kernel: [  464.795337] block drbd2: helper command: /sbin/drbdadm before-resync-source minor-2 exit code 0 (0x0)
May 22 13:48:41 master kernel: [  464.795458] block drbd2: conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent ) 
May 22 13:48:41 master kernel: [  464.795553] block drbd2: Began resync as SyncSource (will sync 0 KB [0 bits set]).
May 22 13:48:41 master kernel: [  464.795655] block drbd2: updated sync UUID 6BB6FCACF35A1FA9:10A79C08215C0CFF:10A69C08215C0CFF:F881C8F6F930412F
May 22 13:48:41 master kernel: [  464.843274] block drbd2: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
May 22 13:48:41 master kernel: [  464.843340] block drbd2: updated UUIDs 6BB6FCACF35A1FA9:0000000000000000:10A79C08215C0CFF:10A69C08215C0CFF
May 22 13:48:41 master kernel: [  464.843419] block drbd2: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate ) 
May 22 13:59:03 master kernel: [ 1087.425828] drbd opt: Starting worker thread (from drbdsetup-84 [5534])
May 22 13:59:03 master kernel: [ 1087.426280] block drbd3: disk( Diskless -> Attaching ) 
May 22 13:59:03 master kernel: [ 1087.426463] drbd opt: Method to ensure write ordering: flush
May 22 13:59:03 master kernel: [ 1087.426488] block drbd3: max BIO size = 1048576
May 22 13:59:03 master kernel: [ 1087.426512] block drbd3: drbd_bm_resize called with capacity == 2096825792
May 22 13:59:03 master kernel: [ 1087.431242] block drbd3: resync bitmap: bits=262103224 words=4095363 pages=7999
May 22 13:59:03 master kernel: [ 1087.431297] block drbd3: size = 1000 GB (1048412896 KB)
May 22 13:59:04 master kernel: [ 1087.674029] block drbd3: recounting of set bits took additional 2 jiffies
May 22 13:59:04 master kernel: [ 1087.674060] block drbd3: 4096 KB (1024 bits) marked out-of-sync by on disk bit-map.
May 22 13:59:04 master kernel: [ 1087.674105] block drbd3: disk( Attaching -> UpToDate ) 
May 22 13:59:04 master kernel: [ 1087.674142] block drbd3: attached to UUIDs C197C48A906FF440:A651593CCD79601D:1B78D0B5EC494B7B:1B77D0B5EC494B7B
May 22 13:59:04 master kernel: [ 1087.933640] drbd opt: conn( StandAlone -> Unconnected ) 
May 22 13:59:04 master kernel: [ 1087.933701] drbd opt: Starting receiver thread (from drbd_w_opt [5535])
May 22 13:59:04 master kernel: [ 1087.933889] drbd opt: receiver (re)started
May 22 13:59:04 master kernel: [ 1087.933945] drbd opt: conn( Unconnected -> WFConnection ) 
May 22 13:59:08 master kernel: [ 1092.614668] block drbd3: role( Secondary -> Primary ) 
May 22 13:59:35 master kernel: [ 1118.934302] drbd opt: initial packet S crossed
May 22 13:59:56 master kernel: [ 1139.925195] drbd opt: initial packet S crossed
May 22 14:00:17 master kernel: [ 1160.916096] drbd opt: initial packet S crossed
May 22 14:00:38 master kernel: [ 1181.907028] drbd opt: initial packet S crossed
May 22 14:00:59 master kernel: [ 1202.897969] drbd opt: initial packet S crossed
May 22 14:01:10 master kernel: [ 1213.649383] drbd opt: initial packet S crossed
May 22 14:01:20 master kernel: [ 1224.400806] drbd opt: initial packet S crossed
Edit: I only just noticed that daten1 is now connected as well.
Last edited by Exxter on 22.05.2017 14:24:22; edited 2 times in total.

Post by Exxter » 22.05.2017 14:04:08

Here is the syslog from the slave as well. The first and second resources, home and daten1, have connected; opt does not connect:

Code:

May 22 13:40:15 slave kernel: [14037.919284] drbd home: conn( WFConnection -> BrokenPipe ) 
May 22 13:40:15 slave kernel: [14037.919312] drbd home: short read (expected size 8)
May 22 13:40:15 slave kernel: [14037.979371] drbd home: Connection closed
May 22 13:40:15 slave kernel: [14037.979430] drbd home: conn( BrokenPipe -> Unconnected ) 
May 22 13:40:16 slave kernel: [14039.007327] drbd home: conn( Unconnected -> WFConnection ) 
May 22 13:40:27 slave kernel: [14050.208291] drbd home: sock_recvmsg returned -11
May 22 13:40:27 slave kernel: [14050.208351] drbd home: conn( WFConnection -> BrokenPipe ) 
May 22 13:40:27 slave kernel: [14050.208380] drbd home: short read (expected size 8)
May 22 13:40:27 slave kernel: [14050.248453] drbd home: Connection closed
May 22 13:40:27 slave kernel: [14050.248513] drbd home: conn( BrokenPipe -> Unconnected ) 
May 22 13:40:28 slave kernel: [14051.264438] drbd home: conn( Unconnected -> WFConnection ) 
May 22 13:41:23 slave systemd-modules-load[358]: Inserted module 'drbd'
May 22 13:41:23 slave kernel: [    5.073271] drbd: initialized. Version: 8.4.7 (api:1/proto:86-101)
May 22 13:41:23 slave kernel: [    5.073329] drbd: srcversion: 0904DF2CCF7283ACE07D07A 
May 22 13:41:23 slave kernel: [    5.073380] drbd: registered as block device major 147
May 22 13:41:25 slave CRON[666]: (root) CMD (   /usr/local/sbin/drbd-slave-boot)
May 22 13:42:25 slave kernel: [   69.863134] drbd home: Starting worker thread (from drbdsetup-84 [1587])
May 22 13:42:25 slave kernel: [   69.863446] block drbd1: disk( Diskless -> Attaching ) 
May 22 13:42:25 slave kernel: [   69.863629] drbd home: Method to ensure write ordering: flush
May 22 13:42:25 slave kernel: [   69.863676] block drbd1: max BIO size = 1048576
May 22 13:42:25 slave kernel: [   69.863721] block drbd1: drbd_bm_resize called with capacity == 209577656
May 22 13:42:25 slave kernel: [   69.864212] block drbd1: resync bitmap: bits=26197207 words=409332 pages=800
May 22 13:42:25 slave kernel: [   69.864262] block drbd1: size = 100 GB (104788828 KB)
May 22 13:42:25 slave kernel: [   69.901971] block drbd1: recounting of set bits took additional 0 jiffies
May 22 13:42:25 slave kernel: [   69.902022] block drbd1: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
May 22 13:42:25 slave kernel: [   69.902074] block drbd1: disk( Attaching -> Outdated ) 
May 22 13:42:25 slave kernel: [   69.902131] block drbd1: attached to UUIDs E584C051FA7A063C:0000000000000000:B7A0ACE712B516D6:B79FACE712B516D6
May 22 13:42:25 slave kernel: [   69.927564] drbd home: conn( StandAlone -> Unconnected ) 
May 22 13:42:25 slave kernel: [   69.927641] drbd home: Starting receiver thread (from drbd_w_home [1588])
May 22 13:42:25 slave kernel: [   69.927769] drbd home: receiver (re)started
May 22 13:42:25 slave kernel: [   69.927849] drbd home: conn( Unconnected -> WFConnection ) 
May 22 13:42:55 slave kernel: [  100.113489] drbd daten1: Starting worker thread (from drbdsetup-84 [1749])
May 22 13:42:55 slave kernel: [  100.115093] block drbd2: disk( Diskless -> Attaching ) 
May 22 13:42:55 slave kernel: [  100.115314] drbd daten1: Method to ensure write ordering: flush
May 22 13:42:55 slave kernel: [  100.115363] block drbd2: max BIO size = 1048576
May 22 13:42:55 slave kernel: [  100.115410] block drbd2: drbd_bm_resize called with capacity == 419155392
May 22 13:42:55 slave kernel: [  100.116397] block drbd2: resync bitmap: bits=52394424 words=818663 pages=1599
May 22 13:42:55 slave kernel: [  100.116460] block drbd2: size = 200 GB (209577696 KB)
May 22 13:42:55 slave kernel: [  100.162117] block drbd2: recounting of set bits took additional 0 jiffies
May 22 13:42:55 slave kernel: [  100.162182] block drbd2: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
May 22 13:42:55 slave kernel: [  100.162249] block drbd2: disk( Attaching -> UpToDate ) 
May 22 13:42:55 slave kernel: [  100.162322] block drbd2: attached to UUIDs 10A69C08215C0CFE:0000000000000000:F881C8F6F930412E:F880C8F6F930412F
May 22 13:42:55 slave kernel: [  100.295271] drbd daten1: conn( StandAlone -> Unconnected ) 
May 22 13:42:55 slave kernel: [  100.295359] drbd daten1: Starting receiver thread (from drbd_w_daten1 [1750])
May 22 13:42:55 slave kernel: [  100.295501] drbd daten1: receiver (re)started
May 22 13:42:55 slave kernel: [  100.295596] drbd daten1: conn( Unconnected -> WFConnection ) 
May 22 13:43:26 slave kernel: [  130.504805] drbd opt: Starting worker thread (from drbdsetup-84 [1859])
May 22 13:43:26 slave kernel: [  130.505277] block drbd3: disk( Diskless -> Attaching ) 
May 22 13:43:26 slave kernel: [  130.505476] drbd opt: Method to ensure write ordering: flush
May 22 13:43:26 slave kernel: [  130.505535] block drbd3: max BIO size = 1048576
May 22 13:43:26 slave kernel: [  130.505593] block drbd3: drbd_bm_resize called with capacity == 2096825792
May 22 13:43:26 slave kernel: [  130.510095] block drbd3: resync bitmap: bits=262103224 words=4095363 pages=7999
May 22 13:43:26 slave kernel: [  130.510172] block drbd3: size = 1000 GB (1048412896 KB)
May 22 13:43:26 slave kernel: [  130.767437] block drbd3: recounting of set bits took additional 2 jiffies
May 22 13:43:26 slave kernel: [  130.767501] block drbd3: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
May 22 13:43:26 slave kernel: [  130.767565] block drbd3: disk( Attaching -> UpToDate ) 
May 22 13:43:26 slave kernel: [  130.767635] block drbd3: attached to UUIDs A651593CCD79601C:0000000000000000:1B78D0B5EC494B7A:1B77D0B5EC494B7B
May 22 13:43:26 slave kernel: [  131.003954] drbd opt: conn( StandAlone -> Unconnected ) 
May 22 13:43:26 slave kernel: [  131.004038] drbd opt: Starting receiver thread (from drbd_w_opt [1860])
May 22 13:43:26 slave kernel: [  131.004254] drbd opt: receiver (re)started
May 22 13:43:26 slave kernel: [  131.004341] drbd opt: conn( Unconnected -> WFConnection ) 
May 22 13:43:50 slave kernel: [  154.536212] drbd home: Handshake successful: Agreed network protocol version 101
May 22 13:43:50 slave kernel: [  154.536292] drbd home: Feature flags enabled on protocol level: 0x7 TRIM THIN_RESYNC WRITE_SAME.
May 22 13:43:50 slave kernel: [  154.536447] drbd home: conn( WFConnection -> WFReportParams ) 
May 22 13:43:50 slave kernel: [  154.536510] drbd home: Starting ack_recv thread (from drbd_r_home [1594])
May 22 13:43:50 slave kernel: [  154.565357] block drbd1: drbd_sync_handshake:
May 22 13:43:50 slave kernel: [  154.565418] block drbd1: self E584C051FA7A063C:0000000000000000:B7A0ACE712B516D6:B79FACE712B516D6 bits:0 flags:0
May 22 13:43:50 slave kernel: [  154.565497] block drbd1: peer C5C8AC65FCE136F9:E584C051FA7A063C:B7A0ACE712B516D6:B79FACE712B516D6 bits:0 flags:2
May 22 13:43:50 slave kernel: [  154.565575] block drbd1: uuid_compare()=-1 by rule 50
May 22 13:43:50 slave kernel: [  154.565637] block drbd1: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate ) 
May 22 13:43:50 slave kernel: [  154.601525] block drbd1: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
May 22 13:43:50 slave kernel: [  154.602268] block drbd1: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
May 22 13:43:50 slave kernel: [  154.602351] block drbd1: conn( WFBitMapT -> WFSyncUUID ) 
May 22 13:43:50 slave kernel: [  154.619923] block drbd1: updated sync uuid E585C051FA7A063C:0000000000000000:B7A0ACE712B516D6:B79FACE712B516D6
May 22 13:43:50 slave kernel: [  154.626502] block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1
May 22 13:43:50 slave kernel: [  154.628584] block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0)
May 22 13:43:50 slave kernel: [  154.628679] block drbd1: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent ) 
May 22 13:43:50 slave kernel: [  154.628775] block drbd1: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
May 22 13:43:50 slave kernel: [  154.630646] block drbd1: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
May 22 13:43:50 slave kernel: [  154.630710] block drbd1: updated UUIDs C5C8AC65FCE136F8:0000000000000000:E585C051FA7A063C:E584C051FA7A063C
May 22 13:43:50 slave kernel: [  154.630789] block drbd1: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate ) 
May 22 13:43:50 slave kernel: [  154.643171] block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1
May 22 13:43:50 slave kernel: [  154.645045] block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1 exit code 0 (0x0)
May 22 13:43:56 slave kernel: [  161.221453] drbd www: Starting worker thread (from drbdsetup-84 [1982])
May 22 13:43:56 slave kernel: [  161.221931] block drbd4: disk( Diskless -> Attaching ) 
May 22 13:43:56 slave kernel: [  161.222152] drbd www: Method to ensure write ordering: flush
May 22 13:43:56 slave kernel: [  161.222212] block drbd4: max BIO size = 1048576
May 22 13:43:56 slave kernel: [  161.222271] block drbd4: drbd_bm_resize called with capacity == 2470145208
May 22 13:43:56 slave kernel: [  161.227752] block drbd4: resync bitmap: bits=308768151 words=4824503 pages=9423
May 22 13:43:56 slave kernel: [  161.227831] block drbd4: size = 1178 GB (1235072604 KB)
May 22 13:43:57 slave kernel: [  161.752803] block drbd4: recounting of set bits took additional 2 jiffies
May 22 13:43:57 slave kernel: [  161.752872] block drbd4: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
May 22 13:43:57 slave kernel: [  161.752940] block drbd4: disk( Attaching -> UpToDate ) 
May 22 13:43:57 slave kernel: [  161.753013] block drbd4: attached to UUIDs 898D29B4DD6B5708:0000000000000000:5AD29C0767BA4364:5AD19C0767BA4365
May 22 13:43:57 slave kernel: [  161.815335] drbd www: conn( StandAlone -> Unconnected ) 
May 22 13:43:57 slave kernel: [  161.815423] drbd www: Starting receiver thread (from drbd_w_www [1983])
May 22 13:43:57 slave kernel: [  161.815574] drbd www: receiver (re)started
May 22 13:43:57 slave kernel: [  161.815668] drbd www: conn( Unconnected -> WFConnection ) 
May 22 13:45:22 slave kernel: [  247.223304] drbd daten1: sock_recvmsg returned -11
May 22 13:45:22 slave kernel: [  247.223401] drbd daten1: conn( WFConnection -> BrokenPipe ) 
May 22 13:45:22 slave kernel: [  247.223464] drbd daten1: short read (expected size 8)
May 22 13:45:22 slave kernel: [  247.255411] drbd daten1: Connection closed
May 22 13:45:22 slave kernel: [  247.255503] drbd daten1: conn( BrokenPipe -> Unconnected ) 
May 22 13:45:23 slave kernel: [  248.279309] drbd daten1: conn( Unconnected -> WFConnection ) 
May 22 13:45:37 slave kernel: [  262.326956] drbd daten1: sock_recvmsg returned -11
May 22 13:45:37 slave kernel: [  262.327052] drbd daten1: conn( WFConnection -> BrokenPipe ) 
May 22 13:45:37 slave kernel: [  262.327114] drbd daten1: short read (expected size 8)
May 22 13:45:37 slave kernel: [  262.367093] drbd daten1: Connection closed
May 22 13:45:37 slave kernel: [  262.367185] drbd daten1: conn( BrokenPipe -> Unconnected ) 
May 22 13:45:38 slave kernel: [  263.382970] drbd daten1: conn( Unconnected -> WFConnection ) 
May 22 13:45:50 slave kernel: [  274.614677] drbd daten1: sock_recvmsg returned -11
May 22 13:45:50 slave kernel: [  274.614772] drbd daten1: conn( WFConnection -> BrokenPipe ) 
May 22 13:45:50 slave kernel: [  274.614835] drbd daten1: short read (expected size 8)
May 22 13:45:50 slave kernel: [  274.646745] drbd daten1: Connection closed
May 22 13:45:50 slave kernel: [  274.646837] drbd daten1: conn( BrokenPipe -> Unconnected ) 
May 22 13:45:51 slave kernel: [  275.670710] drbd daten1: conn( Unconnected -> WFConnection ) 
May 22 13:46:05 slave kernel: [  289.718319] drbd daten1: sock_recvmsg returned -11
May 22 13:46:05 slave kernel: [  289.718414] drbd daten1: conn( WFConnection -> BrokenPipe ) 
May 22 13:46:05 slave kernel: [  289.718476] drbd daten1: short read (expected size 8)
May 22 13:46:05 slave kernel: [  289.762447] drbd daten1: Connection closed
May 22 13:46:05 slave kernel: [  289.762549] drbd daten1: conn( BrokenPipe -> Unconnected ) 
May 22 13:46:06 slave kernel: [  290.774356] drbd daten1: conn( Unconnected -> WFConnection ) 
May 22 13:46:17 slave kernel: [  302.006052] drbd daten1: sock_recvmsg returned -11
May 22 13:46:17 slave kernel: [  302.006148] drbd daten1: conn( WFConnection -> BrokenPipe ) 
May 22 13:46:17 slave kernel: [  302.006210] drbd daten1: short read (expected size 8)
May 22 13:46:17 slave kernel: [  302.038139] drbd daten1: Connection closed
May 22 13:46:17 slave kernel: [  302.038233] drbd daten1: conn( BrokenPipe -> Unconnected ) 
May 22 13:46:18 slave kernel: [  303.062060] drbd daten1: conn( Unconnected -> WFConnection ) 
May 22 13:46:32 slave kernel: [  317.109708] drbd daten1: sock_recvmsg returned -11
May 22 13:46:32 slave kernel: [  317.109804] drbd daten1: conn( WFConnection -> BrokenPipe ) 
May 22 13:46:32 slave kernel: [  317.109867] drbd daten1: short read (expected size 8)
May 22 13:46:32 slave kernel: [  317.149879] drbd daten1: Connection closed
May 22 13:46:32 slave kernel: [  317.149974] drbd daten1: conn( BrokenPipe -> Unconnected ) 
May 22 13:46:33 slave kernel: [  318.165715] drbd daten1: conn( Unconnected -> WFConnection ) 
May 22 13:46:45 slave kernel: [  329.397430] drbd daten1: sock_recvmsg returned -11
May 22 13:46:45 slave kernel: [  329.397525] drbd daten1: conn( WFConnection -> BrokenPipe ) 
May 22 13:46:45 slave kernel: [  329.397588] drbd daten1: short read (expected size 8)
May 22 13:46:45 slave kernel: [  329.437544] drbd daten1: Connection closed
May 22 13:46:45 slave kernel: [  329.437636] drbd daten1: conn( BrokenPipe -> Unconnected ) 
May 22 13:46:46 slave kernel: [  330.453437] drbd daten1: conn( Unconnected -> WFConnection ) 
May 22 13:47:00 slave kernel: [  344.501088] drbd daten1: sock_recvmsg returned -11
May 22 13:47:00 slave kernel: [  344.501187] drbd daten1: conn( WFConnection -> BrokenPipe ) 
May 22 13:47:00 slave kernel: [  344.501250] drbd daten1: short read (expected size 8)
May 22 13:47:00 slave kernel: [  344.541221] drbd daten1: Connection closed
May 22 13:47:00 slave kernel: [  344.541314] drbd daten1: conn( BrokenPipe -> Unconnected ) 
May 22 13:47:01 slave kernel: [  345.557115] drbd daten1: conn( Unconnected -> WFConnection ) 
May 22 13:47:15 slave kernel: [  359.604741] drbd daten1: sock_recvmsg returned -11
May 22 13:47:15 slave kernel: [  359.604838] drbd daten1: conn( WFConnection -> BrokenPipe ) 
May 22 13:47:15 slave kernel: [  359.604901] drbd daten1: short read (expected size 8)
May 22 13:47:15 slave kernel: [  359.636878] drbd daten1: Connection closed
May 22 13:47:15 slave kernel: [  359.636967] drbd daten1: conn( BrokenPipe -> Unconnected ) 
May 22 13:47:16 slave kernel: [  360.660749] drbd daten1: conn( Unconnected -> WFConnection ) 
May 22 13:47:30 slave kernel: [  374.708398] drbd daten1: sock_recvmsg returned -11
May 22 13:47:30 slave kernel: [  374.708495] drbd daten1: conn( WFConnection -> BrokenPipe ) 
May 22 13:47:30 slave kernel: [  374.708558] drbd daten1: short read (expected size 8)
May 22 13:47:30 slave kernel: [  374.740537] drbd daten1: Connection closed
May 22 13:47:30 slave kernel: [  374.740629] drbd daten1: conn( BrokenPipe -> Unconnected ) 
May 22 13:47:31 slave kernel: [  375.764413] drbd daten1: conn( Unconnected -> WFConnection ) 
May 22 13:47:45 slave kernel: [  389.812071] drbd daten1: sock_recvmsg returned -11
May 22 13:47:45 slave kernel: [  389.812168] drbd daten1: conn( WFConnection -> BrokenPipe ) 
May 22 13:47:45 slave kernel: [  389.812230] drbd daten1: short read (expected size 8)
May 22 13:47:45 slave kernel: [  389.844141] drbd daten1: Connection closed
May 22 13:47:45 slave kernel: [  389.844233] drbd daten1: conn( BrokenPipe -> Unconnected ) 
May 22 13:47:46 slave kernel: [  390.868077] drbd daten1: conn( Unconnected -> WFConnection ) 
May 22 13:47:57 slave kernel: [  402.099771] drbd daten1: sock_recvmsg returned -11
May 22 13:47:57 slave kernel: [  402.099867] drbd daten1: conn( WFConnection -> BrokenPipe ) 
May 22 13:47:57 slave kernel: [  402.099930] drbd daten1: short read (expected size 8)
May 22 13:47:57 slave kernel: [  402.131886] drbd daten1: Connection closed
May 22 13:47:57 slave kernel: [  402.131980] drbd daten1: conn( BrokenPipe -> Unconnected ) 
May 22 13:47:58 slave kernel: [  403.155777] drbd daten1: conn( Unconnected -> WFConnection ) 
May 22 13:48:12 slave kernel: [  417.203431] drbd daten1: sock_recvmsg returned -11
May 22 13:48:12 slave kernel: [  417.203531] drbd daten1: conn( WFConnection -> BrokenPipe ) 
May 22 13:48:12 slave kernel: [  417.203593] drbd daten1: short read (expected size 8)
May 22 13:48:12 slave kernel: [  417.235534] drbd daten1: Connection closed
May 22 13:48:12 slave kernel: [  417.235628] drbd daten1: conn( BrokenPipe -> Unconnected ) 
May 22 13:48:13 slave kernel: [  418.259435] drbd daten1: conn( Unconnected -> WFConnection ) 
May 22 13:48:27 slave kernel: [  432.307084] drbd daten1: sock_recvmsg returned -11
May 22 13:48:27 slave kernel: [  432.307180] drbd daten1: conn( WFConnection -> BrokenPipe ) 
May 22 13:48:27 slave kernel: [  432.307243] drbd daten1: short read (expected size 8)
May 22 13:48:27 slave kernel: [  432.339196] drbd daten1: Connection closed
May 22 13:48:27 slave kernel: [  432.339288] drbd daten1: conn( BrokenPipe -> Unconnected ) 
May 22 13:48:28 slave kernel: [  433.363092] drbd daten1: conn( Unconnected -> WFConnection ) 
May 22 13:48:41 slave kernel: [  445.401671] drbd daten1: Handshake successful: Agreed network protocol version 101
May 22 13:48:41 slave kernel: [  445.401751] drbd daten1: Feature flags enabled on protocol level: 0x7 TRIM THIN_RESYNC WRITE_SAME.
May 22 13:48:41 slave kernel: [  445.401906] drbd daten1: conn( WFConnection -> WFReportParams ) 
May 22 13:48:41 slave kernel: [  445.401969] drbd daten1: Starting ack_recv thread (from drbd_r_daten1 [1756])
May 22 13:48:41 slave kernel: [  445.438807] block drbd2: drbd_sync_handshake:
May 22 13:48:41 slave kernel: [  445.438867] block drbd2: self 10A69C08215C0CFE:0000000000000000:F881C8F6F930412E:F880C8F6F930412F bits:0 flags:0
May 22 13:48:41 slave kernel: [  445.438944] block drbd2: peer 6BB6FCACF35A1FA9:10A69C08215C0CFF:F881C8F6F930412F:F880C8F6F930412F bits:0 flags:2
May 22 13:48:41 slave kernel: [  445.439020] block drbd2: uuid_compare()=-1 by rule 50
May 22 13:48:41 slave kernel: [  445.439775] block drbd2: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> UpToDate ) 
May 22 13:48:41 slave kernel: [  445.498326] block drbd2: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
May 22 13:48:41 slave kernel: [  445.499589] block drbd2: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
May 22 13:48:41 slave kernel: [  445.499676] block drbd2: conn( WFBitMapT -> WFSyncUUID ) 
May 22 13:48:41 slave kernel: [  445.521692] block drbd2: updated sync uuid 10A79C08215C0CFE:0000000000000000:F881C8F6F930412E:F880C8F6F930412F
May 22 13:48:41 slave kernel: [  445.531650] block drbd2: helper command: /sbin/drbdadm before-resync-target minor-2
May 22 13:48:41 slave kernel: [  445.533787] block drbd2: helper command: /sbin/drbdadm before-resync-target minor-2 exit code 0 (0x0)
May 22 13:48:41 slave kernel: [  445.533892] block drbd2: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent ) 
May 22 13:48:41 slave kernel: [  445.533988] block drbd2: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
May 22 13:48:41 slave kernel: [  445.536047] block drbd2: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
May 22 13:48:41 slave kernel: [  445.536111] block drbd2: updated UUIDs 6BB6FCACF35A1FA8:0000000000000000:10A79C08215C0CFE:10A69C08215C0CFF
May 22 13:48:41 slave kernel: [  445.536189] block drbd2: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate ) 
May 22 13:48:41 slave kernel: [  445.548248] block drbd2: helper command: /sbin/drbdadm after-resync-target minor-2
May 22 13:48:41 slave kernel: [  445.550201] block drbd2: helper command: /sbin/drbdadm after-resync-target minor-2 exit code 0 (0x0)
May 22 13:59:21 slave kernel: [ 1086.372164] drbd opt: sock_recvmsg returned -11
May 22 13:59:21 slave kernel: [ 1086.372227] drbd opt: conn( WFConnection -> BrokenPipe ) 
May 22 13:59:21 slave kernel: [ 1086.372255] drbd opt: short read (expected size 8)
May 22 13:59:22 slave kernel: [ 1086.408351] drbd opt: Connection closed
May 22 13:59:22 slave kernel: [ 1086.408411] drbd opt: conn( BrokenPipe -> Unconnected ) 
May 22 13:59:23 slave kernel: [ 1087.428190] drbd opt: conn( Unconnected -> WFConnection ) 
May 22 13:59:37 slave kernel: [ 1101.475817] drbd opt: sock_recvmsg returned -11
May 22 13:59:37 slave kernel: [ 1101.475878] drbd opt: conn( WFConnection -> BrokenPipe ) 
May 22 13:59:37 slave kernel: [ 1101.475907] drbd opt: short read (expected size 8)
May 22 13:59:37 slave kernel: [ 1101.508027] drbd opt: Connection closed
May 22 13:59:37 slave kernel: [ 1101.508085] drbd opt: conn( BrokenPipe -> Unconnected ) 
May 22 13:59:38 slave kernel: [ 1102.531845] drbd opt: conn( Unconnected -> WFConnection ) 
May 22 13:59:49 slave kernel: [ 1113.763537] drbd opt: sock_recvmsg returned -11
May 22 13:59:49 slave kernel: [ 1113.763600] drbd opt: conn( WFConnection -> BrokenPipe ) 
May 22 13:59:49 slave kernel: [ 1113.763629] drbd opt: short read (expected size 8)
May 22 13:59:49 slave kernel: [ 1113.795750] drbd opt: Connection closed
May 22 13:59:49 slave kernel: [ 1113.795808] drbd opt: conn( BrokenPipe -> Unconnected ) 
May 22 13:59:50 slave kernel: [ 1114.819543] drbd opt: conn( Unconnected -> WFConnection ) 
May 22 14:00:04 slave kernel: [ 1128.867192] drbd opt: sock_recvmsg returned -11
May 22 14:00:04 slave kernel: [ 1128.867253] drbd opt: conn( WFConnection -> BrokenPipe ) 
May 22 14:00:04 slave kernel: [ 1128.867282] drbd opt: short read (expected size 8)
May 22 14:00:04 slave kernel: [ 1128.907401] drbd opt: Connection closed
May 22 14:00:04 slave kernel: [ 1128.907461] drbd opt: conn( BrokenPipe -> Unconnected ) 
May 22 14:00:05 slave kernel: [ 1129.923226] drbd opt: conn( Unconnected -> WFConnection ) 
May 22 14:00:19 slave kernel: [ 1143.970851] drbd opt: sock_recvmsg returned -11
May 22 14:00:19 slave kernel: [ 1143.970912] drbd opt: conn( WFConnection -> BrokenPipe ) 
May 22 14:00:19 slave kernel: [ 1143.970941] drbd opt: short read (expected size 8)
May 22 14:00:19 slave kernel: [ 1144.003059] drbd opt: Connection closed
May 22 14:00:19 slave kernel: [ 1144.003118] drbd opt: conn( BrokenPipe -> Unconnected ) 
May 22 14:00:20 slave kernel: [ 1145.026855] drbd opt: conn( Unconnected -> WFConnection ) 
May 22 14:00:31 slave kernel: [ 1156.258566] drbd opt: sock_recvmsg returned -11
May 22 14:00:31 slave kernel: [ 1156.258628] drbd opt: conn( WFConnection -> BrokenPipe ) 
May 22 14:00:31 slave kernel: [ 1156.258656] drbd opt: short read (expected size 8)
May 22 14:00:31 slave kernel: [ 1156.290779] drbd opt: Connection closed
May 22 14:00:31 slave kernel: [ 1156.290837] drbd opt: conn( BrokenPipe -> Unconnected ) 
May 22 14:00:32 slave kernel: [ 1157.314572] drbd opt: conn( Unconnected -> WFConnection ) 
May 22 14:00:44 slave kernel: [ 1168.546288] drbd opt: sock_recvmsg returned -11
May 22 14:00:44 slave kernel: [ 1168.546352] drbd opt: conn( WFConnection -> BrokenPipe ) 
May 22 14:00:44 slave kernel: [ 1168.546381] drbd opt: short read (expected size 8)
May 22 14:00:44 slave kernel: [ 1168.578497] drbd opt: Connection closed
May 22 14:00:44 slave kernel: [ 1168.578555] drbd opt: conn( BrokenPipe -> Unconnected ) 
May 22 14:00:45 slave kernel: [ 1169.602293] drbd opt: conn( Unconnected -> WFConnection ) 
May 22 14:00:56 slave kernel: [ 1180.834006] drbd opt: sock_recvmsg returned -11
May 22 14:00:56 slave kernel: [ 1180.834069] drbd opt: conn( WFConnection -> BrokenPipe ) 
May 22 14:00:56 slave kernel: [ 1180.834098] drbd opt: short read (expected size 8)
May 22 14:00:56 slave kernel: [ 1180.866130] drbd opt: Connection closed
May 22 14:00:56 slave kernel: [ 1180.866187] drbd opt: conn( BrokenPipe -> Unconnected ) 
May 22 14:00:57 slave kernel: [ 1181.890034] drbd opt: conn( Unconnected -> WFConnection ) 
The line:

May 22 13:41:25 slave CRON[666]: (root) CMD ( /usr/local/sbin/drbd-slave-boot)

calls the script that brings up the resources:

Code: Select all

#!/bin/bash

# Wait time before the script starts bringing up the resources
sleep 60s

drbdadm up home
sleep 30s

drbdadm up daten1
sleep 30s

drbdadm up opt
sleep 30s

drbdadm up www

exit 0
This is disabled on the master.
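
The fixed `sleep 30s` pauses in the script above do not show whether a resource actually reached the `Connected` state. A minimal sketch of a check that polls the connection state instead, assuming `drbdadm cstate <resource>` is available (it is part of the DRBD 8.4 userland). The function name `wait_connected`, the `DRBDADM` override variable, and the timeout values are made up for illustration and are not part of the original script:

```shell
#!/bin/bash

# Hypothetical helper: poll the connection state of one resource until it
# reports "Connected" or the timeout (in seconds) expires.
# DRBDADM can be overridden for dry runs; it defaults to the real drbdadm.
wait_connected() {
    local res=$1 timeout=${2:-120} waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if [ "$("${DRBDADM:-drbdadm}" cstate "$res" 2>/dev/null)" = "Connected" ]; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # Still WFConnection (or any other state) after the timeout
    return 1
}
```

In the boot script this could be used as `drbdadm up home; wait_connected home 120 || echo "home: not connected yet"`. It only reports the state; it does not fix a resource that stays in WFConnection.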

And once more, for a better overview:
  • drbd1 = home
  • drbd2 = daten1
  • drbd3 = opt
  • drbd4 = www
Last edited by Exxter on 22.05.2017 14:32:27; edited 1 time in total.

heisenberg
Posts: 3548
Registered: 04.06.2015 01:17:27
License of own posts: MIT License

Re: drbd does not connect reliably after reboot

Post by heisenberg » 22.05.2017 14:32:14

All cruelty springs from weakness.

Exxter
Posts: 383
Registered: 10.01.2003 00:15:15
License of own posts: GNU General Public License

Re: drbd does not connect reliably after reboot

Post by Exxter » 22.05.2017 15:19:25

Thanks. I have now run the following on both servers for each resource:

drbdadm net-options --ping-timeout=10 <resource>

That did it, many thanks! :D
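
Applying the command to all four resources can be sketched as a small loop. This assumes the resource names from this thread (home, daten1, opt, www); the wrapper `set_ping_timeout` and the `DRBDADM` override variable are hypothetical names used only for illustration. According to the drbd.conf man page the value is given in tenths of a second, so 10 corresponds to 1 second.

```shell
#!/bin/bash

# Resource names taken from the thread; adjust to the local setup.
RESOURCES="home daten1 opt www"

# Hypothetical wrapper: apply the same ping-timeout to every listed resource.
# DRBDADM can be overridden (e.g. with "echo") for a dry run.
set_ping_timeout() {
    local value=$1 res
    shift
    for res in "$@"; do
        "${DRBDADM:-drbdadm}" net-options --ping-timeout="$value" "$res" || return 1
    done
}
```

Run as root on both nodes, for example `set_ping_timeout 10 $RESOURCES`. A dry run with `DRBDADM=echo set_ping_timeout 10 $RESOURCES` only prints the arguments that would be passed to drbdadm.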

heisenberg
Posts: 3548
Registered: 04.06.2015 01:17:27
License of own posts: MIT License

Re: drbd does not connect reliably after reboot

Post by heisenberg » 23.05.2017 07:49:10

Well, I have not really understood the problem or the solution.

By the way, this should be a setting that you can set permanently in the respective configuration file, either globally or per resource.
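
A minimal sketch of what such a permanent setting could look like, assuming DRBD 8.4 configuration syntax. The file path is an assumption (the usual location on Debian), and the value 10 is simply the one from the command earlier in this thread:

```
# /etc/drbd.d/global_common.conf (assumed path)
common {
    net {
        # unit: tenths of a second, so 10 = 1 s
        ping-timeout 10;
    }
}
```

To set it for a single resource only, the same `net { ping-timeout 10; }` block would go inside that `resource` section instead of `common`. After editing the file on both nodes, `drbdadm adjust all` should apply the change to the running resources.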

Exxter
Posts: 383
Registered: 10.01.2003 00:15:15
License of own posts: GNU General Public License

Re: drbd does not connect reliably after reboot

Post by Exxter » 24.05.2017 10:31:55

Hm, in any case it has worked reliably since then.
