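Chapter 5 Bearing Fault Diagnosis Method Based on Deep Belief Networks

The deep belief network (DBN), the second class of deep learning methods, is a deep neural network trained layer by layer. This chapter proposes a bearing fault diagnosis method based on the DBN. It covers the network structure and modeling mechanism of DBN-based bearing fault diagnosis; the construction workflow and construction algorithm of the DBN-based bearing fault diagnosis model; and experiments with the model on the bearing dataset provided by Case Western Reserve University (CWRU) in the United States.

5.1 Working Principle of Fault Diagnosis Based on Deep Belief Networks

5.1.1 Network Structure for DBN-Based Bearing Fault Diagnosis

The network structure for bearing fault diagnosis based on the deep belief network (DBN) is shown in Figure 5.1. It is a feedforward neural network with many hidden layers, and these hidden layers give the DBN a stronger capacity to model complex relationships in the data.

Figure 5.1 Network structure for DBN-based bearing fault diagnosis

In this structure, the input layer receives the bearing vibration signal, the middle part consists of hidden layers formed by multiple restricted Boltzmann machines (RBMs), and the output layer produces the fault diagnosis classification result. From the input layer to the output classification layer, every two adjacent layers form an RBM.

5.1.2 Modeling Mechanism of DBN-Based Bearing Fault Diagnosis

The DBN-based bearing fault diagnosis modeling process comprises two stages: DBN unsupervised learning and DBN supervised backward fine-tuning.

1. DBN unsupervised learning training process

In this stage, the weights of the model are pre-trained with an unsupervised greedy layer-wise training method. First, the data vector $x$ and the first hidden layer $h_1$ are treated as one RBM, and the CD-K algorithm is used to train the parameters of this RBM (the weights connecting $x$ and $h_1$, the biases of the nodes of $x$ and $h_1$, and so on). These parameters are then fixed, $h_1$ is treated as the visible vector and $h_2$ as the hidden vector, and the second RBM is trained; its parameters are then fixed as well. Higher-layer RBMs are trained in the same way.

An RBM is a two-layer neural network. The data input layer, or visible layer, consists of the visible units $v=(v_1,v_2,v_3,\dots,v_n)$, and the hidden layer consists of the hidden units $h=(h_1,h_2,h_3,\dots,h_m)$. The weight matrix $w_{n\times m}$ connects every visible unit to every hidden unit.

The joint distribution of the visible and hidden units follows the Boltzmann distribution. Expressed through the energy function $E(V,H)$, its probability density is

$$P(V,H)=\frac{1}{Z}e^{-E(V,H)} \tag{5.1}$$

$$Z=\sum_{V,H}e^{-E(V,H)} \tag{5.2}$$

The energy function $E(V,H)$ is

$$E(V,H)=-\sum_{i=1}^{n}b_i v_i-\sum_{j=1}^{m}a_j h_j-\sum_{i=1}^{n}\sum_{j=1}^{m}v_i w_{ij} h_j \tag{5.3}$$

where $v_i$ and $h_j$ are the states of the $i$-th neuron in the visible layer and the $j$-th neuron in the hidden layer, $b_i$ and $a_j$ are their respective biases, and $w_{ij}=w_{ji}$ is the bidirectional weight between the $i$-th visible neuron and the $j$-th hidden neuron.

In an RBM, the probability that the hidden neuron $h_j$ is activated is

$$P(h_j=1\mid v)=\sigma\Big(a_j+\sum_{i=1}^{n}w_{ij}v_i\Big) \tag{5.4}$$

and the probability that the visible neuron $v_i$ is activated is

$$P(v_i=1\mid h)=\sigma\Big(b_i+\sum_{j=1}^{m}w_{ij}h_j\Big) \tag{5.5}$$

where $\sigma$ is the sigmoid function $\sigma(x)=1/(1+e^{-x})$; other functions may also be used.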
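To make the link between the energy function and the activation probabilities concrete, the following is a minimal NumPy sketch, not the chapter's implementation. It evaluates the closed form of Eq. (5.4) for a tiny RBM and compares it with the same conditional obtained by enumerating every hidden state under the Boltzmann distribution of Eqs. (5.1) to (5.3). The sketch assumes binary units, uses `a` for the hidden biases and `b` for the visible biases as in Eqs. (5.4) and (5.5), and the layer sizes, random parameters, and function names are all hypothetical.

```python
import itertools

import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def energy(v, h, w, a, b):
    """Energy of a joint state (v, h); b = visible biases, a = hidden biases."""
    return -(b @ v) - (a @ h) - (v @ w @ h)


def p_h_given_v(v, w, a):
    """P(h_j = 1 | v) for every hidden unit, closed form of Eq. (5.4)."""
    return sigmoid(a + v @ w)


def p_v_given_h(h, w, b):
    """P(v_i = 1 | h) for every visible unit, closed form of Eq. (5.5)."""
    return sigmoid(b + w @ h)


def brute_force_p_h_given_v(v, w, a, b):
    """Same conditional as p_h_given_v, computed from exp(-E) by enumeration.

    For a fixed v the partition function Z cancels, so it is enough to
    normalise exp(-E(v, h)) over all 2^m binary hidden states.
    """
    m = a.shape[0]
    states = [np.array(s, dtype=float) for s in itertools.product([0, 1], repeat=m)]
    probs = np.array([np.exp(-energy(v, h, w, a, b)) for h in states])
    probs /= probs.sum()
    # Expected value of each h_j under P(h | v) equals P(h_j = 1 | v).
    return sum(p * h for p, h in zip(probs, states))


# Hypothetical toy RBM: 4 visible units, 3 hidden units.
rng = np.random.default_rng(0)
n, m = 4, 3
w = rng.normal(scale=0.5, size=(n, m))
a = rng.normal(size=m)  # hidden biases
b = rng.normal(size=n)  # visible biases
v = np.array([1.0, 0.0, 1.0, 1.0])

closed_form = p_h_given_v(v, w, a)
enumerated = brute_force_p_h_given_v(v, w, a, b)
print("closed form :", closed_form)
print("enumeration :", enumerated)
print("agree       :", np.allclose(closed_form, enumerated))
```

The two results agree because, for a fixed visible vector, the energy is a sum of independent terms per hidden unit, so the conditional factorises into one sigmoid per unit. Enumeration is only feasible for a handful of units; it is used here purely as a check.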
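Training an RBM amounts to finding the probability distribution that is most likely to generate the training samples, that is, a distribution under which the training samples have maximum probability. Because this distribution is determined by the weights $w$ and the biases, the goal of RBM training is to find the optimal values of these parameters.

The weight connections of an RBM are bidirectional ($w_{ij}=w_{ji}$), and an RBM can learn feature information from the input data without supervision. Training adjusts the weights and biases to increase the likelihood $P(V)$ of the input data. Contrastive divergence (CD) is a fast learning algorithm for training RBMs. It needs only $k$ steps of Gibbs sampling (abbreviated CD-K) to obtain a sufficiently good approximate model, and even $K=1$ can achieve good training results.

The CD-K training procedure is as follows. A vector $v$ is presented at the visible layer, and its values are passed to the hidden layer. In turn, the visible-layer inputs are sampled stochastically in an attempt to reconstruct the original input signal. Finally, these new visible activations are passed forward again to reconstruct the hidden-layer activations, yielding $h$. (During training, the visible vector is first mapped to the hidden units; the visible units are then reconstructed from the hidden units; these new visible units are mapped to the hidden units once more, giving new hidden units. Repeating these steps is called Gibbs sampling.) The difference in correlation between the hidden-layer activations and the visible-layer inputs serves as the main basis for the weight update. The parameters are updated as follows:

$$w_{ij}^{epoch+1}=w_{ij}^{epoch}+\alpha\,CD \tag{5.6}$$

$$a_{i}^{epoch+1}=a_{i}^{epoch}+\alpha\,\frac{1}{m}\sum_{i=1}^{m}\big(h_{i}^{epoch-1}-h_{i}^{epoch}\big) \tag{5.7}$$

$$b_{i}^{epoch+1}=b_{i}^{epoch}+\alpha\,\frac{1}{n}\sum_{i=1}^{n}\big(v_{i}^{epoch-1}-v_{i}^{epoch}\big) \tag{5.8}$$

where $\alpha$ is the learning rate. The contrastive divergence is $CD=\langle v_i h_j\rangle^{epoch-1}-\langle v_i h_j\rangle^{epoch}$, where $\langle\cdot\rangle$ denotes the average over the distribution.

2. DBN supervised backward fine-tuning training process

After unsupervised learning, a feature model is obtained. A BP network is placed at the last layer of the DBN. It takes the feature vector output by the RBMs as its input feature vector and trains the fault classifier in a supervised manner. Moreover, each RBM layer can only ensure that the weights within its own layer give an optimal mapping of that layer's feature vector, not that the feature mapping of the whole DBN is optimal. The backpropagation network therefore also propagates the error information from the top down to each RBM layer and fine-tunes the entire DBN. RBM training can be viewed as initializing the weight parameters of a deep BP network, which lets the DBN overcome the tendency of a BP network with randomly initialized weights to fall into local optima and to train slowly.

A loss function $L(Y,f(x))$ is commonly used to evaluate the quality of a model, that is, the degree of inconsistency between the predicted value $f(x)$ and the true value. In general, a smaller loss indicates a more robust model with stronger predictive ability on new data. Traditional loss functions include the mean squared error loss function and the cross-entropy loss function.

The basic form of the mean squared error loss function is

$$L(Y,f(x))=\frac{1}{D_N}\sum_{i=1}^{D_N}\big(h_i^{N}-Y_i\big)^2 \tag{5.9}$$

The basic form of the cross-entropy loss function is

$$L(Y,f(x))=-\frac{1}{D_N}\sum_{i=1}^{D_N}Y_i\ln h_i^{N} \tag{5.10}$$

where $Y$ is the target, $D_N$ is the number of nodes in the output layer, and $h^{N}$ is the output of the model.

When the sigmoid is used as the activation function, the cross-entropy loss is usually preferred over the mean squared error loss because it avoids the slow weight updates of the squared loss. It has the useful property that the weights are updated quickly when the error is large and slowly when the error is small.

Current DBN models are trained with gradient-based learning methods, and the network parameters are updated mainly by computing the gradient of the loss function. Common gradient-based methods include stochastic gradient descent (SGD), the standard Momentum optimization method, the Nesterov accelerated gradient (NAG) momentum optimization method, and the Adam adaptive learning rate optimization algorithm.

Stochastic gradient descent (SGD): the model parameters are updated as

$$W_{t+1}=W_t-\eta_t g_t \tag{5.11}$$

where $\eta_t$ is the learning rate; $g_t=\nabla L(W_t;X^{(i_s)};Y^{(i_s)})$ is a stochastic gradient direction with $i_s\in\{1,2,\dots,n\}$; $X^{(i_s)}$ is a sample from a mini-batch $\{X^{(1)},X^{(2)},\dots,X^{(n)}\}$ of size $n$ drawn from the training set, and $Y^{(i_s)}$ is the corresponding true value; $W_t$ denotes the model parameters at time $t$.

Standard Momentum optimization method: the model parameters are updated as

$$v_t=\alpha v_{t-1}+\eta_t\nabla L(W_t;X^{(i_s)};Y^{(i_s)}) \tag{5.12}$$

$$W_{t+1}=W_t-v_t \tag{5.13}$$

where $v_t$ is the velocity accumulated at time $t$; $\alpha$ is the momentum coefficient, typically set to 0.9; $\nabla L(W_t;X^{(i_s)};Y^{(i_s)})$ has the same meaning as in SGD; and $W_t$ denotes the model parameters at time $t$.

Nesterov accelerated gradient (NAG) momentum optimization method: the model parameters are updated as

$$v_t=\alpha v_{t-1}+\eta_t\nabla L(W_t-\alpha v_{t-1}) \tag{5.14}$$

$$W_{t+1}=W_t-v_t \tag{5.15}$$

where $v_t$ is the velocity accumulated at time $t$; $\alpha$ is the momentum coefficient; $\eta_t$ is the learning rate; $W_t$ denotes the model parameters at time $t$; and $\nabla L(W_t-\alpha v_{t-1})$ is the gradient of the loss function evaluated at the look-ahead point $W_t-\alpha v_{t-1}$.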
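The three update rules in Eqs. (5.11) to (5.15) differ only in how the step is formed, which is easiest to see side by side. The following is a minimal NumPy sketch under stated assumptions, not the chapter's training code. The loss is a hypothetical two-parameter quadratic, the gradient is exact rather than computed on a random mini-batch, the learning rate is held constant, and the function names and hyperparameter values are chosen only for illustration.

```python
import numpy as np

# Hypothetical toy loss L(W) = 0.5 * W^T A W; A is not from the chapter.
A = np.diag([1.0, 10.0])


def loss(w):
    return 0.5 * w @ A @ w


def grad(w):
    # Exact gradient of the toy loss; stands in for the mini-batch gradient g_t.
    return A @ w


def sgd(w, eta, steps):
    """Eq. (5.11): W_{t+1} = W_t - eta * g_t."""
    for _ in range(steps):
        w = w - eta * grad(w)
    return w


def momentum(w, eta, alpha, steps):
    """Eqs. (5.12)-(5.13): accumulate a velocity, then step along it."""
    v = np.zeros_like(w)
    for _ in range(steps):
        v = alpha * v + eta * grad(w)
        w = w - v
    return w


def nag(w, eta, alpha, steps):
    """Eqs. (5.14)-(5.15): gradient taken at the look-ahead point W_t - alpha * v_{t-1}."""
    v = np.zeros_like(w)
    for _ in range(steps):
        v = alpha * v + eta * grad(w - alpha * v)
        w = w - v
    return w


w0 = np.array([5.0, 5.0])
eta, alpha, steps = 0.05, 0.9, 200

results = {
    "SGD": sgd(w0, eta, steps),
    "Momentum": momentum(w0, eta, alpha, steps),
    "NAG": nag(w0, eta, alpha, steps),
}
for name, w_final in results.items():
    print(f"{name:8s} final loss = {loss(w_final):.3e}")
```

Setting `alpha` to 0 reduces both momentum variants to plain gradient descent, which is a quick sanity check on the implementation. The relative ranking of the three methods depends on the loss surface and the hyperparameters, so the toy result should not be read as a general comparison.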
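5.2 Construction of the DBN-Based Bearing Fault Diagnosis Model

5.2.1 Construction Workflow of the DBN-Based Bearing Fault Diagnosis Model

The construction workflow of the bearing fault diagnosis model based on the deep belief network (DBN) is shown in Figure 5.2.

Figure 5.2 Flowchart of the DBN model for bearing fault diagnosis

5.2.2 Construction Algorithm of the DBN-Based Bearing Fault Diagnosis Model

The construction algorithm of the bearing fault diagnosis model based on the deep belief network (DBN) is given as follows. The algorithm consists of two parts: greedy layer-wise unsupervised training and supervised learning based on backpropagation.

Algorithm 5.1 Modeling algorithm for the DBN-based bearing fault diagnosis model
Input:
  Bearing vibration signal dataset X
  Hidden layers h, number of layers N, iteration counts Q1 and Q2
  Parameter space w, biases a and b, learning rate α
  Number of labeled samples L, number of unlabeled samples U
Output:
  DBN network containing the trained parameter space
Method:
  Build the network layer by layer with the greedy unsupervised method
  for k=1;k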