Andrew Ng's Machine Learning, Part 2: Regression Problems

Problem Analysis

Feature Selection
  • You do not have to use the original features as given; several related features can be combined into one.

  • For polynomial fitting, features can be chosen and constructed to match the selected polynomial.

    • $\begin{array}{ll}h_\theta(x)&=\theta_0+\theta_1x_1+\theta_2x_2+\theta_3x_3\\&=\theta_0+\theta_1(size)+\theta_2(size)^2+\theta_3(size)^3\end{array}$

    • Here $x_1=(size)$, $x_2=(size)^2$, $x_3=(size)^3$ are the three chosen features.
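As a minimal sketch of the feature construction above, the following Python snippet builds the three polynomial features from a single `size` value and evaluates the cubic hypothesis. The function names `polynomial_features` and `hypothesis` and the sample θ values are illustrative assumptions, not part of the course material.

```python
def polynomial_features(size, degree=3):
    """Return [size, size**2, ..., size**degree] as the constructed features."""
    return [size ** k for k in range(1, degree + 1)]


def hypothesis(theta, size):
    """Evaluate h_theta(x) = theta_0 + theta_1*x_1 + theta_2*x_2 + theta_3*x_3."""
    # Prepend the constant 1 so that theta_0 acts as the intercept term.
    features = [1.0] + polynomial_features(size, degree=len(theta) - 1)
    return sum(t * x for t, x in zip(theta, features))


x1, x2, x3 = polynomial_features(2.0)
print(x1, x2, x3)  # 2.0 4.0 8.0

# Arbitrary example parameters: 1 + 0.5*2 + 0.25*4 + 0.125*8 = 4.0
print(hypothesis([1.0, 0.5, 0.25, 0.125], 2.0))
```

Because the constructed features grow as powers of `size`, their ranges can differ by orders of magnitude, which matters for gradient descent.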

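Linear Regression Model

The Linear Regression Problem
  • Key concepts

    • Training set: the data set used for learning
    • x: the input variable, or feature
    • y: the output variable, or target variable
    • m: the number of training examples in the training set; one pair (x, y) is one training example.
  • How it works

    The learning algorithm uses the training set to produce a mapping $h$ that maps an input x to an output y.

    The mapping has the form $h_\theta(x)=\theta_0+\theta_1x$, a linear function of one variable, hence the name linear regression. The $\theta_i$ are called the model parameters. The core task of linear regression is choosing $\theta_0$ and $\theta_1$.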

    [Figure: machine learning workflow. Training set → learning algorithm → mapping h; input x → mapping h → output y.]

  • Definition: linear regression is a minimization problem. The goal is to find $\theta_0$ and $\theta_1$ that minimize $\frac{1}{2m}\sum_{i=1}^m(h_\theta(x^i)-y^i)^2$.

    • Model parameters: $\theta_0$, $\theta_1$
    • Hypothesis function: $h_\theta(x)=\theta_0+\theta_1x$
    • Cost function: $J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^m(h_\theta(x^i)-y^i)^2$
  • Extension: multivariate linear regression

    • n denotes the number of features
    • Hypothesis function: $h_\theta(x)=\theta_0x_0+\theta_1x_1+\theta_2x_2+\dots+\theta_nx_n$, where $x_0=1$
      • With $x=\begin{bmatrix}x_0\\x_1\\\vdots\\x_n\end{bmatrix}$ and $\theta=\begin{bmatrix}\theta_0\\\theta_1\\\vdots\\\theta_n\end{bmatrix}$, this becomes $h_\theta(x)=\theta^Tx$
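The hypothesis and cost function above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function names `h` and `cost` and the toy data set are hypothetical, and each input row is assumed to already contain the constant $x_0=1$ in its first position.

```python
def h(theta, x):
    """Hypothesis h_theta(x) = theta^T x, where x already includes x_0 = 1."""
    return sum(t * xj for t, xj in zip(theta, x))


def cost(theta, X, y):
    """Cost J(theta) = 1/(2m) * sum_i (h_theta(x^i) - y^i)^2."""
    m = len(y)
    return sum((h(theta, xi) - yi) ** 2 for xi, yi in zip(X, y)) / (2 * m)


# Hypothetical toy data generated by y = 1 + 2x; each row is [x_0, x_1] with x_0 = 1.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 3.0, 5.0]

print(cost([1.0, 2.0], X, y))  # 0.0: these parameters fit the data exactly
print(cost([0.0, 0.0], X, y))  # (1 + 9 + 25) / 6
```

The same two functions cover both the univariate and the multivariate case, since the only difference is the length of each row.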
Univariate Linear Regression Algorithm
  • Solving the linear regression problem with gradient descent gives the linear regression algorithm.
  • The cost function of linear regression is convex, so the solution found is the global optimum.
  • Solution procedure
    • $\begin{array}{ll}j=0:\ \frac{\partial}{\partial\theta_0}J(\theta_0,\theta_1)&=\frac{1}{m}\sum_{i=1}^m(h_\theta(x^i)-y^i)\\\\j=1:\ \frac{\partial}{\partial\theta_1}J(\theta_0,\theta_1)&=\frac{1}{m}\sum_{i=1}^m(h_\theta(x^i)-y^i)\cdot x^i\end{array}$
    • $\begin{array}{ll}\theta_0&:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^i)-y^i)\\\\\theta_1&:=\theta_1-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^i)-y^i)\cdot x^i\end{array}$
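The two update rules above translate fairly directly into code. The sketch below is one possible rendering, not a reference implementation: the function name `gradient_descent`, the learning rate 0.1, the iteration count, and the toy data are all assumptions chosen for illustration.

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=2000):
    """Batch gradient descent for h_theta(x) = theta0 + theta1 * x."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Prediction error h_theta(x^i) - y^i for every training example.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Tuple assignment updates both parameters from the same old values.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Hypothetical data generated by y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = gradient_descent(xs, ys)
print(round(theta0, 3), round(theta1, 3))  # close to 1.0 and 2.0
```

Both gradients are computed before either parameter changes, which is what the update rules require.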
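Multivariate Linear Regression Algorithm
  • Solve the multivariate linear regression problem with gradient descent.

  • Solution procedure

    $\theta_j:=\theta_j-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^i)-y^i)\cdot x_j^i,\quad j=0,1,2,\dots,n$, where $j$ is the feature index and $i$ is the index of the training example.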
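For the multivariate case, the single update rule over $j=0,\dots,n$ could be written as one function that returns a new parameter list. This is a sketch under assumptions: `gradient_step` is a hypothetical name, every row of `X` is assumed to start with $x_0=1$, and the data and learning rate are made up for the example.

```python
def gradient_step(theta, X, y, alpha):
    """One batch update of every theta_j, j = 0..n; each row of X starts with x_0 = 1."""
    m = len(y)
    errors = [sum(t * xj for t, xj in zip(theta, xi)) - yi for xi, yi in zip(X, y)]
    # The new list is built entirely from the old theta, so the update is simultaneous.
    return [
        theta[j] - alpha * sum(e * xi[j] for e, xi in zip(errors, X)) / m
        for j in range(len(theta))
    ]


# Hypothetical data generated by y = 1 + 2*x_1 + 3*x_2.
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
y = [1.0, 3.0, 4.0, 6.0]

theta = [0.0, 0.0, 0.0]
for _ in range(5000):
    theta = gradient_step(theta, X, y, alpha=0.1)
print([round(t, 3) for t in theta])  # close to [1.0, 2.0, 3.0]
```

In practice a numerical library would replace the explicit loops with matrix operations; plain lists are used here only to keep the sketch dependency-free.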

Solution Algorithms

Gradient Descent
  • Repeat the following step until convergence:

    $\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)$ for $j=0,1,2,\dots,n$

    $:=$ denotes assignment.

    $\alpha$ is the learning rate. It controls the size of each update and can be thought of as the length of each step when walking downhill.

    All parameters ($\theta_0$ and $\theta_1$ in the univariate case) must be updated simultaneously.

  • Steps in the actual computation

    $\begin{matrix}temp0:=\theta_0-\alpha\frac{\partial}{\partial\theta_0}J(\theta)\\temp1:=\theta_1-\alpha\frac{\partial}{\partial\theta_1}J(\theta)\\\vdots\\tempn:=\theta_n-\alpha\frac{\partial}{\partial\theta_n}J(\theta)\end{matrix}$

    $\begin{matrix}\theta_0:=temp0\\\theta_1:=temp1\\\vdots\\\theta_n:=tempn\end{matrix}$

  • In general, gradient descent is only guaranteed to find a local optimum; it is guaranteed to reach the global optimum only when the cost function is convex.

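  • Feature scaling

    • Scaling each feature $x_i$ into the range $-1<x_i<1$ can substantially reduce the number of iterations and speed up convergence.
    • The exact bounds $-1$ and $1$ are not important; it is enough that the scaled features fall in roughly similar ranges.
    • Mean normalization can be used for feature scaling.
      • Replace $x_i$ with $\frac{x_i-\mu_i}{S_i}$ so that the feature has a mean of approximately 0 and is scaled to a suitable range. Here $\mu_i$ is the original mean and $S_i$ is the range (maximum minus minimum).
      • Example: $x_1:=\frac{x_1-1000}{2000}$
  • Checking for convergence

    • The most common approach is to plot the cost function value against the number of iterations, with the iteration count on the horizontal axis and the corresponding cost on the vertical axis. When the curve is nearly flat, the algorithm can be considered converged. The plot also shows whether the algorithm is running correctly.
    • Alternatively, use an automatic test: for example, declare convergence when the cost changes by less than a chosen small threshold between two consecutive iterations.
  • Choosing $\alpha$

    • If the curve rises or behaves abnormally in some other way, consider whether $\alpha$ is too large.
    • Candidate values of $\alpha$ can be spaced by factors of 10, such as 0.001, 0.01, 0.1, 1, ...; pick the one that converges in the fewest iterations.
  • Applicability

    • Gradient descent still works well when n is large.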
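The practical advice above (mean normalization, watching the cost curve, an automatic stopping test, and trying learning rates a factor of 10 apart) can be combined in one small experiment. The following is a sketch under assumptions: the helper names `mean_normalize`, `cost`, and `run`, the threshold `epsilon`, the iteration cap, and the house data are all hypothetical. Mean normalization is assumed to take the form $(x-\mu)/S$, with $S$ the range (maximum minus minimum).

```python
def mean_normalize(column):
    """Replace each value x by (x - mu) / S, with mu the mean and S the range (max - min)."""
    mu = sum(column) / len(column)
    spread = max(column) - min(column)
    return [(x - mu) / spread for x in column]


def cost(theta, xs, ys):
    """J(theta) = 1/(2m) * sum of squared errors for h(x) = theta[0] + theta[1] * x."""
    m = len(ys)
    return sum((theta[0] + theta[1] * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def run(xs, ys, alpha, epsilon=1e-9, max_iterations=10000):
    """Gradient descent that records the cost after every iteration.

    Returns (theta, history); history is the curve one would plot
    against the iteration number.
    """
    m = len(ys)
    theta = [0.0, 0.0]
    history = [cost(theta, xs, ys)]
    for _ in range(max_iterations):
        errors = [theta[0] + theta[1] * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        theta = [theta[0] - alpha * grad0, theta[1] - alpha * grad1]
        history.append(cost(theta, xs, ys))
        if history[-1] > history[-2]:
            break  # the cost went up: alpha is probably too large
        if history[-2] - history[-1] < epsilon:
            break  # the curve is nearly flat: treat as converged
    return theta, history


# Hypothetical house sizes and prices.
sizes = [500.0, 1000.0, 1500.0, 2000.0, 2500.0]
prices = [150.0, 200.0, 250.0, 300.0, 350.0]
scaled = mean_normalize(sizes)  # values now lie between -0.5 and 0.5

# Try learning rates spaced by factors of 10 and compare iteration counts.
for alpha in (0.001, 0.01, 0.1, 1.0):
    theta, history = run(scaled, prices, alpha)
    print(alpha, len(history) - 1, round(history[-1], 6))
```

On this toy data the larger learning rates in the list stop after fewer iterations, while a value that is too large (for instance `run(scaled, prices, 2.5)`) makes the cost rise on the first step and triggers the early exit.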
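Normal Equation

Suppose there are m training examples $(x^1,y^1),\dots,(x^m,y^m)$, each with n features. Each example can be written as an (n+1)-dimensional vector (with $x_0^i=1$): $x^i=\begin{bmatrix}x_0^i\\x_1^i\\x_2^i\\\vdots\\x_n^i\end{bmatrix}$. Let $X=\begin{bmatrix}(x^1)^T\\(x^2)^T\\\vdots\\(x^m)^T\end{bmatrix}$ and $y=\begin{bmatrix}y^1\\y^2\\\vdots\\y^m\end{bmatrix}$. Then $\theta=(X^TX)^{-1}X^Ty$, and the $\theta$ obtained this way is the optimal value.

This method does not require feature scaling or normalization. It is suited to cases where n is relatively small, roughly n < 10000.

Handling a non-invertible $X^TX$

  • Two cases
    • There are redundant, linearly dependent features. Remove the redundant features.
    • There are too many features and too little data ($m\leq n$). In this case, consider removing some features or using regularization.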
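A dependency-free sketch of the normal equation follows. Instead of forming the inverse $(X^TX)^{-1}$ explicitly, it solves the equivalent linear system $(X^TX)\theta=X^Ty$ by Gaussian elimination; this is a substituted technique chosen for the example and gives the same $\theta$ when $X^TX$ is invertible. All helper names and the toy data are assumptions.

```python
def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular (not invertible)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def normal_equation(X, y):
    """theta = (X^T X)^{-1} X^T y, computed by solving (X^T X) theta = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Hypothetical data generated by y = 1 + 2*x_1 + 3*x_2; the first column is x_0 = 1.
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
y = [1.0, 3.0, 4.0, 6.0]
print([round(t, 6) for t in normal_equation(X, y)])  # close to [1.0, 2.0, 3.0]
```

No learning rate and no iteration are involved, but the cost of solving the system grows quickly with n, which is why the method is recommended only for moderately sized feature sets.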
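The first case can be illustrated with a tiny example in which one feature is an exact multiple of another, so $X^TX$ has determinant 0 and cannot be inverted. This is only a sketch: the helper names and the integer data are hypothetical, and checking the determinant by hand is practical only for very small matrices.

```python
def gram_matrix(X):
    """Return X^T X for a matrix given as a list of rows."""
    columns = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]


def det2(A):
    """Determinant of a 2x2 matrix."""
    (a, b), (c, d) = A
    return a * d - b * c


def det3(A):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = A
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Hypothetical data: columns are x_0 = 1, x_1, and x_2 = 2 * x_1 (a redundant feature).
X = [[1, 1, 2], [1, 2, 4], [1, 3, 6]]
print(det3(gram_matrix(X)))  # 0, so X^T X is not invertible

# Dropping the redundant third column restores invertibility.
X_reduced = [row[:2] for row in X]
print(det2(gram_matrix(X_reduced)))  # 6, nonzero
```

Integer inputs keep the arithmetic exact here; with floating-point data a determinant that is merely very close to zero signals the same problem.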