Getting past nginx anti-hotlinking with Node.js: an image download case study

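Problem

Today a project requirement called for scraping information from several websites, including some blockchain statistics charts.

I used Node.js to send a GET request for each image and save it locally. The test code, which uses the built-in http and https modules, is as follows: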

import fs from 'fs';
import path from 'path';
import http from 'http';
import https from 'https';

const __dirname = path.resolve();
const filePath = path.join(__dirname, 'imgtmp');
fs.mkdirSync(filePath, { recursive: true }); // make sure the target directory exists

async function downloadfile(url, filename, callback) {
    try {
        console.log('Downloading file:', filename);
        // Pick the http or https module based on the URL scheme
        const mod = url.startsWith('https://') ? https : http;
        const req = mod.get(url, {
            headers: {
                // Not needed for a GET request, but harmless
                "Content-Type": "application/x-www-form-urlencoded"
            }
        }, (res) => {
            const writePath = path.join(filePath, filename);
            const file = fs.createWriteStream(writePath);
            res.pipe(file);
            file.on("error", (error) => {
                console.log('There was an error writing the file. Details:', error);
            });
            // "close" fires after the stream has been flushed and closed
            file.on("close", () => {
                callback(filename);
            });
            file.on('finish', () => {
                file.close();
                console.log("Completely downloaded.");
            });
        });

        req.on("error", (error) => {
            console.log(`Error downloading file. Details: ${error}`);
        });
    } catch (error) {
        console.log('Image download failed!', error);
    }
}

const url = 'https://xx.xxxx.com/d/file/zxgg/a2cffb8166f07c0232eca49f8c9cc242.jpg'; // image URL
const filename = path.basename(url);
await downloadfile(url, filename, () => {
    console.log(filename, "downloaded successfully");
});

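Running the code reported that the file had downloaded successfully.

However, when I opened the image I was stunned: it was shown as corrupted, and the file was only 304 bytes.

My guess was that the file held an error message instead of image data, so I opened it as text in EditPlus, and sure enough it contained an error message.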
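A check along the following lines could flag this kind of failure without opening the file by hand. It is only a minimal sketch: it assumes the server answers a blocked request with a non-200 status or a non-image Content-Type, which may not hold for every site, and the helper names isImageResponse and looksLikeJpegOrPng are hypothetical, not part of the original code.

```javascript
// Hypothetical helper: decide whether a response object shaped like
// Node's http.IncomingMessage looks like a successful image response.
// Node lower-cases incoming header names, so 'content-type' is the key.
function isImageResponse(res) {
  const contentType = String(res.headers['content-type'] || '').toLowerCase();
  return res.statusCode === 200 && contentType.startsWith('image/');
}

// Hypothetical helper: compare the first bytes of a buffer with the
// JPEG (FF D8 FF) and PNG (89 50 4E 47) file signatures.
function looksLikeJpegOrPng(buffer) {
  const isJpeg = buffer.length >= 3 &&
    buffer[0] === 0xff && buffer[1] === 0xd8 && buffer[2] === 0xff;
  const isPng = buffer.length >= 4 &&
    buffer[0] === 0x89 && buffer[1] === 0x50 &&
    buffer[2] === 0x4e && buffer[3] === 0x47;
  return isJpeg || isPng;
}
```

Inside the response callback, one option is to call isImageResponse(res) first and, when it returns false, call res.resume() and skip creating the write stream, so an error page is never saved under an image file name.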

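Solution

A quick search on Baidu confirmed that the nginx server hosting the images was configured with Referer-based anti-hotlinking protection. Further searching turned up the key to the problem.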
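To make the mechanism concrete, here is a simplified model of what a Referer-based hotlink check does on the server side. It is an illustration only, not nginx's actual implementation: the function name isRefererAllowed and the allowedHosts whitelist are hypothetical, and this model rejects requests that carry no Referer at all, which a real server may or may not do depending on its configuration.

```javascript
// Simplified model of a Referer-based hotlink check (not nginx's real code).
// allowedHosts is a hypothetical whitelist configured on the server side.
function isRefererAllowed(refererHeader, allowedHosts) {
  // This model rejects requests that send no Referer header.
  if (!refererHeader) return false;
  let host;
  try {
    host = new URL(refererHeader).hostname;
  } catch {
    return false; // malformed Referer value
  }
  return allowedHosts.includes(host);
}
```

Under this model a plain Node.js request, which sends no Referer by default, is refused, while the same request carrying the site's own address as Referer is served.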

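Opening the site in Chrome and looking at the request in the developer tools showed the Referer header that the browser sends.

The Referer the site expects is simply its own address, so I improved the code by adding a Referer entry to headers, "referer":'https://www.xxxx.com/':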

import fs from 'fs';
import path from 'path';
import http from 'http';
import https from 'https';

const __dirname = path.resolve();
const filePath = path.join(__dirname, 'imgtmp');
fs.mkdirSync(filePath, { recursive: true }); // make sure the target directory exists

async function downloadfile(url, filename, callback) {
    try {
        console.log('Downloading file:', filename);
        // Pick the http or https module based on the URL scheme
        const mod = url.startsWith('https://') ? https : http;
        const req = mod.get(url, {
            headers: {
                // Not needed for a GET request, but harmless
                "Content-Type": "application/x-www-form-urlencoded",
                // The address of the site that embeds the image
                "referer": 'https://www.xxxx.com/'
            }
        }, (res) => {
            const writePath = path.join(filePath, filename);
            const file = fs.createWriteStream(writePath);
            res.pipe(file);
            file.on("error", (error) => {
                console.log('There was an error writing the file. Details:', error);
            });
            // "close" fires after the stream has been flushed and closed
            file.on("close", () => {
                callback(filename);
            });
            file.on('finish', () => {
                file.close();
                console.log("Completely downloaded.");
            });
        });

        req.on("error", (error) => {
            console.log(`Error downloading file. Details: ${error}`);
        });
    } catch (error) {
        console.log('Image download failed!', error);
    }
}

const url = 'https://xx.xxxx.com/d/file/zxgg/a2cffb8166f07c0232eca49f8c9cc242.jpg'; // image URL
const filename = path.basename(url);
await downloadfile(url, filename, () => {
    console.log(filename, "downloaded successfully");
});

Running the code again, the image downloaded successfully and opened normally.
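When images are collected from several sites, the Referer value does not have to be hard-coded. The sketch below derives it from the URL of the page that embeds the image. It assumes the target site accepts its own origin as the Referer, as the site in this case did; buildRefererHeaders is a hypothetical helper name.

```javascript
// Hypothetical helper: build request headers whose Referer is the origin
// of the page that embeds the image, for example
// 'https://www.xxxx.com/news/1.html' gives 'https://www.xxxx.com/'.
function buildRefererHeaders(pageUrl) {
  const { origin } = new URL(pageUrl); // throws on an invalid URL
  return { Referer: `${origin}/` };
}
```

The returned object can be passed as the headers option of http.get or https.get in place of the literal headers object used in the download function.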

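Postscript

I also tested another approach: using Playwright to open the page in a browser, then calling await page.locator('selector path').screenshot({ path: 'image save path' }); to save a screenshot of the image element.

Comparing the two, the Playwright screenshot approach requires, while iterating over the image elements, walking back up from the current element through parentNode and iterating over childNodes, so the algorithm is relatively complex. The output of screenshot() is also slightly blurrier than the original image. I therefore recommend requesting the original image with a Referer header, as in the code above (the same header can be passed through axios).

PS: while debugging the second approach I wrote a function that walks up the DOM to build a selector. I am sharing it here for reference; corrections are welcome.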

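/**
 * Build a CSS selector path for an element
 */
function getSelectorPath(element) {
    // An id is assumed to be unique, so it is enough on its own
    if (element.id) {
      return '#' + element.id;
    }
    // Stop climbing once the body is reached
    if (element === document.body) {
      return element.tagName.toLowerCase();
    }

    let ix = 0; // number of element siblings seen before the target
    const siblings = element.parentNode?.childNodes;
    for (let i = 0; i < siblings?.length; i++) {
      const sibling = siblings[i];
      // Compare by identity: comparing innerHTML would stop at the first
      // sibling that merely has the same content
      if (sibling === element) {
        return `${getSelectorPath(element.parentNode)} > ${element.tagName.toLowerCase()}:nth-child(${ix + 1})`;
      }
      // Only element nodes (nodeType 1) count towards :nth-child
      if (sibling.nodeType === 1) {
        ix++;
      }
    }
}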

Published by

风君子
