A simple image crawler built with Node.js built-in modules and a shell script

Modules used: https, for sending the request, and child_process, for running the shell script.

Import the built-in module (http or https)

`const https = require('https');`

Import the spawn function from child_process

`const { spawn } = require('child_process');`

Define the URL of the page to fetch images from

`const url = 'https://www.duitang.com/article/?id=870171';`

Define the regular expression

`const reg = /http(s)?:\/\/([\S]*)\.(jpg|jpeg|png|gif|webp)/g;`
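To make the pattern's behavior concrete, here is a minimal sketch that runs it against a small piece of HTML. The sample markup, the example.com addresses, and the `extractImageUrls` helper are assumptions made up for illustration; they are not part of the original script.

```javascript
// Hypothetical sample HTML, used only to illustrate what the pattern matches.
const sampleHtml =
  '<img src="https://example.com/a/cat.jpg"> <img src="http://example.com/b/dog.png">';

// Same pattern as in the article: http or https, any run of non-whitespace
// characters, then a dot followed by one of the listed image extensions.
const imagePattern = /http(s)?:\/\/([\S]*)\.(jpg|jpeg|png|gif|webp)/g;

// Hypothetical helper: return every full match, or an empty array if none.
function extractImageUrls(html) {
  // With the g flag, String.prototype.match returns all full matches or null.
  return html.match(imagePattern) || [];
}

console.log(extractImageUrls(sampleHtml));
```

Because `[\S]*` is greedy, two image URLs that are not separated by whitespace would be captured as a single match, so the pattern works best on pages where each URL sits in its own attribute or line.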

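Send the request

// Send the request
https.get(url, (res) => {
  // Decode chunks as UTF-8 text so multi-byte characters are not split
  res.setEncoding('utf8');

  // Accumulates the response body
  let str = '';

  // Append each chunk as it arrives
  res.on('data', (data) => {
    str += data;
  });

  // The whole page has been received
  res.on('end', () => {
    let i;

    // Take one match per iteration
    while ((i = reg.exec(str))) {
      // The full matched image URL
      let item = i[0];

      // Run the shell script, passing the image URL as its argument
      spawn('sh', ['index.sh', item]);
    }
    console.log('Downloads started');
  });
}).on('error', (err) => {
  // Network or DNS failure while fetching the page
  console.error(err.message);
});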
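The loop above starts one shell process per match and does not wait for any of them. A possible refinement, sketched here as an assumption rather than as the original author's method, is to skip repeated URLs and to wait for each download to finish before starting the next one. The helper names `uniqueImageUrls`, `download`, and `downloadAll` are hypothetical.

```javascript
const { spawn } = require('child_process');

const reg = /http(s)?:\/\/([\S]*)\.(jpg|jpeg|png|gif|webp)/g;

// Collect each matched URL once, preserving first-seen order.
function uniqueImageUrls(html) {
  const seen = new Set();
  for (const match of html.matchAll(reg)) {
    seen.add(match[0]);
  }
  return [...seen];
}

// Run `sh index.sh <imageUrl>` and resolve with the process exit code.
function download(imageUrl) {
  return new Promise((resolve, reject) => {
    const child = spawn('sh', ['index.sh', imageUrl]);
    child.on('error', reject);  // the process could not be started
    child.on('close', resolve); // first argument of 'close' is the exit code
  });
}

// Download the images one after another and report each exit code.
async function downloadAll(html) {
  for (const imageUrl of uniqueImageUrls(html)) {
    const code = await download(imageUrl);
    console.log(`${imageUrl} -> exit code ${code}`);
  }
}
```

Calling `downloadAll(str)` inside the `end` handler would replace the `while` loop. Sequential downloads are slower than launching everything at once, but they keep the number of simultaneous `curl` processes at one.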

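Writing the shell script

#!/bin/sh

# Build the file name: timestamp plus a random number
# (falls back to the process ID in shells that do not provide $RANDOM)
time=$(date "+%Y%m%d%H%M%S")-${RANDOM:-$$}

# Take the image extension: everything after the last dot
type="${1##*.}"

# Create the img directory if it does not exist
if [ ! -d "img" ]; then
    mkdir img
fi

# Download the image and write it to the target directory under the generated name
curl "$1" -o "./img/$time.$type"

# Exit the script
exit 0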
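For readers less familiar with shell parameter expansion, the naming rule used by the script can be restated in JavaScript. This is only an illustrative sketch: the functions `extensionOf`, `timestamp`, and `targetPath` are hypothetical and are not called by the crawler.

```javascript
// Mirrors "${1##*.}": drop everything up to and including the last dot.
// If there is no dot, the whole string is returned, as in the shell.
function extensionOf(imageUrl) {
  return imageUrl.slice(imageUrl.lastIndexOf('.') + 1);
}

// Mirrors date "+%Y%m%d%H%M%S" using local time.
function timestamp(date) {
  const pad = (n) => String(n).padStart(2, '0');
  return [
    String(date.getFullYear()),
    pad(date.getMonth() + 1),
    pad(date.getDate()),
    pad(date.getHours()),
    pad(date.getMinutes()),
    pad(date.getSeconds()),
  ].join('');
}

// Builds ./img/<timestamp>-<random>.<extension>, like the curl -o argument.
function targetPath(imageUrl, now = new Date(), random = Math.floor(Math.random() * 32768)) {
  return `./img/${timestamp(now)}-${random}.${extensionOf(imageUrl)}`;
}

console.log(targetPath('https://example.com/a/cat.jpg'));
```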

Run the index.js script

`node index.js`

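The script must be run from the directory that contains it, because index.sh and the img folder are both resolved relative to the current working directory.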

If the URL starts with https, the request must be sent with the https module; otherwise the http module is enough.
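The choice between the two modules can also be made at run time from the URL itself. The following is a minimal sketch, and `clientFor` is a hypothetical helper name rather than part of the original script.

```javascript
const http = require('http');
const https = require('https');

// Pick the built-in module that matches the URL's protocol.
function clientFor(pageUrl) {
  const { protocol } = new URL(pageUrl);
  if (protocol === 'https:') return https;
  if (protocol === 'http:') return http;
  throw new Error(`Unsupported protocol: ${protocol}`);
}

// Usage: clientFor(url).get(url, (res) => { /* read the response */ });
```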

WANG KE


After a successful run, an img folder is created in the current directory, containing the downloaded images.
