Someone was posting about AfreecaTV, so it's time to release my crawler!

Internet News, 2023-01-19 21:51:55

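During my daily check-in today, a post title from one of the veterans here caught my eye.

At first I thought it was one of those outfit-transformation clips that are popular on short-video platforms, but when I opened it I found a familiar live-streaming platform instead.

That reminded me of a rough script I wrote last year, so I'm posting it here for anyone who wants to use it.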

The script crawls the VODs listed on a streamer's (BJ's) station page and writes the results to a txt file: https://ls小东西lb.lanzoub.com/iFMg40k68t9i

Sample playback URL: https://vod-archive-global-cdn-z02.afreecatv.com/v101/hls/se/afreeca/station/2021/0524/01/1621786468833836.smil/original/both/playlist.m3u8
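The script below records one entry per VOD: a playlist URL on success, or the title number followed by `err!` on failure. Here is a minimal sketch for sorting such a file afterwards. The helper name `split_results` and the sample text are hypothetical, made up purely for illustration, and the sketch assumes the entries are separated by whitespace.

```python
def split_results(text):
    """Split the script's output into playlist URLs and failed title numbers."""
    urls, failed = [], []
    for entry in text.split():  # entries are assumed to be whitespace-separated
        if entry.endswith('err!'):
            # Failed lookups are stored as '<title_no>err!'.
            failed.append(entry[:-len('err!')])
        elif entry.endswith('/playlist.m3u8'):
            urls.append(entry)
    return urls, failed


# Made-up sample content, not real crawl output.
sample = (
    'https://vod-archive-global-cdn-z02.afreecatv.com/v101/hls/'
    'example/path.smil/original/both/playlist.m3u8\n'
    '12345err!\n'
)
urls, failed = split_results(sample)
print(len(urls), failed)
```

The failed title numbers could then be retried later instead of re-crawling every page.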

Consider this a rough starting point; improvements from the experts here are very welcome.

import random
import re
import time

import requests

# Capture the stream path between "smil:" and "/playlist.m3u8" in the video-info response.
findre = re.compile(r'smil:(.*?)/playlist\.m3u8')

# Browser-like request headers; the cookie is left empty here.
hd = {
    'cookie': '',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    # 'br' is left out: requests can decode Brotli only when the brotli package is installed.
    'accept-encoding': 'gzip, deflate',
    'accept-language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7,zh-HK;q=0.6',
    'sec-ch-ua': '"Not?A_Brand";v="8", "Chromium";v="108", "Google Chrome";v="108"',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-site': 'same-site',
}

def getpageinfo(bjname):
    """Return the number of pages in the streamer's VOD list."""
    url = 'https://bjapi.afreecatv.com/api/{0}/vods/all?page=1&per_page=60'.format(bjname)
    indexpage = requests.get(url, headers=hd, timeout=30)
    totalpage = indexpage.json()['meta']['last_page']
    print(totalpage)
    return totalpage

def getallvodinfo(bjname, sid, bbsid, tid):
    """Return the playlist URL for one VOD, or '<tid>err!' if the lookup fails."""
    try:
        resapi = 'https://stbbs.afreecatv.com/api/video/get_video_info.php?szBjId={0}&nStationNo={1}&nBbsNo={2}&nTitleNo={3}'.format(bjname, sid, bbsid, tid)
        infos = requests.get(resapi, headers=hd, timeout=30)
        # An IndexError here means the response contained no stream path.
        playurl = findre.findall(infos.text)[0]
        print(playurl)
        realurl = 'https://vod-archive-global-cdn-z02.afreecatv.com/v101/hls/' + playurl + '/original/both/playlist.m3u8'
        return realurl
    except (requests.RequestException, IndexError):
        print(str(tid) + 'err!')
        return str(tid) + 'err!'

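def getpagecnt(bjname, page):
    """Resolve every VOD on one list page and append the results to '<bjname>.txt'."""
    baseurl = 'https://bjapi.afreecatv.com/api/' + bjname + '/vods/all?per_page=60&page='
    # Fetch the requested page of the VOD list.
    pagecnt = requests.get(baseurl + str(page), headers=hd, timeout=30).json()
    with open(bjname + '.txt', 'a', encoding='utf-8') as wt:
        for p in pagecnt['data']:
            u = getallvodinfo(bjname, p['station_no'], p['bbs_no'], p['title_no'])
            wt.write(u + '\n')
            # Random pause between VODs to avoid hammering the API.
            time.sleep(random.randint(10, 40))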

if __name__ == '__main__':
    bjname = input('bjname:')
    startpage = input('Start page: ') or '1'
    totalpage = getpageinfo(bjname)
    for i in range(int(startpage), totalpage + 1):
        print('Now on page {0}'.format(i))
        getpagecnt(bjname, i)
        # Longer random pause between pages.
        time.sleep(random.randint(30, 120))

I'm still a beginner, so please go easy on me!
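For anyone who wants to improve the script, the URL-extraction step can be checked offline before hitting the API. The following is only a sketch under assumptions: the pattern is taken to begin with `smil:`, the names `FINDRE`, `CDN_PREFIX` and `build_playlist_url` are hypothetical, and the response fragment is made up, so the real API output may look different.

```python
import re

# Assumed pattern: the stream path sits between "smil:" and "/playlist.m3u8".
FINDRE = re.compile(r'smil:(.*?)/playlist\.m3u8')
CDN_PREFIX = 'https://vod-archive-global-cdn-z02.afreecatv.com/v101/hls/'


def build_playlist_url(info_text):
    """Return the playlist URL found in info_text, or None if nothing matches."""
    match = FINDRE.search(info_text)
    if match is None:
        return None
    return CDN_PREFIX + match.group(1) + '/original/both/playlist.m3u8'


# Made-up response fragment for illustration only.
fake_response = '<file>rtmp://host/app/smil:some/path/1234.smil/playlist.m3u8</file>'
print(build_playlist_url(fake_response))
print(build_playlist_url('no stream here'))
```

Returning `None` instead of indexing into an empty list makes the "no match" case explicit, so the caller can decide whether to log it or retry.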