仓酷云

标题: PHP网页编程之IIS日记剖析搜刮引擎爬虫纪录法式 [打印本页]

作者: 透明 时间: 2015-2-3 23:53
标题: PHP网页编程之IIS日记剖析搜刮引擎爬虫纪录法式
告诉你了一个方式，但是缺少努力这一环节，那也是白搭。利用注重：
　　修正iis.php文件中iis日记的相对途径
　　例如：$folder=”c:/windows/system32/logfiles/站点日记目次/”; //前面记得必定要带斜杠(/)。
　　( 用虚拟空间的不懂检查你的站点相对途径?上传个探针检查!
　　直接检查法：http://站点域名/iis.php
　　当地检查法：把日记下载到当地 http://127.0.0.1/iis.php )
　　注重：
　　//站点日记目次，注重该目次必需要有站点用户读取权限!
　　//假如把日记下载到当地请修正143行的网址为您网站的网址，此操作不是需要操作，不影响剖析了局。
　　//修正文件称号iis.php 需求同时修正对应代码 ctrl+h 把 iis.php全体交换成您要修正的文件名不然法式运转失足。
　　//假如iis日记文件过大，能够会招致法式超时!同时也不建议人人利用!

 以下是PHP源代码：
<?php
/*
 牛仔IIS日记蜘蛛匍匐纪录剖析器 V1.1(PHP GB2312 版)
 作者：牛仔
 QQ：172379201
 Email:17gd@163.com
*/
//===================================================
 header("content-type:text/html; charset=gb2312");
//站点日记目次，注重该目次必需要有站点用户读取权限！
//假如把日记下载到当地请修正143行的网址为您网站的网址，此操作不是需要操作，不影响剖析了局。
//假如修正了文件称号iis.php 需求同时修正代码 Ctrl+H 把 iis.php全体交换成您要修正的文件名不然法式运转失足。
$folder="D:/Vhost/WebRoot/jooker82465/www/wordpress/uploads/W3SVC87164023/"; //前面记得必定要带斜杠 / ！
$pagesize = 50;//设置分页显示条数！
//=========================
$type = addslashes($_GET[’type’]);
if ($type)$type = base64_decode($type);
$showfile = addslashes($_GET[’showfile’]);
$page = addslashes($_GET[’page’]);
if (!$page)$page=1;
//============================
//翻开目次
if (!$type){
if (file_exists($folder))
{
 $fp=opendir($folder);
 while(false!=$file=readdir($fp))
 {
 if($file!=’.’ &&$file!=’..’)
 {
 $file="$file";
 $arr_file[]=$file;
 }
 }
 if(is_array($arr_file))
 {
for ($i=count($arr_file)-1;$i>=0;$i--)
{
$indexstr.="
<tr><td height=\"25\" width=\"10%\">".date("Y-m-d",filectime($folder.$arr_file[$i]))."</td>
<td height=\"25\" width=\"10%\" align=\"center\">
<a href=\"iis.php?type=".base64_encode(Baiduspider)."&showfile=".$arr_file[$i]."\">百度(Baidu)</a></td>
<td height=\"25\" width=\"10%\" align=\"center\">
<a href=\"iis.php?type=".base64_encode(Googlebot)."&showfile=".$arr_file[$i]."\">谷歌(Google)</a></td>
<td height=\"25\" width=\"10%\" align=\"center\">
<a href=\"iis.php?type=".base64_encode(yahoo)."&showfile=".$arr_file[$i]."\">雅虎(yahoo)</a></td>
<td height=\"25\" width=\"10%\" align=\"center\">
<a href=\"iis.php?type=".base64_encode(YodaoBot)."&showfile=".$arr_file[$i]."\">有道(yodao)</a></td>
<td height=\"25\" width=\"10%\" align=\"center\">
<a href=\"iis.php?type=".base64_encode(Sosospider)."&showfile=".$arr_file[$i]."\">搜搜(soso)</a></td>
<td height=\"25\" width=\"10%\" align=\"center\">
<a href=\"iis.php?type=".base64_encode(Sogou)."&showfile=".$arr_file[$i]."\">搜狗(sogou)</a></td>
<td height=\"25\" width=\"10%\" align=\"center\">
<a href=\"iis.php?type=".base64_encode(msnbot)."&showfile=".$arr_file[$i]."\">微软(msn)</a></td>
</tr>";
}
}
closedir($fp);
$html = indexhtml();
$copy = mycopy();
$html = str_replace("[showlog]",$indexstr,$html);
$html = str_replace("[copy]",$copy,$html);
echo $html;
}else{
 echo "该日记目次不存在或权限缺乏，请反省设置！";
 exit();
}
}elseif ($type==’Baiduspider’){
 echo show($type,$folder,$showfile,$page,$pagesize);
}elseif ($type==’Googlebot’){
 echo show($type,$folder,$showfile,$page,$pagesize);
}elseif ($type==’yahoo’){
 echo show($type,$folder,$showfile,$page,$pagesize);
}elseif ($type==’YodaoBot’){
 echo show($type,$folder,$showfile,$page,$pagesize);
}elseif ($type==’Sosospider’){
 echo show($type,$folder,$showfile,$page,$pagesize);
}elseif ($type==’Sogou’){
 echo show($type,$folder,$showfile,$page,$pagesize);
}elseif ($type==’msnbot’){
 echo show($type,$folder,$showfile,$page,$pagesize);
}

function show($type,$folder,$showfile,$page,$pagesize)
{
if ($type==’Baiduspider’)
{
 $title=’百度’;
}elseif ($type==’Googlebot’){
 $title=’谷歌’;
}elseif ($type==’yahoo’){
 $title=’雅虎’;
}elseif ($type==’YodaoBot’){
 $title=’有道’;
}elseif ($type==’Sosospider’){
 $title=’搜搜’;
}elseif ($type==’Sogou’){
 $title=’搜狗’;
}elseif ($type==’msnbot’){
 $title=’MSN’;
}
if ($type&&$folder&&$showfile)
{
 if(file_exists($folder.$showfile))
 {
 $fp= fopen($folder.$showfile,"r");
 }else{
echo "该日记文件不存在，请反省设置！";
exit;
 }
 $j=0;
 $y=0;
 $t=0;
 $h=0;
 while (!feof($fp))
 {
$str = fgets($fp);
$str =iconv("UTF-8","GB2312//IGNORE",$str);
if(strpos($str,$type))
{
$j++;
$temp[].=$str;
$tmpcount = explode(" ",$str);
if ($tmpcount[11]==200)$t++;
if ($tmpcount[11]==304)$h++;
if ($tmpcount[11]==404)$y++;
}
 }
 fclose($fp);
 $count = count($temp);
 if ($page==1)
 {
$countshow=$count;
$mynum = $count-$pagesize;
 }else{
$countshow =$count-($page*$pagesize-$pagesize);
$mynum = $count-$page*$pagesize;
 }
 $pagecount =ceil(count($temp) / $pagesize);
 if ($page>=$pagecount)
 {
$mynum = $pagecount;
 }
 $m=0;
 for ($i=$countshow-1;$i>=$mynum;$i--)
 {
$num = explode(" ",$temp[$i]);
 $domain="http://tarr.cn"; //网站URL 末尾不要带斜杠
$show.="
<tr onMouseOut=\"this.style.backgroundColor=’#FFFFFF’\" onMouseOver=\"this.style.backgroundColor=’#F6F6F6’\">
<td class=\"c\" width=\"200;\">".$num[0]." ".$num[1]."</td>
<td class=\"c\">".$num[9]."</td>
<td class=\"pl\"><a href=\"$domain$num[5]\" _fcksavedurl="\"$domain$num[5]\"" target=\"_blank\">".$num[5]."</a></td>
<td class=\"c\">".$num[11]."</td>
</tr>";
 }
 unset($temp);
 $showpage = "<td colspan=\"4\" height=\"30\" align=\"center\">每页 ".$pagesize." 条以后".$page."/$pagecount";
 $showpage.=" <a href=\"?type=".base64_encode($type)."&showfile=".$showfile."\">首页</a>";
 if ($page!=1)
 {
$showpage.=" <a href=\"?type=".base64_encode($type)."&showfile=".$showfile."&page=".($page-1)."\">上一页</a>";
 }
 if ($page!=$pagecount)
 {
 $showpage.=" <a href=\"?type=".base64_encode($type)."&showfile=".$showfile."&page=".($page+1)."\">下一页</a>";
 $weei = " <a href=\"?type=".base64_encode($type)."&showfile=".$showfile."&page=".($pagecount)."\">尾页</a>";
 }
 $showpage.=$weei."</td>";
 if ($show)
 {
 $html = pagehtml();
 $copy = mycopy();
 $htmltitle = "牛仔IIS日记蜘蛛匍匐纪录剖析器茄咧啡修正版";//请保存，感谢！
 $html = str_replace("[title]",$title,$html);
 $html = str_replace("[htmltitle]",$htmltitle,$html);
 $html = str_replace("[show]",$show,$html);
 $html = str_replace("[count]",$j,$html);
 $html = str_replace("",$showpage,$html);
 $html = str_replace("[y]",$y,$html);
 $html = str_replace("[t]",$t,$html);
 $html = str_replace("[h]",$h,$html);
 $html = str_replace("[copy]",$copy,$html);
 return $html;
 }
}
}
function indexhtml()
{
return ’<html>
<head>
<meta http-equiv="Content-Language" content="zh-cn">
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
<title>牛仔IIS日记蜘蛛匍匐纪录剖析器 V1.1</title>
<style>

</style>
</head>
<body>
<table border="1" width="100%" id="table1" cellspacing="0" cellpadding="0" style="border-collapse: collapse">
<tr>
 <td colspan="8" bgcolor="#808080" height="30" align="center">
 牛仔IIS日记蜘蛛匍匐纪录剖析器茄咧啡修正版</td>
</tr>
<tr>
 <td height="25" align="center" width="260">日期</td>
 <td colspan="6" height="25" align="center">引擎</td>
</tr>
<tr>
 [showlog]
</tr>
</table>
[copy]
</body>
</html>’;
}
function pagehtml()//============显示模板，标签取代显示内容！
{
return ’<html>
<head>
<meta http-equiv="Content-Language" content="zh-cn">
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
<title>[title]蜘蛛匍匐剖析 - [htmltitle]</title>
<style>

</style>
</head>
<body>
<table border="1" width="100%" id="table1" cellspacing="0" cellpadding="0" style="border-collapse: collapse" height="74">
<tr>
<td><a href="iis.php">前往日记目次</a> | <a href="http://www.dj965.com">dj965</a>
 <td colspan="3" bgcolor="#808080" height="30" align="center">
 [title]蜘蛛匍匐剖析</td>
</tr>
 <tr>
 <td colspan="4" height="20" align="center">今天志[title]蜘蛛共匍匐 [count] 次，个中正常 [t] 个，逝世链 [y] 个，缓存 [h] 个</td>
</tr>
<tr>
 <td align="center" width="200px;">工夫</td>
 <td align="center" width="150px;">蜘蛛IP</td>
 <td align="center">被爬URL</td>
 <td align="center" width="100px;">匍匐了局</td>
</tr>
[show]
<tr>

</tr>
</table>
[copy]
</body>
</html>’;
}
function mycopy()
{
return ’<table border="1" width="100%" id="table2" cellspacing="0" cellpadding="0" style="border-collapse: collapse" height="402">
<tr>
 <td height="35" bgcolor="#C0C0C0" align="center">注备申明</td>
</tr>
<tr>
 <td height="170">
 正常：暗示该面页蜘蛛会见正常，并已下载。匍匐形态前往200。
 逝世链：暗示蜘蛛会见的面页不存在或链接毛病，匍匐形态前往404。
 缓存：暗示蜘蛛之前已爬过的面页且该面页未更新过，蜘蛛缓存区已存在该文件，不再下载该面页内容。匍匐形态前往304。
 注重：蜘蛛爬过的面页纷歧定会放出来，由于蜘蛛爬归去的数据须经由引擎划定规矩挑选后才会放出来，至于具体请检查引擎收录匡助。
 </td>
</tr>
<tr>
 <td>
 法式称号：<a target="_blank" href="http://tarr.cn/?p=23">牛仔IIS日记蜘蛛匍匐纪录剖析器 - 茄咧啡修正版</a> 修正者：<a href="http://www.tarr.cn/" target="_blank">茄咧啡</a>
 *******************************************************
 原法式称号：<a target="_blank" href="http://www.niuzi.com/">牛仔IIS日记蜘蛛匍匐纪录剖析器</a>
 原作者：牛仔
 QQ：172379201
 Email:17gd$163.com ($转换@)
 注重：本法式只供人人进修利用，请勿用作贸易用处。
</tr>
</table>’;
}
?>
然后大吼：别人可以，我为什么就不可以?(是不是有点阎罗教练的味道，默默的确是电影看多了，抽嘴巴是会痛的，各位其实明白这个道理了就行了)

作者: 海妖 时间: 2015-2-4 06:56
先学习php和mysql,还有css(html语言很简单)我认为现在的效果比以前的方法好。

作者: 活着的死人 时间: 2015-2-9 18:10
对于懒惰的朋友，我推荐php的集成环境xampp或者是wamp。这两个软件安装方便，使用简单。但是我还是强烈建议自己动手搭建开发环境。

作者: 精灵巫婆 时间: 2015-2-16 09:01
我学习了一段时间后，我发现效果并不好（估计是我自身的问题）。因为一个人的精力总是有限的，同时学习这么多，会导致每个的学习时间都得不到保证。

作者: admin 时间: 2015-3-5 02:56
做为1门年轻的语言，php一直很努力。

作者: 谁可相欹 时间: 2015-3-11 22:44
兴趣是最好的老师，百度是最好的词典。

作者: 仓酷云 时间: 2015-3-19 15:09
不禁又想起那些说php是草根语言的人，为什么认得差距这么大呢。

作者: 小魔女 时间: 2015-3-22 01:10
基础有没有对学习php没有太大区别，关键是兴趣。

作者: 变相怪杰 时间: 2015-3-25 09:52
当留言板完成的时候，下步可以把做1个单人的blog程序，做为目标，

作者: 若天明 时间: 2015-3-27 22:11
爱上php，他也会爱上你。

作者: 灵魂腐蚀 时间: 2015-4-7 15:08
当留言板完成的时候，下步可以把做1个单人的blog程序，做为目标，

作者: 再见西城 时间: 2015-4-15 13:37
学习php的目的往往是为了开发动态网站，phper就业的要求也涵盖了很多。我大致总结为：精通php和mysql

作者: 深爱那片海 时间: 2015-4-16 21:11
建数据库表的时候，int型要输入长度的，其实是个摆设的输入几位都没影响的，只要大于4就行，囧。

作者: 若相依 时间: 2015-4-26 15:12
php里的数组为空的时候是不能拿来遍历的；（这个有点低级啊，不过我刚被这个边界问题墨迹了好长一会）

作者: 小女巫 时间: 2015-6-15 21:16
如果你可以写完像留言板这样的程序，那么你可以去一些别人的代码了，

作者: 简单生活 时间: 2015-6-16 09:10
写js我最烦的就是 ie 和 firefox下同样的代码结果显示的结果千差万别，还是就是最好不要用遨游去调试，因为有时候遨游是禁用js的，有可能代码是争取结果被遨游折腾的认为是代码写错。

作者: 金色的骷髅 时间: 2015-6-20 21:24
当然这种网站的会员费就几十块钱。

作者: 只想知道 时间: 2015-7-2 18:25
最后祝愿，php会给你带来快乐的同时你也会给他带来快乐。

作者: 老尸 时间: 2015-7-12 20:15
多看优秀程序员编写的代码，仔细理解他们解决问题的方法，对自身有很大的帮助。

作者: 飘灵儿 时间: 2015-7-27 00:33
本文当是我的笔记啦，遇到的问题随时填充

欢迎光临仓酷云 (http://www.ckuyun.com/)