How to build your own (topic-specific) search engine: Web Crawler, Meta Search Engine
Q: "I need to explore multiple search engines for information and URLs on relevant medical equipment such as chemistry analyzers and surgical tables."
A: "This topic is more complex than it looks, and Manuel's recommendation needs some clarification."
So please let me share my thoughts about this topic.
The horribly simple fact is: I cannot suggest a "compact solution" or package, but I would like to do some advertising (sorry). For further development on this, I simply need money ;-)
But for now, here is some deeper guidance; if you follow it, I believe your problem is easy to resolve.
First, what you are actually looking for is a "Meta Search Engine" or a "Subject-Specific Web Crawler" (for your damned pharmacy, however).
If you just use the Google API as suggested by Manuel, you will run into restrictions and privacy issues. First, Google will track all activity of your client (this will probably also happen if you use the examples I suggest below and stay legal, but never mind for now). Second, Google will personalize your client, which will affect your SERP (search engine results page) in an unwanted way.
Finally, a more complex Web Crawler will enable you, first, to include sources from more than one provider in your SERPs; second, to fake user agents, user agent sessions, IPs, local settings and proxies, and so avoid getting personalized; and third, to process the fetched search results further: spider the web pages listed in the search engines' result pages, parse and analyze them against your own meta search engine's topic-specific criteria, follow links, and finally build your own pharma database based on your web.
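To make the "spider, parse, follow links" part more concrete, here is a minimal sketch in Python of a topic-specific crawl loop. The names `crawl`, `fetch` and `LinkCollector`, the plain keyword check and the page limit are all assumptions made for illustration and do not come from any existing package. The `fetch` function is passed in by the caller, so any HTTP client can be plugged in.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collects the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_urls, fetch, keywords, max_pages=50):
    """Breadth-first crawl; returns URLs of pages mentioning any keyword.

    fetch(url) is a caller-supplied (hypothetical) function that returns
    the page's HTML as text, or None if the page could not be loaded.
    """
    queue = deque(start_urls)
    seen = set(start_urls)
    relevant = []
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        fetched += 1
        if html is None:
            continue
        # Very rough topic test: keyword appears anywhere in the raw HTML.
        lowered = html.lower()
        if any(k.lower() in lowered for k in keywords):
            relevant.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop #fragments before queueing.
            absolute = urldefrag(urljoin(url, href)).url
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return relevant
```

A real crawler would probably replace the keyword test with the topic-specific criteria mentioned above and store the relevant pages in a database instead of a list.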
Also remember that there are lots of sources, such as topic-specific RSS feeds, that you will want to spider and/or moderate and add to your search engine manually!
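As a rough illustration of how such a feed could be read, the following Python sketch parses an RSS 2.0 document that has already been downloaded and keeps only the items matching a topic. The function names and the title-only keyword filter are assumptions for this example; it uses only the standard library.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text):
    """Return a list of {title, link, published} dicts from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
        })
    return items


def filter_by_topic(items, keywords):
    """Keep only items whose title mentions at least one keyword."""
    wanted = [k.lower() for k in keywords]
    return [i for i in items if any(k in i["title"].lower() for k in wanted)]
```

The filtered items could then be shown to a moderator or added to the crawl queue, depending on how much manual control you want.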
Once you have used and combined a number of search engines, found websites and ways to access them (by meta search engine, fake client or just the common APIs), you will want to process all that data, analyze text and meta tags, and provide prepared views and search results to your end user.
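One possible way to combine the result lists of several engines is sketched below in Python. It is only an assumption of how a meta search engine might merge results, not a description of any particular product: URLs are reduced to a comparison key, and each URL gets a score of 1/rank per engine, so pages returned by several engines move up. The names `normalize` and `merge_results` are made up for this example.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to a comparison key.

    The scheme is ignored, the host is lowercased without a leading 'www.',
    and a trailing slash on the path is dropped.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(result_lists):
    """Merge ranked URL lists (engine name -> list of URLs) into one list.

    Each URL scores 1/rank per engine (rank starts at 1); scores are summed.
    """
    scores = {}
    first_seen = {}
    for results in result_lists.values():
        for rank, url in enumerate(results, start=1):
            key = normalize(url)
            scores[key] = scores.get(key, 0.0) + 1.0 / rank
            first_seen.setdefault(key, url)
    ordered = sorted(scores, key=lambda k: scores[k], reverse=True)
    return [first_seen[k] for k in ordered]
```

For example, a URL ranked first by one engine and second by another ends up with a score of 1.5 and is listed ahead of a URL that only one engine ranked first.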
So you can see, the solution you choose will depend on your concrete demands and can take anything from a weekend up to years.
I have gained some experience with my own meta search engine. Sure, it is out of date, has a poor GUI design and is very slow. But it does everything it is expected to do, and updates will come in a later version when there is time for it.
REQUESTS (e.g. when running a search):
- A "normal" request to Google search, like /?q=searchterm
- A request to each of the links found in the result page of the previous Google request (its links are processed later...)
- Cached requests to the official Bing API, for text results, images, videos,...
- Periodic automated requests to a few selected breaking news portals
- Periodic automated requests to a few selected RSS feeds
- Query the web crawler's result DB
- Query a lot of internal DBs and tables, e.g. domain-specific and user-generated data stocks
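The "cached requests" in the list above could look roughly like the following Python sketch: a small wrapper that reuses a response while it is younger than a time-to-live and only then calls the real API again. The class name `CachedFetcher`, the in-memory dictionary and the default TTL are assumptions for illustration; my own engine is not implemented this way line by line.

```python
import time


class CachedFetcher:
    """Wraps a fetch function and reuses responses younger than ttl seconds."""

    def __init__(self, fetch, ttl=3600, clock=time.time):
        self._fetch = fetch    # caller-supplied function: key -> response
        self._ttl = ttl        # maximum age of a cached response in seconds
        self._clock = clock    # injectable clock, which makes testing easier
        self._cache = {}       # key -> (timestamp, response)

    def get(self, key):
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        response = self._fetch(key)
        self._cache[key] = (now, response)
        return response
```

The same wrapper can sit in front of the periodic requests to news portals and feeds, so that a burst of user searches does not cause a burst of outgoing requests.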
PREPROCESSING of the results:
- Split the results into topics/modules
- Perform an SQL FULLTEXT search on the results
- Parse and analyze text and HTML/meta tags of the found pages
- Configurable ranking of the results based on the criteria
- Calculate the "SEO performance" of a few selected sites
- Compute the most important buzzwords in the current breaking-news headlines
- Extract links for later crawling
- Search form autocomplete
- OpenSearchDescription.xml (to register the search engine in the browser)
- No "faking clients" or "special tricks": when requesting sites, the user agent and the IP identify my crawler as my metacrawler; everything is legal and fair!
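Two of the preprocessing steps above, reading the meta tags of a page and computing buzzwords from headlines, can be sketched in Python with the standard library alone. This is a simplified illustration under my own assumptions: the names `MetaTagParser` and `buzzwords`, the tiny stop-word list and the plain word count are invented for the example, and a production version would need much more care.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "in", "for", "and", "to", "on", "is"}


def buzzwords(headlines, top=3):
    """Return the most frequent non-stop-words across a list of headlines."""
    counts = Counter()
    for line in headlines:
        for word in re.findall(r"[a-z]+", line.lower()):
            if word not in STOP_WORDS and len(word) > 2:
                counts[word] += 1
    return [word for word, _ in counts.most_common(top)]
```

The extracted title and meta keywords can feed the configurable ranking, while the buzzword list only needs the headlines collected from the news portals and feeds.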
Although your request to build a search for "medical, chemistry, surgical products" sounds a bit scary to me, never mind; I am currently qualified and available for hire.
I do not know your final goal, but I guess your intention is commercial. The market for pharmacy and online sales of medical products is morbidly jaded, and whatever you plan to do, there will surely be a number of people who have done this or similar work before. The pharma industry offers a large and well-proven assortment of products related to your issue, but they will rarely be free of charge.
To get started building your own Web Crawler, I recommend the following package:
Using the Http Client Class by Manuel Lemos, you will be able to develop a search engine crawler client or any other kind of bot or proxy in PHP with ease.
You will find many other helpful classes for querying websites, SEO and API usage (e.g. Google APIs) on phpclasses.org.
Created by WEBFAN (Monday 1st of August 2016 03:29:24 PM) in the category Knowledge Base as a static page
Published: Monday 1st of August 2016 04:56:17 PM by WEBFAN
Last modified: Monday 1st of August 2016 04:56:17 PM by WEBFAN