Project introduction:
This is a coding project. Mind you, I am not a real programmer. I just know some things and learn everything else as I go, which is why this project will probably move slowly. The goal is to build a bot that finds the same markets on Polymarket and Kalshi and looks for arbitrage opportunities. If the bot finds an actual mispricing, where the same shares are valued differently on the two sites, it should place bets in both directions. The bot should calculate the stakes so that it realizes a profit no matter the outcome.
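A minimal sketch of that stake calculation, assuming each share pays out $1 if it wins and ignoring fees entirely. The function name and the example prices are made up for illustration. The idea is to buy the same number of YES shares on one site and NO shares on the other, so exactly one of them pays out whatever happens.

```python
def equal_share_stakes(yes_price, no_price, budget):
    """Split a budget between YES on one site and NO on the other.

    Prices are in dollars per share (0 to 1). Returns None when the two
    prices add up to 1 or more, because then there is no arbitrage.
    """
    cost_per_pair = yes_price + no_price
    if cost_per_pair >= 1.0:
        return None
    # One "pair" is one YES share plus one NO share; it always pays $1.
    pairs = budget / cost_per_pair
    return {
        "shares": pairs,
        "stake_yes": pairs * yes_price,
        "stake_no": pairs * no_price,
        "profit": pairs * (1.0 - cost_per_pair),
    }


if __name__ == "__main__":
    print(equal_share_stakes(0.40, 0.50, 90))
```

With YES at 0.40 and NO at 0.50, every pair costs 0.90 and pays 1.00, so the profit is locked in before the event is decided.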
Updates:
I abandoned this project for a little while, and by the time I started working on it again, things had changed: both Kalshi and Polymarket got banned in my country. Because of this unfortunate incident, I have to use a VPN to access these sites. This isn't a big problem when trading by hand or gathering information for the bot, but an arbitrage bot like this has to work as fast as possible, and a VPN adds a lot of latency. My solution is to run the bot on a VM on a remote server. Oracle Cloud Infrastructure offers a free package that lets you run a VM on their servers. Of course these VMs have limited resources, but my bot won't use much computing power anyway. The VM I am going to use has one OCPU and 1 GB of RAM, and my outgoing data can max out at 10 TB, which is way more than I need. Another advantage is that I could choose the location where my VM runs. I chose one on the East Coast of the USA, because I suspect that is the closest to the Kalshi and Polymarket servers. This should further reduce my latency, which is critical.
With this problem out of the way, I could spend time on my coding problem. I got the market matching to work, but not the way I tried before, because I couldn't tune that approach to be reliable and accurate enough. The new version only looks at the titles of the markets and only makes tokens from those. This way I have far fewer tokens, but they are more manageable. I also ditched the old scoring system, which counted the tokens that appeared in both markets and checked what percentage that was of all the tokens. The new system searches for phrases made up of several words, and of course for team names. These phrases and team names are worth points, and if a pair of markets gets 3 points, they are considered a match. This works surprisingly well. The only problem I had was when the tokens that made up one phrase were a subset of another phrase, but I solved this by only using the longest matching phrases. Now the only thing I have to do is scroll through these preset matches and write a "1" when a match is a real match. After that, another program will check for arbitrage possibilities.
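A rough sketch of how phrase scoring with a longest-match rule could look. The phrase table, the point values, and the function names here are all made up for illustration, and it assumes a phrase is a run of consecutive words in the title, which may not be exactly how the real matcher works. Only the 3 point threshold comes from the description above.

```python
import re

# Hypothetical phrase table: tuple of lowercase words -> points.
PHRASE_POINTS = {
    ("super", "bowl"): 2,
    ("super", "bowl", "mvp"): 3,
    ("chiefs",): 2,
    ("eagles",): 2,
}
MATCH_THRESHOLD = 3  # a pair with this many points counts as a candidate


def tokenize(title):
    """Lowercase the title and split it into words and numbers."""
    return re.findall(r"[a-z0-9]+", title.lower())


def find_phrases(tokens):
    """Collect known phrases, always preferring the longest one at each position."""
    found = set()
    max_len = max(len(p) for p in PHRASE_POINTS)
    i = 0
    while i < len(tokens):
        # Try the longest possible phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in PHRASE_POINTS:
                found.add(candidate)
                i += n  # skip the words this phrase already used
                break
        else:
            i += 1  # no phrase starts here
    return found


def score(title_a, title_b):
    """Sum the points of the phrases that both titles contain."""
    shared = find_phrases(tokenize(title_a)) & find_phrases(tokenize(title_b))
    return sum(PHRASE_POINTS[p] for p in shared)


if __name__ == "__main__":
    s = score("Will the Chiefs win the Super Bowl?", "Chiefs to win Super Bowl 2026")
    print(s, s >= MATCH_THRESHOLD)
```

Because the longest phrase wins, a title containing "Super Bowl MVP" is never also counted as "Super Bowl", so an MVP market does not get points for matching a plain Super Bowl winner market.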
I already have a working program that checks for arbitrage, but it isn't accurate enough. The biggest problem with my current arbitrage checking algorithm is that it doesn't account for fees. That is because Kalshi has a weird fee policy. As I understand it, both Kalshi and Polymarket make you pay a fee only on your winnings, so if you made $100 by betting $20, you only pay fees on the $80 win. At Polymarket that fee is 2%, but at Kalshi it depends on the odds of the particular bet. I was simply too lazy to read the whole thing and figure out the system, but it's coming soon. My plan is to get the machine running for the Super Bowl.
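A small sketch of how a fee on winnings could be folded into the arbitrage check. It takes the fee rule exactly as described above (a percentage of the winnings only) and treats the Kalshi rate as a plain number passed in by the caller, which is a placeholder until the real Kalshi rule is worked out. The function names are made up.

```python
def net_payout(price, fee_rate):
    """Payout of one winning $1 share after a fee charged on the winnings only."""
    winnings = 1.0 - price
    return 1.0 - fee_rate * winnings


def worst_case_profit(yes_price, yes_fee, no_price, no_fee):
    """Profit per YES+NO pair in the worse of the two outcomes.

    A positive result means the pair still makes money after fees
    no matter which side wins.
    """
    cost = yes_price + no_price
    if_yes_wins = net_payout(yes_price, yes_fee)
    if_no_wins = net_payout(no_price, no_fee)
    return min(if_yes_wins, if_no_wins) - cost


if __name__ == "__main__":
    # Both fee rates set to 2% purely as an example.
    print(worst_case_profit(0.40, 0.02, 0.50, 0.02))
```

This lines up with the $20 to $100 example: 100 shares bought at 0.20 each win $80, and a 2% fee on that is $1.60, so each share pays 0.984 instead of 1.00.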
Until then I have some things to do. I have to figure out how to actually place bets and how to keep track of how much money I have left. Plus I need some kind of stake size calculation. I hope I will be able to do it. It is a good thing that I don't have anything else to do right now, so it's only a matter of willpower.
The program needs a lot of information to actually scan the market for an arbitrage situation, so first I tried to figure out what that information is and how to get it. The most important thing the program needs is the list of matching markets between Kalshi and Polymarket, along with their IDs. I could look for these by hand, but there are a lot of them and it would take a lot of time, so I am trying to write a little algorithm that sorts out the probable matches, so I only have to check the most likely ones.
For this I will use the market descriptions on both sides and tokenize them, then look for matching tokens. Thankfully I can use APIs to get these descriptions. They are easy to use, and both Kalshi and Polymarket have guides on how to use them. I have to make a request to these APIs and they send a JSON file in response. I hadn't seen or used JSON files before, but they are pretty easy to understand, and Python has built-in functions to work with them, so it didn't cause any problems. To tokenize the descriptions, the program makes every word a token and makes every letter lowercase. It is important to filter out words that carry no information about the market, like "a", "the", "will", etc. Then the program iterates through every Kalshi market and tries to match every Polymarket market to it. Because the Kalshi markets with all their information are stored in a JSON file, the possible matches are stored in that same JSON. The matching is done by dividing the number of tokens that both markets have in common by the number of all tokens in both markets. If this similarity number is big enough, the pair is considered a possible match.
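A minimal sketch of the tokenizing and similarity steps described above. The stop word list is only a small example, and "all tokens in both markets" is read here as the set of distinct tokens that appear in either description, which is an assumption about the exact formula.

```python
import re

# Example stop words only; the real list would be longer.
STOP_WORDS = {"a", "an", "the", "will", "of", "to", "in", "be"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop the stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOP_WORDS}


def similarity(description_a, description_b):
    """Shared tokens divided by all distinct tokens in the two descriptions."""
    tokens_a = tokenize(description_a)
    tokens_b = tokenize(description_b)
    all_tokens = tokens_a | tokens_b
    if not all_tokens:
        return 0.0  # nothing left to compare after removing stop words
    return len(tokens_a & tokens_b) / len(all_tokens)


if __name__ == "__main__":
    print(similarity("Will the Chiefs win the Super Bowl?", "Chiefs win Super Bowl"))
    print(similarity("Chiefs win", "Eagles win"))
```

The result is always between 0 and 1, so "big enough" can be a single threshold to tune.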
This method sounds great, but it has some kinks. The main problem is that these sites don't use the same vocabulary, and they say the same things with different words all the time. This causes a small similarity number even when the two markets are actually the same. To solve this, I made some lists of words that mean the same thing, partly by hand and partly generated with AI. There are a lot of possibilities here, so I chose to isolate the NFL related markets and use this technique only on those, so I don't have to list every word in the English dictionary. Isolating the NFL related markets went surprisingly smoothly. Kalshi uses a string called "ticker" to identify the markets, and it contains the short name of the league if the market is sports related, so I just scanned for NFL and it worked great. For Polymarket, I had to scan for keywords like "football", and even add stop words like "basketball", but after that it worked great too. Even with all of this, I still had trouble finding matches reliably. My next move was to give important words, like the team names, extra weight. This way, if the important words match, the pair gets extra points. This helped a lot, but I still need to tune how many extra points the important words are worth, and the number of points at which a pair is considered a match. If this works reliably, I will have to do the same for all the other leagues, so I am a little skeptical about it, and I might just search for the matches by hand, which seems a little low tech to be honest.
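A sketch of the NFL filtering and the extra weight for team names. The keyword lists, the team list, the bonus value, and the example ticker are all made up for illustration. The only thing taken from the text is that the Kalshi ticker contains the league's short name and that Polymarket needs keywords plus stop words.

```python
import re

# Hypothetical word lists; the real ones would need tuning.
NFL_KEYWORDS = {"football", "nfl", "touchdown"}
NFL_STOP_WORDS = {"basketball", "soccer"}
TEAM_NAMES = {"chiefs", "eagles"}
TEAM_BONUS = 2.0  # extra points for a shared team name


def is_nfl_kalshi(ticker):
    """Kalshi side: look for the league's short name inside the ticker."""
    return "NFL" in ticker.upper()


def is_nfl_polymarket(title):
    """Polymarket side: needs an NFL keyword and none of the stop words."""
    words = set(re.findall(r"[a-z]+", title.lower()))
    return bool(words & NFL_KEYWORDS) and not (words & NFL_STOP_WORDS)


def weighted_score(tokens_a, tokens_b):
    """One point per shared token, plus a bonus when it is a team name."""
    shared = set(tokens_a) & set(tokens_b)
    return sum(1.0 + (TEAM_BONUS if t in TEAM_NAMES else 0.0) for t in shared)


if __name__ == "__main__":
    print(is_nfl_kalshi("KXNFLGAME-EXAMPLE"))  # made-up ticker
    print(is_nfl_polymarket("Will the Chiefs win the football game?"))
    print(weighted_score({"chiefs", "win"}, {"chiefs", "win", "bowl"}))
```

Keeping the bonus and the word lists as constants at the top makes them easy to tune, and adding another league would mostly mean adding another set of lists.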