ChatGPT and other Large Language Models

uçuyorum

Contributor
Nation of residence
Turkey
Nation of origin
Turkey
One concept for deploying AI and LLMs: large language models are, as the name suggests, large. What researchers have realized with the transformer architecture is that, as models grow in size, they start to exhibit kinds of human-like reasoning that smaller models simply could not. But as these models grow, they also require lots of data and lots of computing power. LLMs work well when trained on general data: the more they know about the world, the better they perform at most tasks, and they are far more human-like than the AI we had 10 years ago. But when domain-specific knowledge is limited, particularly in restricted domains like the military, fine-tuning a model is also difficult.
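To put rough numbers on "lots of data and lots of computing power", a widely used rule of thumb from the scaling-law literature estimates training compute as about 6 FLOPs per parameter per training token, C ≈ 6·N·D. The sketch below applies it; the model sizes, token counts and the per-GPU throughput are illustrative assumptions, not figures for any particular model.

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute with the C ~ 6 * N * D rule of thumb."""
    return 6.0 * params * tokens


def gpu_hours(flops: float, flops_per_gpu_per_s: float) -> float:
    """Convert total FLOPs to GPU-hours at an assumed sustained throughput."""
    return flops / flops_per_gpu_per_s / 3600.0


# Hypothetical configurations: a small model, and one 10x larger trained on 10x more data.
small = training_flops(7e9, 2e12)     # 7B parameters, 2T tokens
large = training_flops(70e9, 20e12)   # 70B parameters, 20T tokens

# Assumed sustained throughput of 4e14 FLOP/s per GPU (illustrative only).
THROUGHPUT = 4e14

print(f"small: {small:.2e} FLOPs, {gpu_hours(small, THROUGHPUT):,.0f} GPU-hours")
print(f"large: {large:.2e} FLOPs, {gpu_hours(large, THROUGHPUT):,.0f} GPU-hours")
print(f"ratio: {large / small:.0f}x")
```

Because the estimate scales with the product N·D, making a model 10x larger and training it on 10x more tokens multiplies the compute by 100, which is why growth quickly becomes a data and hardware problem.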

Also, LLMs are not good at maths and numerical data; they perform well with text data and manual labels. ChatGPT needs extra tools to do maths (the model can write the equation and solve it with an external calculator). So an AI used for such purposes must also be trained to use those external tools (via special language tokens). This requires military data to be transformed into a format an LLM can use, so the models can leverage it.
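A minimal sketch of that tool-use idea. It assumes a model fine-tuned to wrap arithmetic in hypothetical `<calc>` ... `</calc>` tokens; the host program finds those spans, evaluates them with a restricted arithmetic evaluator (deliberately not Python's `eval`), and splices the result back into the text. In a real system the result would be fed back into the model's context so it can keep generating; the token names and helper functions here are illustrative, not ChatGPT's actual mechanism.

```python
import ast
import operator
import re

# Hypothetical special tokens the model is assumed to emit around a calculation.
CALL_PATTERN = re.compile(r"<calc>(.*?)</calc>")

# Only basic arithmetic is allowed; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def safe_eval(expr: str) -> float:
    """Evaluate an arithmetic expression by walking its syntax tree."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))


def run_tool_calls(model_output: str) -> str:
    """Replace every <calc>...</calc> span with the calculator's result."""
    return CALL_PATTERN.sub(lambda m: str(safe_eval(m.group(1))), model_output)


draft = "Fuel needed: <calc>3 * 42.5 + 10</calc> litres."
print(run_tool_calls(draft))
```

The point of the special tokens is that the model only has to learn *when* to hand off a calculation and how to phrase it; the arithmetic itself is done by ordinary, deterministic code.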

Now getting to the crux of the issue. Using these models in real time is a network communication challenge. You can't have an LLM instance running on every vehicle, nor would that be as effective. However, a single centralized instance makes bandwidth and security/EW (electronic warfare) resistance an issue. So the concept should basically be command-and-control vehicles accompanied by trailers carrying servers that run LLM instances, plus communications equipment to talk to assets in the area; local communication will be much faster as well. This still requires some of the data to be preprocessed in place, however. So any sensor-equipped system should be able to process, and maybe tokenize, its own data and compress it before sharing it with these models. This calls for lessons from the Big Data and IoT fields: thinking of the system as a whole, using an asymmetric distributed architecture, minimizing data-transfer overhead and keeping data-processing commands simple. Large assets like ships should also have datacenters designed around the future use of such advanced AI.
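The edge-preprocessing idea can be sketched as follows. Everything here is hypothetical: the `Reading` fields, the confidence threshold, and the compact `B<bearing> R<range>` line format are assumptions chosen for illustration. The sensor node filters out weak detections, quantizes the rest into short text lines that an LLM server could read directly, and compresses them with zlib before transmission; the server decompresses them.

```python
import json
import zlib
from dataclasses import asdict, dataclass


@dataclass
class Reading:
    """One detection from a sensor node (hypothetical fields)."""
    sensor_id: str
    bearing_deg: float  # direction of the contact, degrees
    range_m: float      # distance to the contact, metres
    confidence: float   # detector confidence in [0, 1]


def raw_payload(readings: list[Reading]) -> bytes:
    """What the node would send if it forwarded every reading verbatim."""
    return json.dumps([asdict(r) for r in readings]).encode()


def edge_payload(readings: list[Reading], min_conf: float = 0.5) -> bytes:
    """Filter, quantize and compress readings on the node itself."""
    lines = [
        # Bearing rounded to whole degrees, range rounded to the nearest 10 m.
        f"{r.sensor_id} B{round(r.bearing_deg) % 360:03d} R{round(r.range_m / 10) * 10}"
        for r in readings
        if r.confidence >= min_conf  # drop weak detections at the source
    ]
    return zlib.compress("\n".join(lines).encode())


def decode_edge_payload(payload: bytes) -> list[str]:
    """Server side: recover the compact text lines for the LLM prompt."""
    return zlib.decompress(payload).decode().splitlines()


readings = [
    Reading("S1", 271.6, 1834.2, 0.91),
    Reading("S1", 12.3, 640.0, 0.22),  # low confidence, filtered out at the edge
    Reading("S2", 359.8, 2210.7, 0.77),
]
raw = raw_payload(readings)
compact = edge_payload(readings)
print(len(raw), "bytes raw vs", len(compact), "bytes after edge processing")
print(decode_edge_payload(compact))
```

zlib adds a few bytes of header and checksum, so on very small messages compression alone can make a payload slightly larger; most of the saving here comes from filtering and quantizing at the source, and compression pays off more when many readings are batched together.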
 

Saithan

Experienced member
Denmark Correspondent
Nation of residence
Denmark
Nation of origin
Turkey

DeepSeek shocks AI world with 'cheap' Chinese chatbot


Chinese artificial intelligence (AI) app DeepSeek has overtaken ChatGPT and other rivals to become the top-rated free application on Apple's App Store in the US, UK and China.

The app has surged in popularity since its launch in January, challenging the widely-held belief that America is the untouchable leader of the AI industry.

It is powered by the open-source DeepSeek-V3 model, which its researchers claim was developed for less than $6m - significantly less than the billions spent by rivals.

But this claim has been disputed by others in the AI space.

After DeepSeek-R1 was launched earlier this month, the company boasted of "performance on par with" one of ChatGPT maker OpenAI's latest models - when used for tasks such as maths, coding and natural language reasoning.

Silicon Valley venture capitalist and Donald Trump advisor Marc Andreessen described DeepSeek-R1 as "AI's Sputnik moment", in a reference to the first artificial Earth satellite that was launched by the Soviet Union in 1957.

Advanced chips power the training of AI models like ChatGPT and DeepSeek.

But since 2021 the US government has widened its restrictions on advanced chips being sold to China.

In order to continue their work without steady supplies of imported advanced chips, Chinese AI developers have shared their work with each other and experimented with new approaches to the technology.

This has resulted in AI models that require far less computing power than before. It also means that they cost a lot less than previously thought possible, which has the potential to upend the industry.

Shares in AI-related companies based in the US, such as Nvidia, Microsoft and Meta were down on Monday morning - and the development knocked European share prices.

ASML, the Dutch chip equipment maker, saw its share price tumble by more than 10% while shares in Siemens Energy, which makes hardware related to AI, plunged by 21%.

"This idea of a low cost Chinese version hasn't necessarily been forefront, so it's taken the market a little bit by surprise," said Fiona Cincotta, senior market analyst at City Index.

"So if you suddenly get this low-cost AI model, then that's going to raise concerns over the profits of rivals, particularly given the amount that they've already invested in more expensive AI infrastructure."

And Singapore-based technology equity advisor Vey-Sern Ling told the BBC it could "potentially derail the investment case for the entire AI supply chain".

But Wall Street banking giant Citi cautioned that while DeepSeek could challenge the dominant positions of American companies like OpenAI, issues faced by Chinese firms could hamper their development.

"We estimate that in an inevitably more restrictive environment, US' access to more advanced chips is an advantage," its analysts said in a report.

Last week, a consortium of US tech firms and foreign investors announced The Stargate Project, a company which is putting $500bn into AI infrastructure in Texas.

Who founded DeepSeek?

The company was founded in 2023 by Liang Wenfeng in Hangzhou, a city in southeastern China.

The 40-year-old, an information and electronic engineering graduate, also founded the hedge fund that backed DeepSeek.

He reportedly built up a store of Nvidia A100 chips, now banned from export to China. Experts believe this collection - which some estimates put at 50,000 - led him to launch DeepSeek, by pairing these chips with cheaper, lower-end ones that are still available to import.

Mr Liang was recently seen at a meeting between industry experts and the Chinese premier Li Qiang.

In a July 2024 interview with The China Academy, Mr Liang said he was surprised by the reaction to the previous version of his AI model.
"We didn't expect pricing to be such a sensitive issue," he said.

"We were simply following our own pace, calculating costs, and setting prices accordingly."


 

Saithan

Experienced member
Denmark Correspondent
Nation of residence
Denmark
Nation of origin
Turkey
The West is likely going to trash-talk DeepSeek massively, but I think it's important that tech companies in Türkiye also try to build one and release it. If we deliver an AI like DeepSeek and join the AI bandwagon, the established tech companies will be hit brutally. I think it's important, just like the tech war in the '90s, which led to better and faster processors.
 

dBSPL

Experienced member
Think Tank Analyst
DefenceHub Ambassador
Nation of residence
Turkey
Nation of origin
Turkey
While US companies are at each other's throats and China is pushing the sector with state support, the EU is currently a bystander. In our country, unfortunately, although AI will affect every industrial and service sector, it interests only a very limited circle. The T3 Foundation is trying to build synergy for a local language model, but apart from a few defense/aerospace companies, no serious capital is working on the subject.
 

Saithan

Experienced member
Denmark Correspondent
Nation of residence
Denmark
Nation of origin
Turkey
It's because every minister is from the Stone Age. The most they can manage is using a Microsoft program and their phone. My cousin was among the top three or four tech engineers responsible for Türkiye's (TC) internet infrastructure in the late '90s.

We seriously need to put some money into cybersecurity, but also work on developing an AI at the national level.
 

Fuzuli NL

Experienced member
Germany Correspondent
Nation of residence
Germany
Nation of origin
Turkey
China just dropped another bombshell called Janus Pro 7B.
It's an image-based AI, supposedly the most powerful one.
 
