[Linux-aus] Open Source AI or LLM people + projects in Australia/NZ
Kathy Reid
kathy at kathyreid.id.au
Sun Jan 7 16:00:39 AEDT 2024
Hi Lyndsey,
The comments here are from my personal perspective, not those of any
academic or institutional affiliations I have.
Firstly I want to clarify some terms. AI and machine learning is a very
broad space - it's used in almost every vertical - defence, aerospace,
marketing, business analytics, education and so on. A large language
model (LLM) is a specific type of machine learning model that is
_predictive_ and _generative_. Given a prompt, it responds with an
output that best matches the prompt _based on the data it has been
trained on_. Other types of machine learning models (for example, image
generators like DALL-E 2) use different algorithms and are trained on
different types of data. If you train an LLM on a particular set of
data, for example, "movies", then it will predict what it's been trained on.
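To make the "predicts what it's been trained on" point concrete, here is a minimal sketch using a toy bigram (next-word) counter. This is nowhere near a real LLM, which uses a neural network trained on vastly more text, and the tiny "movies" corpus and the function names are made up purely for illustration. The idea it shows is the same one, though: the output can only reflect the training data.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count, for every word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            follows[current_word][next_word] += 1
    return follows


def generate(model, prompt, max_words=5):
    """Extend the prompt by repeatedly picking the most frequent next word."""
    words = prompt.lower().split()
    for _ in range(max_words):
        candidates = model.get(words[-1])
        if not candidates:
            break  # the model never saw this word followed by anything
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical training data: a tiny "movies" corpus (an assumption for this sketch).
movie_corpus = [
    "the film opens in melbourne",
    "the film stars a young actor",
    "the film opens with a chase",
]
model = train_bigram_model(movie_corpus)

# "film" was followed by "opens" twice and "stars" once, so "opens" wins.
print(generate(model, "the film", max_words=1))  # the film opens

# "servo" never appeared in training, so the model has nothing to add.
print(generate(model, "the servo"))  # the servo
```

The second call is the interesting one: a word the model has never seen gets no useful continuation, which is a small-scale version of the Australian-context problem discussed further down.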
## Distinguishing open source models from open source data
Models are built on data.
While a model can be open sourced, that isn't enough to make it "open
source AI", IMHO - and the Open Source Initiative is having a broader
conversation around what is meant by "open source AI" [5]. The
algorithm, code, dependencies etc are required for reproducibility to
make it "truly open", IMHO.
The key thing to keep in mind is that models are trained on *data* -
it's not so important where the model is developed, but it's really
important where the data comes from (this is the focus of the movement
called "data-centric AI" [0]).
## LLMs in the Australian context
The challenge here for the Australian context is that many LLMs are
either not trained on Australian-specific data, _or_, their predictions
do not mirror the Australian context because their training set has
little Australian content compared to American content. This can be
partly steered using prompts - for example "Please create a news headline and
two paragraphs of copy of an important event in Melbourne in 2002. Use
Australian English." - given this prompt, ChatGPT (3) will use
Australian spelling and Melbourne-specific landmarks. ChatGPT also
seems to have some grounding in Indigenous knowledges, such as Bunjil.
But if I ask ChatGPT the question "There's a bingle at Broady and the
Western's chokkas back to the servo. What should I do?" then it quickly
degrades to general advice, rather than context-specific suggestions
("Mate, can you turn off to Donnybrook Road or Plenty Road?").
AFAIK there are no open source, public LLMs being developed within
Australia.
The main academic conference for LLM work in Australia (which is
considered less prestigious than international conferences such as
NeurIPS [1] and ICML [2]) is the Australasian Language Technology
Conference (ALTA), which I attended in December. My notes from this are
public [3]. The main focus of this conference was the application of LLM
technology to healthcare applications - such as mining medical records
to assist health professionals in making accurate and timely diagnoses.
These language models *are* being made open source, but they are smaller
and much more specific than ChatGPT.
Indeed, there was a lot of conversation at the conference about the
*need* for a research or open source LLM in Australia because the costs
of using ChatGPT and others (Claude, Bard) quickly add up.
In summary, there is a lot of scope for creating an Australian-specific,
open source LLM. AFAIK, one doesn't exist.
## Other open source LLM efforts
The main open source LLM efforts are:
* https://www.eleuther.ai/
* Llama 2 by Meta
* https://falconllm.tii.ae/
* Mistral AI
None of these efforts are Australian-based.
## Other Australian open AI efforts
Many universities are tackling the AI-generated / LLM generated text
issue by using AI-detection tools, primarily within the Turnitin suite.
Turnitin LLC is headquartered in California, USA.
From conversations I've had with archivists in Australian collection
institutions, there is also a need for Australian-specific speech
recognition tools - because tools like Whisper do not recognise
Australian accented speech as well as other accents [4]. These tools are
being used to transcribe audio visual archives. Again, a lot of this
comes down to the *data* that the model is trained on. The key problem
here is that Whisper was trained on 680k hours of speech data. To train
a comparable model, you would need hundreds of thousands of hours of
Australian-accented speech. The AusTalk archive, for example, from
memory has maybe a couple of thousand hours (it's offline so I can't
check).
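As a rough back-of-the-envelope check on the size of that gap, here is a small sketch. The 680k figure is the one quoted above; the AusTalk figure is only my from-memory "couple of thousand hours", taken here as 2,000, so treat the result as indicative rather than exact.

```python
# Whisper's training set size as quoted in the text above.
WHISPER_TRAINING_HOURS = 680_000
# ASSUMPTION: "a couple of thousand hours" is read as 2,000 hours.
AUSTALK_HOURS_ASSUMED = 2_000


def coverage_fraction(available_hours, target_hours):
    """Fraction of the target training set that the available data covers."""
    return available_hours / target_hours


fraction = coverage_fraction(AUSTALK_HOURS_ASSUMED, WHISPER_TRAINING_HOURS)
shortfall_hours = WHISPER_TRAINING_HOURS - AUSTALK_HOURS_ASSUMED

print(f"{fraction:.2%} of Whisper's training hours")  # 0.29% of Whisper's training hours
print(f"shortfall: {shortfall_hours:,} hours")  # shortfall: 678,000 hours
```

Even if the real AusTalk figure is several times larger than my estimate, it would still be a very small fraction of what Whisper was trained on.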
In the last week, we've also seen rapid advances in voice synthesis -
with MyShell releasing OpenVoice voice cloning technology [6]. The
*model* is openly available, but again, the data, algorithms and
code are not. The challenge here for Australia is - do we want to be
able to clone Australian-accented voices? (yeah, nah ;-) There are no
Australian TTS / voice cloning efforts that are open source, AFAIK. This
raises major ethical questions for the likes of the ATO that uses voice
recognition (and which has been previously spoofed [7]).
AI is rapidly being adopted into cybersecurity efforts, particularly in
the field of adversarial AI. These capabilities are predominantly the
domain of the Acronym Agencies (ASIO, ASD etc), and the folks at BSides
might be useful to talk to about Australian efforts here.
In terms of Australian AI institutes, there are a few:
* The Data61 CSIRO National Artificial Intelligence Centre - which
doesn't actually produce any AI or ML, its remit is to encourage AI
adoption -
https://www.csiro.au/en/work-with-us/industries/technology/National-AI-Centre
* Australian Institute for Machine Learning at University of Adelaide -
research into AI / ML - https://www.adelaide.edu.au/aiml/
* A2I2 Institute at Deakin University - https://a2i2.deakin.edu.au/
* UNSW AI Institute - https://www.unsw.edu.au/unsw-ai
Invariably, the academic institutions offer various (mostly
postgrad) programs, with a heavy emphasis on "engaging industry" (read:
getting industry to fund AI research because the government's research
funding is paltry).
## The problem of national capability and why the business adoption centres are exacerbating rather than addressing this, IMHO
(I was about to write "sovereign capability" but as I was reminded,
correctly, recently, sovereignty was never ceded).
This all brings me to the key problem I have with the business adoption
initiative. What I've outlined above is that Australia has very little
national capability in AI, and even less in open source AI. What we
adopt is, predominantly, American-owned AI that might then be
shoe-horned into an Australian context. Sure, businesses should be
looking to adopt AI to remain competitive. But the only AI they can
adopt at the moment is, largely, American AI.
What the business adoption initiative seeks to do is spur *adoption*
rather than *development*. What I would like to see happen is the
development of national AI capability, preferably in the form of open
source products that can be used by Australian businesses and
organisations nationally. Perhaps one of our national organisations
should focus on that, rather than encouraging Australian businesses to
spend money overseas ...
Kind regards,
Kathy Reid
[0] https://dcai.csail.mit.edu/
[1] https://neurips.cc/
[2] https://icml.cc/
[3] https://blog.kathyreid.id.au/2023/12/10/alta2023/
[4] My PhD research, forthcoming
[5] https://blog.opensource.org/open-source-ai-establishing-a-common-ground/
[6] https://research.myshell.ai/open-voice
[7] https://www.theguardian.com/technology/2023/mar/16/voice-system-used-to-verify-identity-by-centrelink-can-be-fooled-by-ai
On 7/1/24 14:54, Lyndsey Jackson via linux-aus wrote:
> Hi all,
>
> on a bit of a fact finding reach out for people or connections from
> people working on open AI/LLM projects.
>
> Late last year a proposal for AI Centres to help SME's adopt AI
> dropped.
> https://business.gov.au/grants-and-programs/artificial-intelligence-ai-adopt-program
> Before the holiday break I did some work on a proposal concept for
> agricultural value add (which is very, very broad), and I have insight
> into how some key groups were considering approaching it.
>
> And if you have any tech/advice/ideas/groups please let me know, I
> might not get a group to put a bid in but that's ok. I still want to
> know what's happening in open source and who is working on it.
>
>
> Thanks,
>
> Lyndsey