Gen AI with LLM integration: Part 2 – Integrating ChatGPT with ChistaDATA DBaaS: A New Era of Intuitive Database Management

In our previous blog, we explored the transformative role of generative AI in analytics and highlighted the benefits of using ClickHouse for high-performance data processing. Now, we are thrilled to take a significant leap forward by integrating ChistaDATA DBaaS with OpenAI’s ChatGPT. This integration promises to revolutionize the way we interact with databases, making it more intuitive, efficient, and accessible for both technical and non-technical users.

The Power of Natural Language Processing in Database Management

OpenAI’s ChatGPT, a state-of-the-art language model, excels at understanding and generating human-like text. By combining this capability with ChistaDATA DBaaS, we can create a seamless interface where users can interact with databases using natural language. This integration will simplify complex database tasks, enhance productivity, and democratize access to data insights.

Setting Up the Integration

To integrate ChatGPT with ChistaDATA DBaaS, follow these steps:

  1. API Access Setup: Obtain API keys from OpenAI. Ensure that your environment is configured to allow secure communication between ChatGPT and the database service.
    llm = OpenAIChatCompletion(identifier='llm', model='gpt-3.5-turbo', client_kwargs={'api_key': 'sk-api-key-from-sam-altman'},)
    
  2. Environment Configuration: Obtain and set up the necessary environment variables and configurations from ChistaDATA to enable API interactions. This involves specifying database endpoints, authentication credentials, and any necessary permissions.
    user = 'user_name'
    password = 'very_secret_password'
    port = 9440
    host = 'ws47-db.chistadata.io'
    database = 'default'
    
    table_name = "uk_price_paid"
  3. Integration Script: Developed script by ChistaDATA bridges the ChatGPT API with ChistaDATA DBaaS. This script will handle incoming natural language queries, translate them into SQL commands, execute them on the database, and return the results to the user.
    questions = [
        f"get me some records from {table_name} where value greater than 5000",
        f"calculate Average Price Per Year from {table_name}",
        f"calculate The Most Expensive Neighborhoods from {table_name}",
        f"calculate Distribution Analysis for Different Property Types from {table_name}",
        f"calculate Median Price and Total Transactions by Town and Year from {table_name}",
    ]

     

Running Database Commands Using ChatGPT

Once the integration is set up, you can begin executing database commands through natural language interactions. Here are some practical examples:

User Input 1:

get me some records from uk_price_paid where value greater than 5000

Output 1:

SELECT * FROM uk_price_paid WHERE price > 5000 LIMIT 5

    price        date  ...             district           county
0  145000  2008-11-19  ...          SCARBOROUGH  NORTH YORKSHIRE
1   70000  1995-08-04  ...               TORBAY           TORBAY
2   43000  1995-04-21  ...              CARADON         CORNWALL
3   60000  1995-01-27  ...  NORTH HERTFORDSHIRE    HERTFORDSHIRE
4   54000  1995-08-04  ...       ST EDMUNDSBURY          SUFFOLK

[5 rows x 14 columns]

After analyzing the provided CSV file, here are some business insights:
1. Price Distribution:
   - The prices of the properties range from 43,000 to 145,000 pounds.
   - The median price is roughly 60,000 pounds.
   - There seems to be a mix of property types and prices in the dataset.
2. Property Types:
   - The dataset contains semi-detached, terraced, and detached properties.
   - The most common property type is terraced.
3. Ownership Type:
   - All the properties listed in the dataset are either leasehold or freehold.
   - Freehold properties seem to be more common in this dataset.
4. Location Information:
   - The properties in the dataset are located in various towns and counties across the UK.
   - The properties are spread out geographically, indicating a diverse set of locations.
5. Date of Sale:
   - The sales of the properties in the dataset date back to the year 1995.
   - The most recent sale was in 2008, while the earliest sale was in 1995.

In summary, the dataset provides information on property sales in different locations, property types, ownership types, and sale prices. This information can be further analyzed to gain insights into the real estate market trends in the UK over the years.

User Input 2:

calculate Average Price Per Year from uk_price_paid

Output 2:

SELECT year(date) AS year, AVG(price) AS average_price
FROM uk_price_paid
GROUP BY year
LIMIT 5

   year  average_price
0  1995   67940.694315
1  1996   71516.810430
2  1997   78546.472036
3  1998   85445.402813
4  1999   96046.770455

Based on the provided CSV data, the average price of products has shown an increasing trend over the years. In 1995, the average price was $67,940.69, and by 1999, it had increased to $96,046.77. This indicates a consistent growth in pricing over the 5-year period.
The average price saw a gradual increase from 1995 to 1999, with the highest growth rate recorded between 1998 and 1999. This could be attributed to factors such as inflation, changes in market demand, or improvements in product quality.

Overall, the data suggests that the business has been able to increase the average price of its products over time, which can positively impact revenue and profitability. Monitoring pricing trends and identifying factors driving pricing changes will be crucial for making informed business decisions to sustain growth in the future.
The Most Expensive Neighborhoods from uk_price_paid

User Input 3:

calculate The Most Expensive Neighborhoods from uk_price_paid

Output 3:

SELECT locality, AVG(price) as avg_price
FROM uk_price_paid
GROUP BY locality
ORDER BY avg_price DESC
LIMIT 5

               locality   avg_price
0  BOURNE BUSINESS PARK  76729889.0
1        PANATTONI PARK  68962067.0
2  PANSHANGER AERODROME  55000000.0
3      EAGLE FARM NORTH  50766731.5
4             ADAMSTOWN  41421319.0

Based on the provided CSV content, there are several business insights that can be derived. The localities listed along with their average prices are as follows:
1. BOURNE BUSINESS PARK has an average price of 76,729,889.0
2. PANATTONI PARK has an average price of 68,962,067.0
3. PANSHANGER AERODROME has an average price of 55,000,000.0
4. EAGLE FARM NORTH has an average price of 50,766,731.5
5. ADAMSTOWN has an average price of 41,421,319.0

These insights can be used by businesses in the real estate industry to make informed decisions about property investments and market trends. It is important to note the significant differences in average prices between the localities, which could indicate varying levels of demand and desirability.
In summary, analyzing the average prices of properties in different localities can provide valuable insights for businesses looking to understand market trends and make strategic decisions in the real estate industry.

Input 4:

bring me Price Distribution Analysis for Different Property Types from uk_price_paid

Output 4:

SELECT type, COUNT(*) as num_properties, AVG(price) as avg_price, MIN(price) as min_price, MAX(price) as max_price
FROM uk_price_paid
GROUP BY type
LIMIT 5
            type  num_properties      avg_price  min_price  max_price
0          other             244  659095.401639        100   92555918
1       terraced         4308241   98605.093928          1   21500000
2  semi-detached         3844808  111612.176315         95    9500000
3       detached         3161311  188609.245817          1   32477000
4           flat         2315233  126253.546681        100   24750000
Based on the provided CSV content, we can see that the price distribution analysis for different property types in the UK shows some interesting insights:

1. The most common property type in the dataset is terraced, with a total of 4,308,241 properties. The average price for a terraced property is £98,605.09, with prices ranging from £1 to £21,500,000.

2. The second most common property type is semi-detached, with 3,844,808 properties. The average price for a semi-detached property is £111,612.18, with prices ranging from £95 to £9,500,000.

3. Detached properties come in third, with 3,161,311 properties. The average price for a detached property is £188,609.25, with prices ranging from £1 to £32,477,000.

4. Flats are the least common property type in the dataset, with 2,315,233 properties. The average price for a flat is £126,253.55, with prices ranging from £100 to £24,750,000.

In summary, the data shows that detached properties have the highest average price, followed by semi-detached, terraced, and flats. However, terraced properties are the most common type in the dataset. This information can be useful for real estate professionals, investors, and anyone interested in the UK property market.

Input 5:

calculate Median Price and Total Transactions by Town and Year from uk_price_paid

Output 5:

SELECT town, toYear(date) AS year, median(price) AS median_price, count(*) AS total_transactions
FROM uk_price_paid
GROUP BY town, year
LIMIT 5
           town  year  median_price  total_transactions
0      KILGETTY  2005      170750.0                  36
1  ROMNEY MARSH  2002      120500.0                 331
2       CORSHAM  1999       88750.0                 326
3    WASHINGTON  1996       42900.0                 686
4        WOKING  1999      125000.0                2805
Based on the provided data, we can analyze the median price and total transactions by town and year. 

Some business insights from the data include:
1. In 2005, Kilgetty had a median price of 170,750.0 with a total of 36 transactions.
2. In 2002, Romney Marsh had a median price of 120,500.0 with a total of 331 transactions.
3. In 1999, Corsham had a median price of 88,750.0 with a total of 326 transactions.
4. In 1996, Washington had a median price of 42,900.0 with a total of 686 transactions.
5. In 1999, Woking had a median price of 125,000.0 with a total of 2,805 transactions.

Summary:
The data shows a variation in median prices and total transactions across different towns and years. This information can be valuable for real estate analysis, market trends, and investment decisions in the property sector. Additionally, businesses operating in these towns can use this data to understand the housing market dynamics and plan their strategies accordingly.


 

Practical Use Cases of the Integration

  1. Automated Reporting: Streamline the generation of complex reports by simply asking ChatGPT to retrieve specific data insights. This is particularly useful for managers and stakeholders who need quick access to data without writing SQL queries.
  2. Enhanced Data Analysis: Enable data scientists and analysts to perform intricate data analysis tasks through conversational commands, speeding up the workflow and reducing the need for manual query writing.
  3. Routine Task Management: Simplify routine database tasks such as data entry, updates, and maintenance by using natural language commands. This reduces the learning curve for new users and increases operational efficiency.
┌─year─┬───price─┬─bar(round(avg(price)), 0, 2000000, 100)───────────────┐
│ 1995 │  109110 │ █████▍                                                │
│ 1996 │  118659 │ █████▊                                                │
│ 1997 │  136526 │ ██████▋                                               │
│ 1998 │  153002 │ ███████▋                                              │
│ 1999 │  180633 │ █████████                                             │
│ 2000 │  215849 │ ██████████▋                                           │
│ 2001 │  232987 │ ███████████▋                                          │
│ 2002 │  263668 │ █████████████▏                                        │
│ 2003 │  278424 │ █████████████▊                                        │
│ 2004 │  304664 │ ███████████████▏                                      │
│ 2005 │  322887 │ ████████████████▏                                     │
│ 2006 │  356195 │ █████████████████▋                                    │
│ 2007 │  404062 │ ████████████████████▏                                 │
│ 2008 │  420741 │ █████████████████████                                 │
│ 2009 │  427754 │ █████████████████████▍                                │
│ 2010 │  480322 │ ████████████████████████                              │
│ 2011 │  496278 │ ████████████████████████▋                             │
│ 2012 │  519482 │ █████████████████████████▊                            │
│ 2013 │  616195 │ ██████████████████████████████▋                       │
│ 2014 │  724121 │ ████████████████████████████████████▏                 │
│ 2015 │  792101 │ ███████████████████████████████████████▌              │
│ 2016 │  843589 │ ██████████████████████████████████████████▏           │
│ 2017 │  983523 │ █████████████████████████████████████████████████▏    │
│ 2018 │ 1016753 │ ██████████████████████████████████████████████████▋   │
│ 2019 │ 1041673 │ ████████████████████████████████████████████████████  │
│ 2020 │ 1060027 │ █████████████████████████████████████████████████████ │
│ 2021 │  958249 │ ███████████████████████████████████████████████▊      │
│ 2022 │  902596 │ █████████████████████████████████████████████▏        │
└──────┴─────────┴───────────────────────────────────────────────────────┘

Conclusion

The integration of ChatGPT with ChistaDATA DBaaS marks a significant advancement in database management. By leveraging the natural language processing capabilities of ChatGPT, we can make database interactions more intuitive and accessible. This powerful combination will not only enhance productivity but also democratize access to data, enabling users across various domains to harness the full potential of their data.

Stay tuned for more updates as we continue to explore and expand the capabilities of this integration. The future of database interaction is here, and it’s conversational, intuitive, and incredibly powerful.

References:

Blog Part 1: https://chistadata.com/generative-ai-in-analytics-part-1-unlocking-new-possibilities-with-clickhouse/

Open AI: https://openai.com/api/ 

ClickHouse Example Dataset: https://clickhouse.com/docs/en/getting-started/example-datasets

About Can Sayn 41 Articles
Can Sayın is experienced Database Administrator in open source relational and NoSql databases, working in complicated infrastructures. Over 5 years industry experience, he gain managing database systems. He is working at ChistaDATA Inc. His areas of interest are generally on open source systems.
Contact: Website